Practical lesson 1

PATTERNS AND PROFILES.
PROTEIN MOTIFS AND DOMAINS.


By F. Abascal, PDG, CNB, CSIC.
 
In this practical lesson we will go through several examples that illustrate the concepts of "PATTERNS", "PROFILES", "MOTIFS", "DOMAINS" and "FAMILIES". Each exercise centers on the analysis of a specific sequence, as described below.
The examples include results from database searches and from several analysis tools.
However, the idea is that you repeat each of these analyses yourself.



 
A selection of links to databases and tools:
    • General tools
    • Pattern tools and databases
    • Profile tools and databases
 



1. PATTERNS

RPE_YEAST, from Saccharomyces cerevisiae, annotated as "Ribulose-phosphate 3-epimerase".
>my_protein
MVKPIIAPSI LASDFANLGC ECHKVINAGA DWLHIDVMDG HFVPNITLGQ PIVTSLRRSV
PRPGDASNTE KKPTAFFDCH MMVENPEKWV DDFAKCGADQ FTFHYEATQD PLHLVKLIKS
KGIKAACAIK PGTSVDVLFE LAPHLDMALV MTVEPGFGGQ KFMEDMMPKV ETLRAKFPHL
NIQVDGGLGK ETIPKAAKAG ANVIVAGTSV FTAADPHDVI SFMKEEVSKE LRSRDLLD
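To see what a PROSITE-style pattern means in practice, it helps to read it as a regular expression. The sketch below is a simplified converter written for this lesson (the functions `prosite_to_regex` and `find_sites` are hypothetical, not part of any PROSITE tool); it covers only the basic syntax and uses the N-glycosylation pattern N-{P}-[ST]-{P} (PS00001) purely as an example. For real scans, use ScanProsite.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regular expression.

    Supported: residue letters, x (any residue), [..] (allowed residues),
    {..} (forbidden residues), (n) and (n,m) repeats, < and > terminal
    anchors, and the trailing period. Special cases such as '>' inside
    square brackets are NOT handled.
    """
    regex_parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional (n) or (n,m)
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            token = "."
        elif core.startswith("["):
            token = core                      # same meaning in regex syntax
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"   # any residue except these
        else:
            token = core                      # a single residue letter
        if low and high:
            token += "{" + low + "," + high + "}"
        elif low:
            token += "{" + low + "}"
        regex_parts.append(prefix + token + suffix)
    return "".join(regex_parts)


def find_sites(sequence, pattern):
    """Return (1-based start, matched segment) for every hit, overlaps included."""
    # Wrapping the pattern in a lookahead lets finditer report overlapping hits.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# The RPE_YEAST sequence above, with the 10-residue block spacing removed.
RPE_YEAST = "".join("""
MVKPIIAPSI LASDFANLGC ECHKVINAGA DWLHIDVMDG HFVPNITLGQ PIVTSLRRSV
PRPGDASNTE KKPTAFFDCH MMVENPEKWV DDFAKCGADQ FTFHYEATQD PLHLVKLIKS
KGIKAACAIK PGTSVDVLFE LAPHLDMALV MTVEPGFGGQ KFMEDMMPKV ETLRAKFPHL
NIQVDGGLGK ETIPKAAKAG ANVIVAGTSV FTAADPHDVI SFMKEEVSKE LRSRDLLD
""".split())

print(prosite_to_regex("N-{P}-[ST]-{P}."))         # N[^P][ST][^P]
print(find_sites(RPE_YEAST, "N-{P}-[ST]-{P}."))     # [(45, 'NITL')]
```

For RPE_YEAST the sketch reports a single hit (NITL at position 45). A match to such a short pattern does not by itself show that the site is glycosylated: short patterns match many unrelated sequences by chance.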



2. PROFILES

Y1333_MYCTU, from Mycobacterium tuberculosis, "Hypothetical protein Rv1333".
    • How many domains does this protein have?
    • Where are they?
    • What are their functions? (click on each of them to access their individual entries)
    • What can we say about the function of the protein?
    • How many proteins have the peptidase_S58 domain?
    • Which other domains appear associated with the peptidase_S58 domain? (Go to the Domain Organization box and click on "View Graphic")
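Profile databases such as Pfam use profile hidden Markov models (built with HMMER) that also model insertions and deletions, so the following is only a simplified illustration of the core idea: a profile assigns a score to every residue at every position of a conserved block. The sketch below builds an ungapped log-odds matrix (PSSM) and slides it along a sequence. The toy alignment, the uniform 1/20 background and the pseudocount scheme are assumptions chosen for illustration, and `build_pssm` and `best_hit` are hypothetical names.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build a log-odds scoring matrix (in bits) from an ungapped alignment.

    Assumes a uniform background frequency of 1/20 and adds `pseudocount`
    pseudo-sequences distributed according to that background.
    """
    background = 1.0 / len(AMINO_ACIDS)
    n_seqs = len(alignment)
    pssm = []
    for i in range(len(alignment[0])):
        column = [seq[i] for seq in alignment]
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount * background) / (n_seqs + pseudocount)
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def best_hit(pssm, sequence):
    """Return (score, 1-based start) of the best-scoring window, or None if too short."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20 standard amino acids score 0 (neutral).
        score = sum(column.get(aa, 0.0) for column, aa in zip(pssm, window))
        if best is None or score > best[0]:
            best = (score, start + 1)
    return best


# Hypothetical toy alignment of one conserved block (not real family data).
block = ["GDSGGP", "GDSGGP", "GESGGP", "GDAGGP"]
pssm = build_pssm(block)
score, start = best_hit(pssm, "AAAGDSGGPAAA")
print(start, round(score, 1))
```

Unlike a pattern, which either matches or not, the profile tolerates substitutions at variable positions (D/E, S/A in the toy block) and returns a graded score, which is what lets profile searches detect more distant members of a domain family.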

3. FAMILIES

Protein encoded by the gcsf gene of Bos taurus (Granulocyte colony-stimulating factor precursor).
>sw|P35833|CSF3_BOVIN Granulocyte colony-stimulating factor precursor (G-CSF).
MKLMVLQLLLWHSALWTVHEATPLGPARSLPQSFLLKCLEQVRKIQADGAELQERLCAAH
KLCHPEELMLLRHSLGIPQAPLSSCSSQSLQLTSCLNQLHGGLFLYQGLLQALAGISPEL
APTLDTLQLDVTDFATNIWLQMEDLGAAPAVQPTQGAMPTFTSAFQRRAGGVLVASQLHR
FLELAYRGLRYLAEP
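When the same sequences are submitted to several family and domain databases, it can help to read the FASTA record programmatically. The minimal reader below is an illustrative sketch (`parse_fasta` and `split_header` are hypothetical helpers); it also strips the block spacing used in the other sequences of this lesson. `split_header` assumes the `db|accession|entry_name description` layout of the header above and raises `ValueError` on headers such as `>my_protein` that lack it.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs.

    Spaces inside sequence lines (such as the 10-residue blocks used in
    this lesson) are removed. Lines before the first header are ignored.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append("".join(line.split()))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_header(header):
    """Split a 'db|accession|entry_name description' header into its fields."""
    identifier, _, description = header.partition(" ")
    db, accession, entry_name = identifier.split("|")
    return {"db": db, "accession": accession,
            "entry_name": entry_name, "description": description}


CSF3_RECORD = """>sw|P35833|CSF3_BOVIN Granulocyte colony-stimulating factor precursor (G-CSF).
MKLMVLQLLLWHSALWTVHEATPLGPARSLPQSFLLKCLEQVRKIQADGAELQERLCAAH
KLCHPEELMLLRHSLGIPQAPLSSCSSQSLQLTSCLNQLHGGLFLYQGLLQALAGISPEL
APTLDTLQLDVTDFATNIWLQMEDLGAAPAVQPTQGAMPTFTSAFQRRAGGVLVASQLHR
FLELAYRGLRYLAEP
"""

header, sequence = parse_fasta(CSF3_RECORD)[0]
info = split_header(header)
print(info["accession"], info["entry_name"], len(sequence))
```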

4. DOMAINS

CASP9_HUMAN, from Homo sapiens, precursor of caspase-9.
>CASP9_HUMAN
MDEADRRLLR RCRLRLVEEL QVDQLWDALL SRELFRPHMI EDIQRAGSGS RRDQARQLII
DLETRGSQAL PLFISCLEDT GQDMLASFLR TNRQAAKLSK PTLENLTPVV LRPEIRKPEV
LRPETPRPVD IGSGGFGDVG ALESLRGNAD LAYILSMEPC GHCLIINNVN FCRESGLRTR
TGSNIDCEKL RRRFSSLHFM VEVKGDLTAK KMVLALLELA QQDHGALDCC VVVILSHGCQ
ASHLQFPGAV YGTDGCPVSV EKIVNIFNGT SCPSLGGKPK LFFIQACGGE QKDHGFEVAS
TSPEDESPGS NPEPDATPFQ EGLRTFDQLD AISSLPTPSD IFVSYSTFPG FVSWRDPKSG
SWYVETLDDI FEQWAHSEDL QSLLLRVANA VSVKGIYKQM PGCFNFLRKK LFFKTS
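Domain databases report each domain as a pair of start and end positions on the sequence. Turning those coordinates into an ordered list of domains and the linkers between them is a simple way to write down a protein's domain architecture. The sketch below assumes 1-based, inclusive coordinates (as reported by Pfam and UniProt); `Domain` and `domain_architecture` are hypothetical names, and the coordinates in the usage example are placeholders on a toy sequence, not the real CASP9_HUMAN boundaries. Replace them with the positions reported for CASP9_HUMAN by the domain databases you use.

```python
from typing import NamedTuple


class Domain(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def domain_architecture(sequence, domains):
    """Split a sequence into ordered domain and linker segments.

    Returns (label, start, end, subsequence) tuples that together cover
    the whole sequence. Overlapping or out-of-range domains are rejected.
    """
    segments = []
    position = 1                      # first residue not yet assigned
    for dom in sorted(domains, key=lambda d: d.start):
        if dom.start < position or dom.start > dom.end or dom.end > len(sequence):
            raise ValueError(f"invalid or overlapping domain: {dom}")
        if dom.start > position:      # unassigned stretch before this domain
            segments.append(("linker", position, dom.start - 1,
                             sequence[position - 1:dom.start - 1]))
        segments.append((dom.name, dom.start, dom.end,
                         sequence[dom.start - 1:dom.end]))
        position = dom.end + 1
    if position <= len(sequence):     # C-terminal tail after the last domain
        segments.append(("linker", position, len(sequence), sequence[position - 1:]))
    return segments


# Hypothetical coordinates on a toy sequence, for illustration only.
toy = "MAAAAKLLLLLGSGSGSPEEEEEEVK"
for segment in domain_architecture(toy, [Domain("domain_1", 6, 11), Domain("domain_2", 18, 24)]):
    print(segment)
```

Rejecting overlaps keeps the output a clean left-to-right architecture; databases can report overlapping hits from different methods, and those need to be reconciled by hand before this kind of summary makes sense.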

August 2005
Michael Tress
Grupo de Diseño de Proteínas
Centro Nacional de Biotecnología, CSIC