Practical lesson 1

PATTERNS AND PROFILES.
PROTEIN MOTIFS, DOMAINS AND FAMILIES


By F. Abascal,  PDG, CNB, CSIC.
 
In this practical lesson we will go through several examples that illustrate the concepts of "PATTERNS", "PROFILES", "MOTIFS", "DOMAINS" and "FAMILIES". Each exercise is centered on the analysis of a specific sequence, as described below.
The examples include results from database searches and from the application of several tools.
HOWEVER, the idea is that you repeat the analyses yourself rather than only reading the results.



 
A selection of links to Databases and Tools:
 
General Tools:
Pattern Tools and Databases:
Profile Tools and Databases:
 



1. PATTERNS

RPE_YEAST, from Saccharomyces cerevisiae, annotated as: "Ribulose-phosphate 3-epimerase".
>my_protein
MVKPIIAPSI LASDFANLGC ECHKVINAGA DWLHIDVMDG HFVPNITLGQ PIVTSLRRSV
PRPGDASNTE KKPTAFFDCH MMVENPEKWV DDFAKCGADQ FTFHYEATQD PLHLVKLIKS
KGIKAACAIK PGTSVDVLFE LAPHLDMALV MTVEPGFGGQ KFMEDMMPKV ETLRAKFPHL
NIQVDGGLGK ETIPKAAKAG ANVIVAGTSV FTAADPHDVI SFMKEEVSKE LRSRDLLD
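
To make the idea of a pattern concrete, the following minimal sketch translates a pattern written in PROSITE syntax into a regular expression and scans the sequence above with it. The helper names (prosite_to_regex, find_pattern) are hypothetical. The pattern string is only an illustration in PROSITE notation: it is an assumption of this sketch and should be checked against the current PROSITE entries for ribulose-phosphate 3-epimerase. The converter handles only the basic syntax (x, [..], {..}, repetitions, and the < and > anchors at the ends of the pattern).

```python
import re

# Sequence of "my_protein" (RPE_YEAST) exactly as shown above, in blocks of 10.
RAW_SEQUENCE = """
MVKPIIAPSI LASDFANLGC ECHKVINAGA DWLHIDVMDG HFVPNITLGQ PIVTSLRRSV
PRPGDASNTE KKPTAFFDCH MMVENPEKWV DDFAKCGADQ FTFHYEATQD PLHLVKLIKS
KGIKAACAIK PGTSVDVLFE LAPHLDMALV MTVEPGFGGQ KFMEDMMPKV ETLRAKFPHL
NIQVDGGLGK ETIPKAAKAG ANVIVAGTSV FTAADPHDVI SFMKEEVSKE LRSRDLLD
"""
SEQUENCE = "".join(RAW_SEQUENCE.split())  # drop spaces and line breaks


def prosite_to_regex(pattern):
    """Translate a basic PROSITE-style pattern into a Python regex string."""
    pattern = pattern.strip().rstrip(".")
    pieces = []
    for element in pattern.split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.lstrip("<").rstrip(">")
        # Split "core(n)" or "core(n,m)" into the core and its repetition.
        match = re.fullmatch(r"([^()]+)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"cannot parse pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":                      # x = any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # {..} = any residue except these
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        pieces.append(prefix + core + repeat + suffix)
    return "".join(pieces)


def find_pattern(pattern, sequence):
    """Return (start, end, matched_text) tuples, 1-based and inclusive."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end(), m.group()) for m in regex.finditer(sequence)]


# Illustrative pattern (assumption, see the text above).
PATTERN = "[LIVMF]-H-[LIVMFY]-D-[LIVM]-x-D-x(1,2)-[FY]-[LIVM]-x-N-x-[STAV]."
HITS = find_pattern(PATTERN, SEQUENCE)
for start, end, text in HITS:
    print(start, end, text)
```

Because a pattern either matches or does not, a single substitution at a conserved position is enough to lose the hit. This is the main limitation that profiles address.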



2. PROFILES

YD33_MYCTU, from Mycobacterium tuberculosis, "Hypothetical protein Rv1333".
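
As a rough illustration of how a profile differs from a pattern, the sketch below builds a position-specific scoring matrix from a small ungapped alignment and slides it along a sequence. The alignment and the query sequence are toy data invented for this example, not taken from YD33_MYCTU or from any database. The uniform background frequencies, the pseudocount of 1 and the helper names (build_pssm, best_window) are likewise assumptions of this sketch. Real profile tools use sequence weighting, substitution-matrix-based pseudocounts and gap handling, none of which is shown here.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build log-odds scores (base 2) for each column of an ungapped alignment.

    Assumes all sequences have the same length and a uniform background.
    """
    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def best_window(pssm, sequence):
    """Return (score, start) of the best-scoring window; start is 0-based."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        score = sum(pssm[i][sequence[start + i]] for i in range(width))
        if best is None or score > best[0]:
            best = (score, start)
    return best


# Toy alignment and toy query sequence (hypothetical, for illustration only).
TOY_ALIGNMENT = ["GDSGG", "GDSGA", "GNSGG", "GDTGG"]
TOY_QUERY = "MKTAGDSGGPLV"

PSSM = build_pssm(TOY_ALIGNMENT)
SCORE, START = best_window(PSSM, TOY_QUERY)
print(f"best window starts at index {START} with score {SCORE:.2f}")
```

Unlike a pattern, the profile gives every window a score, so a sequence that deviates from the consensus at one position is penalized rather than rejected outright.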

3. FAMILIES

Protein encoded by the gcsf gene of Bos taurus (granulocyte colony-stimulating factor precursor).
>sw|P35833|CSF3_BOVIN Granulocyte colony-stimulating factor precursor (G-CSF).
MKLMVLQLLLWHSALWTVHEATPLGPARSLPQSFLLKCLEQVRKIQADGAELQERLCAAH
KLCHPEELMLLRHSLGIPQAPLSSCSSQSLQLTSCLNQLHGGLFLYQGLLQALAGISPEL
APTLDTLQLDVTDFATNIWLQMEDLGAAPAVQPTQGAMPTFTSAFQRRAGGVLVASQLHR
FLELAYRGLRYLAEP

4. DOMAINS

ICE9_HUMAN, from Homo sapiens; precursor of caspase-9.
>ICE9_HUMAN
MDEADRRLLR RCRLRLVEEL QVDQLWDALL SRELFRPHMI EDIQRAGSGS RRDQARQLII
DLETRGSQAL PLFISCLEDT GQDMLASFLR TNRQAAKLSK PTLENLTPVV LRPEIRKPEV
LRPETPRPVD IGSGGFGDVG ALESLRGNAD LAYILSMEPC GHCLIINNVN FCRESGLRTR
TGSNIDCEKL RRRFSSLHFM VEVKGDLTAK KMVLALLELA QQDHGALDCC VVVILSHGCQ
ASHLQFPGAV YGTDGCPVSV EKIVNIFNGT SCPSLGGKPK LFFIQACGGE QKDHGFEVAS
TSPEDESPGS NPEPDATPFQ EGLRTFDQLD AISSLPTPSD IFVSYSTFPG FVSWRDPKSG
SWYVETLDDI FEQWAHSEDL QSLLLRVANA VSVKGIYKQM PGCFNFLRKK LFFKTS

February 2004
Manuel J. Gómez
Grupo de Diseño de Proteínas
Centro Nacional de Biotecnología, CSIC