Practical lesson 1
PREDICTING PROTEIN FUNCTION USING APPROACHES
NOT BASED ON THE DETECTION OF SEQUENCE SIMILARITY.
Gene context, gene fusions and phylogenetic profiles.
By F. Abascal, PDG, CNB, CSIC.
-
Given this protein (yebC),
>YEBC_ECOLI|P24237|Protein yebC.
MAGHSKWANT RHRKAAQDAK RGKIFTKIIR ELVTAAKLGG GDPDANPRLR AAVDKALSNN
MTRDTLNRAI ARGVGGDDDA NMETIIYEGY GPGGTAIMIE CLSDNRNRTV AEVRHAFSKC
GGNLGTDGSV AYLFSKKGVI SFEKGDEDTI MEAALEAGAE DVVTYDDGAI DVYTAWEEMG
KVRDALEAAG LKADSAEVSM IPSTKADMDA ETAPKLMRLI DMLEDCDDVQ EVYHNGEISD
EVAATL
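Most web forms accept the sequence as shown, but some tools reject the blanks that group the residues in tens. A minimal Python sketch for stripping them, assuming the record above is held in a string; the helper name `clean_fasta` is made up for this illustration:

```python
# The yebC record exactly as given above, with residues grouped in tens.
RECORD = """>YEBC_ECOLI|P24237|Protein yebC.
MAGHSKWANT RHRKAAQDAK RGKIFTKIIR ELVTAAKLGG GDPDANPRLR AAVDKALSNN
MTRDTLNRAI ARGVGGDDDA NMETIIYEGY GPGGTAIMIE CLSDNRNRTV AEVRHAFSKC
GGNLGTDGSV AYLFSKKGVI SFEKGDEDTI MEAALEAGAE DVVTYDDGAI DVYTAWEEMG
KVRDALEAAG LKADSAEVSM IPSTKADMDA ETAPKLMRLI DMLEDCDDVQ EVYHNGEISD
EVAATL
"""


def clean_fasta(record):
    """Return (header, sequence) with all whitespace removed from the sequence."""
    lines = record.strip().splitlines()
    header = lines[0].lstrip(">")
    # split() with no argument drops blanks and tabs inside each line
    sequence = "".join("".join(line.split()) for line in lines[1:])
    return header, sequence


header, sequence = clean_fasta(RECORD)
print(">" + header)
print(sequence)
print(len(sequence), "residues")
```

The printed sequence can be pasted directly into a BLAST or Pfam search form.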
-
Let's try to obtain information about its function using BLAST, Pfam
or other tools to find homology relationships.
Can you find any clues about its function?
Now, let's try to use comparative
genomics information to find some clues.
-
Look at the species distribution in Pfam.
Is this protein present in a wide or a narrow phylogenetic range?
-
The COG database provides information about the gene neighborhood of the
genes belonging to each COG. Access the COG database and find the COG
that yebC belongs to.
-Search by gene name ("yebC").
-Once in yebC's COG, look at the species distribution: is yebC
present in all the bacterial genomes? What does that indicate?
-Check the "genome context" of the yebC gene. Is there a
gene that frequently appears in yebC's neighborhood? What do you think
it means? Does it give us any clue about yebC's function?
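The reasoning behind the genome-context question can be sketched in Python: for each genome, list the genes found within a few positions of yebC, then count how many genomes share each neighbor. The gene orders below are invented toy data, not taken from COG, and the function `neighbor_counts` and its window size are assumptions made for this illustration.

```python
from collections import Counter

# Toy gene orders (hypothetical, for illustration only).
GENOMES = {
    "genome_A": ["geneP", "yebC", "geneX", "geneY", "geneQ"],
    "genome_B": ["geneR", "geneX", "yebC", "geneS"],
    "genome_C": ["geneY", "geneT", "geneU", "yebC", "geneX"],
    "genome_D": ["geneX", "geneV", "geneW", "geneZ", "yebC"],
}


def neighbor_counts(genomes, query, window=2):
    """Count, per gene, the genomes in which it lies within `window` genes of `query`."""
    counts = Counter()
    for order in genomes.values():
        if query not in order:
            continue  # genome lacks the query gene
        i = order.index(query)
        # genes up to `window` positions before and after the query
        nearby = set(order[max(0, i - window):i] + order[i + 1:i + 1 + window])
        counts.update(nearby)
    return counts


for gene, n in neighbor_counts(GENOMES, "yebC").most_common():
    print(gene, n)
```

A gene that stays next to the query in many distantly related genomes, as the made-up `geneX` does here, is a candidate functional partner. The sketch treats each genome as a linear list and ignores strand and operon structure.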
-
We can see that knowledge about protein families, even those composed
of proteins of unknown function, is essential for comparative genomics.
Many databases are devoted to comparative genomics. Two that are widely
used, and that provide information similar to that contained in COGs, are:
-
MBGD, Microbial Genome Database
-
KEGG, Kyoto Encyclopedia of Genes and Genomes
-
Now use the STRING system (Search Tool for the Retrieval of Interacting
Genes/Proteins): it looks for conservation of gene order, common
phylogenetic patterns and gene fusions.
-Find the corresponding STRING entry (by text/by sequence).
-What associations are predicted by gene neighborhood?
-What associations are predicted by gene fusion?
-What associations are predicted based on common phylogenetic patterns
(Phylogeny)?
-Take a look at the "Summary Network". By changing the DEPTH of the
network, it is possible to identify which COGs are functionally related
to those represented in the original graph. By doing this we are in fact
travelling along a giant network of functional interactions. Some
regions of this giant network may correspond to specific regulatory
circuits or metabolic pathways.
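The phylogenetic-pattern evidence asked about above rests on a simple idea: gene families that are present and absent in the same set of genomes tend to be functionally linked. A minimal Python sketch follows, using invented presence/absence profiles and the Jaccard index as the similarity measure. The measure is an illustrative choice, not the scoring scheme STRING itself uses, and all family and function names are hypothetical.

```python
# Presence (1) / absence (0) of each gene family across six genomes.
# Toy data, hypothetical.
PROFILES = {
    "famA": (1, 1, 0, 1, 0, 1),
    "famB": (1, 1, 0, 1, 0, 1),
    "famC": (1, 0, 1, 1, 0, 0),
    "famD": (0, 0, 1, 0, 1, 0),
}


def jaccard(p, q):
    """Genomes containing both families divided by genomes containing either."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0


def rank_by_profile(profiles, query):
    """Rank the other families by profile similarity to `query`, best first."""
    scores = [(name, jaccard(profiles[query], prof))
              for name, prof in profiles.items() if name != query]
    return sorted(scores, key=lambda item: item[1], reverse=True)


for name, score in rank_by_profile(PROFILES, "famA"):
    print(name, round(score, 2))
```

A family found in nearly every genome has an almost flat profile, so this kind of comparison is most informative for families with a patchy species distribution.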
March 2004