Course: Protein Function Prediction
Practical session/Exercises
Gene context, gene fusions and phylogenetic profiles. By F. Abascal, PDG, CNB, CSIC.
The main goal of the practical session is to familiarize course participants with existing text-mining and information retrieval (IR) tools for the biological literature.

Practical 2: Gene Ontology, describing and annotating gene products

The Gene Ontology (GO) aims to provide standardized concepts, or terms, to describe relevant biological aspects of gene products. Try querying GO with a set of terms: apoptosis, caspase, glycogenin, transcription factor (or, if you are interested in a particular function, process or compartment, use your own query instead). What did you retrieve? Browse through the results and visualize the corresponding ontology graphs. What kinds of relationships between terms did you find? What are the advantages of this approach?

Explore the annotation of the following proteins by searching the Gene Ontology Annotation (GOA) database:
1) CASP9_HUMAN (P55211), formerly known as ICE9_HUMAN
2) Y1333_MYCTU (P64811), formerly known as YD33_MYCTU
3) RPE_YEAST (P46969)
These proteins were used in the earlier practical. What is one of the weak points of using GO annotations to annotate proteins in bioinformatics? (Hint: think about domains.)

iHOP. This tool was developed in our group (PDG) at the CNB. Create a gene model for your query gene, check the results carefully, and surf through the virtual gene network of iHOP. What kinds of results does iHOP return? What are the advantages and disadvantages of using iHOP instead of a PubMed retrieval search?

Practical 3: Deriving protein interactions through literature mining

Much of the function of many proteins comes from their interactions with other biomolecules. Use different text-mining tools that try to extract protein interactions for one or more query proteins (caspase, glycogenin, p53, etc.) from text: iHOP and Chilibot. Compare your results with the entries in the interaction databases BIND, DIP, GRID, HPRD, IntAct, MINT and STRING. What kind of output does each tool produce? What differences do you find?
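For the GO exercise in Practical 2: GO terms form a directed acyclic graph in which a term can have several parents, linked mainly by is_a and part_of relationships. The Python sketch below parses a tiny, invented fragment in the OBO flat-file format (the `TOY:*` identifiers and names are made up, not real GO terms) and collects every ancestor of a term by following those two relationship types. This kind of upward traversal is what lets an annotation to a specific term also be read as applying to its more general ancestors. It is a minimal illustration under those assumptions, not a full OBO parser: obsolete terms, synonyms and all other tags are ignored.

```python
# Tiny invented OBO fragment: TOY:* IDs and names are placeholders, not real GO terms.
TOY_OBO = """
[Term]
id: TOY:0000001
name: example root process

[Term]
id: TOY:0000002
name: example cell death
is_a: TOY:0000001 ! example root process

[Term]
id: TOY:0000003
name: example programmed cell death
is_a: TOY:0000002 ! example cell death

[Term]
id: TOY:0000004
name: example sub-process
relationship: part_of TOY:0000003 ! example programmed cell death
"""


def parse_obo(text):
    """Return {term_id: (name, [(relation, parent_id), ...])} for each [Term] stanza."""
    terms = {}
    for stanza in text.split("[Term]")[1:]:
        term_id, name, parents = None, None, []
        for raw in stanza.strip().splitlines():
            line = raw.split(" ! ", 1)[0].strip()  # drop trailing "! comment"
            key, _, value = line.partition(": ")
            if key == "id":
                term_id = value
            elif key == "name":
                name = value
            elif key == "is_a":
                parents.append(("is_a", value))
            elif key == "relationship":
                relation, target = value.split()[:2]
                parents.append((relation, target))
        terms[term_id] = (name, parents)
    return terms


def ancestors(terms, term_id, relations=("is_a", "part_of")):
    """Collect every term reachable upward from term_id via the given relations."""
    found = set()
    stack = [term_id]
    while stack:
        current = stack.pop()
        for relation, parent in terms[current][1]:
            if relation in relations and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


terms = parse_obo(TOY_OBO)
for ancestor in sorted(ancestors(terms, "TOY:0000004")):
    print(ancestor, terms[ancestor][0])
```

Restricting the traversal to `relations=("is_a",)` shows how much of the graph is reachable only through part_of links.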
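For the iHOP exercise: the gene network you surf in iHOP links genes and proteins that are mentioned together in the literature. As a rough illustration of sentence-level co-occurrence in general (not iHOP's actual implementation, which also has to cope with synonyms and ambiguous gene names), the Python sketch below counts how often pairs of names from a fixed list appear in the same sentence. The gene names, the example text and the naive sentence splitter are all hypothetical simplifications.

```python
import re
from collections import Counter
from itertools import combinations


def split_sentences(text):
    """Naive splitter: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def cooccurrence_network(text, gene_names):
    """Count, for each pair of gene names, the sentences that mention both.

    Matching is exact, case-sensitive and whole-word; synonyms and ambiguous
    names are ignored in this sketch.
    """
    patterns = {g: re.compile(r"\b" + re.escape(g) + r"\b") for g in gene_names}
    edges = Counter()
    for sentence in split_sentences(text):
        mentioned = sorted(g for g, p in patterns.items() if p.search(sentence))
        for pair in combinations(mentioned, 2):
            edges[pair] += 1
    return edges


# Hypothetical gene names and text, for illustration only.
example = (
    "GENE1 binds GENE2 in vitro. GENE2 was detected. "
    "GENE1 and GENE3 were not co-expressed, but GENE2 was."
)
edges = cooccurrence_network(example, ["GENE1", "GENE2", "GENE3"])
for (a, b), count in sorted(edges.items()):
    print(a, "--", b, count)
```

The third sentence produces edges even though it reports a negative result, which is one reason why co-occurrence networks have to be checked carefully against the underlying sentences.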
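For the comparison step in Practical 3: once you have collected the interaction partners reported by a text-mining tool and by a database for the same query protein, the comparison reduces to set operations on undirected pairs. The Python sketch below assumes both lists have been typed in by hand as pairs of identifiers; `P1` to `P5` are placeholders, not real tool or database results. The precision-like and recall-like ratios are only indicative, because the databases are themselves incomplete, so a mined pair missing from a database is not necessarily wrong.

```python
def normalize(pairs):
    """Treat interactions as undirected, so (A, B) and (B, A) count once."""
    return {frozenset(pair) for pair in pairs}


def compare(mined, database):
    """Split two interaction lists into shared and tool- or database-specific pairs."""
    m, d = normalize(mined), normalize(database)
    shared = m & d
    return {
        "shared": shared,
        "mined_only": m - d,
        "database_only": d - m,
        # Fraction of mined pairs also found in the database (precision-like).
        "precision": len(shared) / len(m) if m else 0.0,
        # Fraction of database pairs recovered by text mining (recall-like).
        "recall": len(shared) / len(d) if d else 0.0,
    }


# Placeholder identifiers, not real tool or database output.
mined = [("P1", "P2"), ("P2", "P3"), ("P1", "P4")]
database = [("P2", "P1"), ("P3", "P5")]
result = compare(mined, database)
print(sorted(sorted(pair) for pair in result["shared"]))
print(round(result["precision"], 2), round(result["recall"], 2))
```

Repeating the comparison for each database makes it easier to see whether disagreements come from the text-mining tool or from differences in database coverage.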
What are the difficulties encountered by these tools?

Practical 4: Deriving functional information from structure

Because structural genomics programs are generating an ever-increasing number of structures of hypothetical proteins, more and more servers are being developed to predict the function of structures whose function is unknown. Function here is not limited to a protein's broad functional category; it can also include cellular location, binding sites and interacting surface patches. Compare the predictions from the following servers for the known structure 1t70. Do the predictions help you to guess the function?
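For Practical 4, one simple way to compare the servers' binding-site or interface predictions for 1t70 is to measure how much their predicted residue sets overlap and which residues several servers agree on. The Python sketch below assumes each server's output has already been reduced by hand to a set of (chain, residue number) pairs; the server names and residues are placeholders, since the list of servers is not reproduced here. Agreement between servers does not prove a prediction is correct, but residues supported by several methods are reasonable places to start looking.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Size of the intersection divided by size of the union (1.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def consensus(predictions, min_votes):
    """Residues predicted by at least `min_votes` servers, sorted by chain and number."""
    votes = Counter(res for residues in predictions.values() for res in set(residues))
    return sorted(res for res, n in votes.items() if n >= min_votes)


# Placeholder server names and (chain, residue number) predictions for 1t70.
predictions = {
    "server_A": {("A", 45), ("A", 47), ("A", 102)},
    "server_B": {("A", 45), ("A", 102), ("A", 150)},
    "server_C": {("A", 45), ("A", 200)},
}

for (name1, set1), (name2, set2) in combinations(sorted(predictions.items()), 2):
    print(name1, name2, round(jaccard(set1, set2), 2))
print("supported by >= 2 servers:", consensus(predictions, 2))
```

Storing residues as (chain, number) tuples rather than strings keeps the sorted output in numerical order.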