Brief description of:
Buchnera BBP annotation methods
- Gene annotation
- Coding genes and pseudogenes: Open reading frames were first identified with orfind [1]. These predictions were then refined with the programs GeneMark [2] and Glimmer [3] and by hand-curation.
Pseudogenes were identified from these results and by searching for similarities to real coding sequences with BLAST [4].
- tRNA identification: carried out with the program tRNAscan-SE [5].
- rRNA and other RNAs: identified by BLAST searches of the intergenic regions against prokaryotic DNA from GenBank; their boundaries were hand-curated.
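The first step above, locating candidate open reading frames, can be illustrated with a minimal six-frame scan. This is only a sketch under simple assumptions (ATG as the only start codon, the three standard stop codons, a hypothetical `min_codons` length cut-off); it is not the algorithm used by orfind, GeneMark or Glimmer, and the function names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """Naive ORF scan over the six reading frames.

    Returns (strand, start, end) tuples with 0-based, end-exclusive
    coordinates. For the '-' strand the coordinates refer to the
    reverse-complemented sequence, not to the original one.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG of the current ORF
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    # Length counts the start codon but not the stop codon.
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs
```

Candidates produced by a scan like this would still need the refinement and hand-curation steps described above, since a plain scan cannot tell real genes from spurious frames.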
- Functional annotations
- Functional classification: for easier comparison, the guidelines proposed by Shigenobu et al. (2000) [6] were applied; these are based on Riley's functional classification scheme [7].
- Function annotation (functional descriptions, E.C. numbers, keywords): A functional description for each protein was derived with the FUNCut program [8]. This program runs (recursive or simple) BLAST searches to collect similar sequences, then aligns all against all to obtain a measure of the distance between every pair of sequences. This yields a representation of the sequence space, in which a clustering algorithm [9] tries to detect distinct groups of sequences. Ideally, each group corresponds to a subfamily in which function is expected to be conserved. The 'subfamily' of the query sequence is retrieved and the annotations of its sequences are inspected to distill: 1) a representative function description; 2) enzymatic activities (EC numbers); and 3) keywords.
These automatic annotations were also refined by hand. Both the hand-curated and the automatic annotations are provided by the server.
- COGs [10] assignment: To assign COGs, we identified the orthology relationships between the two Buchnera genomes (the APS genome is annotated in COGs) and transferred the annotation from Buchnera APS to Buchnera BBP. The coverage of the alignments was inspected to account for cases in which a BBP gene was fragmented into two genes in APS. Where there was no ortholog in APS, the program COGnitor [11] was used for the assignment.
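The COG transfer logic described above can be sketched as follows. This is an illustration only: the input layout (tuples of BBP gene, APS gene and the fraction of the BBP gene covered by the alignment), the `min_coverage` threshold of 0.8 and the status labels are all assumptions made for the example, not details taken from the original pipeline. Summing coverages also assumes that the APS fragments align to non-overlapping parts of the BBP gene.

```python
from collections import defaultdict


def transfer_cogs(bbp_genes, hits, aps_cogs, min_coverage=0.8):
    """Transfer COG labels from APS orthologs to BBP genes.

    hits: iterable of (bbp_gene, aps_gene, coverage), where coverage is the
          fraction of the BBP gene covered by the alignment (hypothetical layout).
    aps_cogs: dict mapping APS gene name -> COG identifier.
    Returns a dict: bbp_gene -> (set of COG identifiers, status string).
    """
    hits_by_gene = defaultdict(list)
    for bbp_gene, aps_gene, coverage in hits:
        hits_by_gene[bbp_gene].append((aps_gene, coverage))

    def cogs_of(aps_names):
        # APS genes without a COG annotation contribute nothing.
        return {aps_cogs[name] for name in aps_names if name in aps_cogs}

    result = {}
    for gene in bbp_genes:
        gene_hits = hits_by_gene.get(gene, [])
        if not gene_hits:
            # No APS ortholog: left for a separate tool such as COGnitor.
            result[gene] = (set(), "no_ortholog")
            continue
        best_aps, best_cov = max(gene_hits, key=lambda hit: hit[1])
        total_cov = sum(cov for _, cov in gene_hits)
        if best_cov >= min_coverage:
            result[gene] = (cogs_of([best_aps]), "transferred")
        elif total_cov >= min_coverage:
            # Several partial hits jointly cover the gene: the BBP gene
            # probably corresponds to a gene that is fragmented in APS.
            result[gene] = (cogs_of(name for name, _ in gene_hits), "fragmented_in_aps")
        else:
            result[gene] = (set(), "manual_review")
    return result
```

Genes flagged `no_ortholog` correspond to the cases that were passed to COGnitor, and `manual_review` marks partial matches that a curator would need to inspect.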
- Information storage and Web access
The information derived from these methods, together with other data such as BLAST results, was stored in a relational database (ORFandDB [12]). A web interface was built to access this information [13].
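As a rough illustration of how such annotations might be held in a relational database, the sketch below uses Python's built-in `sqlite3` module. The table names, column names and helper functions are hypothetical and chosen only for the example; they do not describe the actual ORFandDB schema.

```python
import sqlite3

# Hypothetical two-table layout: one row per gene, any number of
# annotations per gene, each tagged as automatic or hand-curated.
SCHEMA = """
CREATE TABLE gene (
    gene_id   TEXT PRIMARY KEY,
    start_pos INTEGER NOT NULL,
    end_pos   INTEGER NOT NULL,
    strand    TEXT NOT NULL CHECK (strand IN ('+', '-'))
);
CREATE TABLE annotation (
    gene_id     TEXT NOT NULL REFERENCES gene(gene_id),
    source      TEXT NOT NULL CHECK (source IN ('automatic', 'hand-curated')),
    description TEXT NOT NULL
);
"""


def create_db(path=":memory:"):
    """Open a database (in memory by default) and create the tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_gene(conn, gene_id, start_pos, end_pos, strand):
    conn.execute(
        "INSERT INTO gene VALUES (?, ?, ?, ?)",
        (gene_id, start_pos, end_pos, strand),
    )


def add_annotation(conn, gene_id, source, description):
    conn.execute(
        "INSERT INTO annotation VALUES (?, ?, ?)",
        (gene_id, source, description),
    )


def annotations_for(conn, gene_id):
    """Return (source, description) rows for one gene, ordered by source."""
    rows = conn.execute(
        "SELECT source, description FROM annotation "
        "WHERE gene_id = ? ORDER BY source",
        (gene_id,),
    )
    return rows.fetchall()
```

Keeping the annotation source in its own column lets an interface show the automatic and the hand-curated descriptions side by side, as the server is described as doing.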
- References
·1: NCBI: Tatusov T, Tatusov R.
http://www.ncbi.nlm.nih.gov/gorf/gorf.html
·2: Lukashin AV, Borodovsky M.
GeneMark.hmm: new solutions for gene finding.
Nucleic Acids Res. 1998 Feb 15;26(4):1107-15.
·3: Suzek BE, Ermolaeva MD, Schreiber M, Salzberg SL.
A probabilistic method for identifying start codons in bacterial genomes.
Bioinformatics. 2001 Dec;17(12):1123-30.
·4: Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ.
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Nucleic Acids Res. 1997 Sep 1;25(17):3389-402.
·5: Lowe TM, Eddy SR.
tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence.
Nucleic Acids Res. 1997 Mar 1;25(5):955-64.
·6: Shigenobu S, Watanabe H, Hattori M, Sakaki Y, Ishikawa H.
Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp. APS.
Nature. 2000 Sep 7;407(6800):81-6.
·7: Riley M.
Functions of the gene products of Escherichia coli.
Microbiol Rev. 1993 Dec;57(4):862-952.
·9: Abascal F, Valencia A.
Clustering of proximal sequence space for the identification of protein families.
Bioinformatics. 2002. In press.
·10: Tatusov RL, Koonin EV, Lipman DJ.
A genomic perspective on protein families.
Science. 1997 Oct 24;278(5338):631-7.
·11: NCBI.
http://www.ncbi.nlm.nih.gov/COG/xognitor.html
·13: PDG.
http://www.pdg.cnb.uam.es/fabascal/Buch_ORFand_www
Please contact the web site coordinator if you have any questions or suggestions.