Nucleic Acids Res. OUP

AllGenes

http://www.allgenes.org

Babenko, V., Brunk, B., Crabtree, J., Diskin, S., Li, L., Mazzarelli, J., McWeeney, S., Pinney, D., Pizarro, A., Schug, J., Stoeckert, C.

Computational Biology and Informatics Laboratory, Center for Bioinformatics, University of Pennsylvania.

Contact   brunkb@pcbi.upenn.edu


Database Description

AllGenes.org is a human and mouse view of the GUS (Genomics Unified Schema) relational database. It includes a gene index generated by assembling publicly available EST and mRNA sequences and integrating them with public genomic sequence (UCSC Golden Path assemblies). Automated annotation has been applied to characterize these sequences and to relate them, along with their predicted protein sequences, to conceptual genes. As of September 1, 2001, the gene index contains 175,153 human and 76,746 mouse non-singleton assemblies that cluster into 140,369 human and 74,050 mouse putative genes. Of the mouse genes, 47% have similarity to a known protein sequence and 17% have been assigned a GO (Gene Ontology Consortium) function. The GUS schema is organized around the central dogma of biology (genes are transcribed to RNAs, which are translated to proteins), enabling a powerful query interface that allows users to identify data sets for browsing and further analysis. For example, one could ask for all mouse RNAs located on chromosome 7 that are expressed in the brain and whose products are predicted to be transcription factors, a query that returns 17 RNA entries. The source and ownership of all data, the algorithms run on them, and the evidence for assertions such as GO function predictions are stored in GUS, allowing users to assess the validity of the data.
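The example query above can be illustrated with a minimal Python sketch. It is not the GUS schema or the AllGenes query interface: the class names, field names, and toy records below are all hypothetical and chosen only to show how a gene-to-RNA-to-protein organization lets a single query combine genomic location, expression, and predicted function.

```python
from dataclasses import dataclass, field


@dataclass
class Protein:
    # Predicted GO functions of the translated product (hypothetical field).
    go_functions: set = field(default_factory=set)


@dataclass
class RNA:
    rna_id: str
    tissues: set          # tissues in which the RNA is expressed
    protein: Protein      # predicted translation product


@dataclass
class Gene:
    species: str
    chromosome: str
    rnas: list            # transcripts assigned to this gene


def find_rnas(genes, species, chromosome, tissue, go_function):
    """Return RNAs matching location, expression, and predicted function."""
    hits = []
    for gene in genes:
        if gene.species != species or gene.chromosome != chromosome:
            continue
        for rna in gene.rnas:
            if tissue in rna.tissues and go_function in rna.protein.go_functions:
                hits.append(rna)
    return hits


# Toy records invented for illustration; they are not AllGenes data.
TOY_GENES = [
    Gene("mouse", "7", [
        RNA("rna-1", {"brain", "liver"}, Protein({"transcription factor"})),
        RNA("rna-2", {"liver"}, Protein({"transcription factor"})),
    ]),
    Gene("mouse", "11", [
        RNA("rna-3", {"brain"}, Protein({"transcription factor"})),
    ]),
    Gene("human", "7", [
        RNA("rna-4", {"brain"}, Protein({"kinase"})),
    ]),
]

matches = find_rnas(TOY_GENES, "mouse", "7", "brain", "transcription factor")
print([rna.rna_id for rna in matches])
```

In a relational database the same question would typically be expressed as a join across gene, RNA, and protein tables; the nested loop here is only a stand-in for that join.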

Recent Developments

Gene Ontology function predictions are now based on both Prodom2001 and CDD similarities, resulting in greater accuracy and coverage. Protein translations using FrameFinder (http://www.hgmp.mrc.ac.uk/~gslater/estateman/framefinder.html) have been generated for all assemblies. RH mapping information has been integrated for mouse in addition to human. Sequences identified via a query can be downloaded, and a download site is now available for retrieving the entire set of assemblies as well as other data, such as the set of GO function predictions.

Acknowledgements

This work was supported by grants from the National Institutes of Health (R01HG01539) and the Department of Energy (DE-FG02- DOE00ER62893).

Category   Gene Identification and Structure

 
