Nucleic Acids Res. OUP

Clusters of Orthologous Groups (COG)

http://www.ncbi.nlm.nih.gov/COG

Tatusov, R.L., Natale, D.A., Fedorova, N.D., Jackson, J., Jacobs, A., Krylov, D.M., Mekhedov, S.L., Nikolskaya, A.N., Rao, B.S., Wolf, Y.I., Aravind, L., Lanczycki, C., Mazumder, R., Sreekumar, K., Vasudevan, S., Walker, D.R., Tatusova, T.A., Yao, K., Yin, J., Koonin, E.V.

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 USA

Contact: koonin@ncbi.nlm.nih.gov


Database Description

The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in complete genomes. Each COG includes proteins that are inferred to be orthologs (direct evolutionary counterparts). The current release consists of 3166 COGs, which include 75725 proteins from 33 bacterial genomes, 9 archaeal genomes and two genomes of unicellular eukaryotes, the yeasts Saccharomyces cerevisiae and Candida albicans (http://www.ncbi.nlm.nih.gov/COG). The COG database is updated periodically as new genomes become available. The COGs can be applied to the functional annotation of newly sequenced genomes by using the COGNITOR program, which assigns new proteins to existing COGs and is available on the COG front page.
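To illustrate how a protein from a new genome might be assigned to an existing COG, here is a minimal Python sketch of best-hit voting. It is not the COGNITOR implementation. It assumes that each query protein has already been compared (for example, with BLAST) against all COG member proteins, that hits arrive as hypothetical `(protein, genome, score)` tuples, and that a COG is accepted only when genome-specific best hits from at least `min_genomes` different genomes fall into it. The function names, the tuple layout and the default threshold of two are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical hit layout (an assumption, not a COGNITOR format):
# (subject_protein_id, subject_genome, similarity_score)
Hit = Tuple[str, str, float]


def genome_best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Keep only the highest-scoring subject protein in each genome."""
    best: Dict[str, Tuple[str, float]] = {}
    for protein, genome, score in hits:
        if genome not in best or score > best[genome][1]:
            best[genome] = (protein, score)
    return {genome: protein for genome, (protein, _) in best.items()}


def assign_to_cog(
    hits: List[Hit],
    protein_to_cog: Dict[str, str],
    min_genomes: int = 2,  # assumed threshold, not the documented rule
) -> Optional[str]:
    """Return the COG hit by genome-specific best hits from at least
    `min_genomes` genomes, or None if no COG has enough support."""
    votes: Counter = Counter()
    for protein in genome_best_hits(hits).values():
        cog = protein_to_cog.get(protein)
        if cog is not None:  # a best hit may fall outside every COG
            votes[cog] += 1
    if not votes:
        return None
    cog, count = votes.most_common(1)[0]
    return cog if count >= min_genomes else None


# Toy data with hypothetical protein, genome and COG identifiers.
hits = [
    ("p1", "genome1", 200.0),
    ("p2", "genome1", 150.0),  # weaker hit in genome1, ignored
    ("p3", "genome2", 180.0),
    ("p4", "genome3", 90.0),
]
protein_to_cog = {"p1": "COG_A", "p3": "COG_A", "p4": "COG_B"}

print(assign_to_cog(hits, protein_to_cog))                 # COG_A
print(assign_to_cog(hits, protein_to_cog, min_genomes=3))  # None
```

Requiring support from several genomes, rather than trusting a single best hit, fits the idea that a COG groups orthologs from multiple lineages; the criteria used by the actual COGNITOR program may differ from this sketch.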

Recent Developments

A view of the genomic context, i.e., the neighboring genes, is now available for each COG. Because functionally linked genes are often adjacent in prokaryotic genomes, genomic context analysis can provide additional clues to protein function during genome annotation.

A preliminary version of the COGs for six (nearly) complete eukaryotic genomes, namely the fungi S. cerevisiae and Schizosaccharomyces pombe, the green plant Arabidopsis thaliana, the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens, is currently available. This set consists of 4386 eukaryotic COGs, which include 44405 proteins; manual validation and annotation of the eukaryotic COGs are still underway. An evolutionary classification of protein domains, provisionally called DOME (domain evolution), is being developed in conjunction with the COG database, and a preliminary version of DOME is available. The COG database is now used as the basis for constructing Reference Sequences (RefSeq) for bacterial and archaeal genomes.
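The per-COG genomic context view can be illustrated with a short Python sketch. It is not the code behind the COG web pages; it assumes a single linear chromosome given as an ordered list of gene identifiers, a mapping from genes to COGs, and a hypothetical `window` parameter that sets how many genes on each side count as neighbors. All identifiers in the example are made up.

```python
from collections import defaultdict
from typing import Dict, List, Set


def neighbor_cogs(
    gene_order: List[str],
    gene_to_cog: Dict[str, str],
    window: int = 1,  # hypothetical neighborhood size, in genes
) -> Dict[str, Set[str]]:
    """For each COG, collect the COGs of genes lying within `window`
    positions of any of its members on one linear chromosome."""
    context: Dict[str, Set[str]] = defaultdict(set)
    for i, gene in enumerate(gene_order):
        cog = gene_to_cog.get(gene)
        if cog is None:  # skip genes not assigned to any COG
            continue
        start = max(0, i - window)
        stop = min(len(gene_order), i + window + 1)
        for j in range(start, stop):
            if j == i:
                continue
            neighbor = gene_to_cog.get(gene_order[j])
            if neighbor is not None:
                context[cog].add(neighbor)
    return dict(context)


# Toy chromosome with hypothetical gene and COG identifiers; g3 has no COG.
gene_order = ["g1", "g2", "g3", "g4"]
gene_to_cog = {"g1": "COG_A", "g2": "COG_B", "g4": "COG_C"}

print(neighbor_cogs(gene_order, gene_to_cog))
print(neighbor_cogs(gene_order, gene_to_cog, window=2))
```

A context view spanning many genomes could repeat this over each genome's gene order and merge the resulting sets; circular chromosomes and contig boundaries are ignored here for simplicity.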

Category: Comparative Genomics

See the abstract in the NAR 2001 Database Issue.

 
