Abel Ureta-Vidal
EnsEMBL project, Room A2 06, EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
abel@ebi.ac.uk
Knowing which sequences are homologous within or between species, whether at the level of base pairs, genes or genomic segments, enhances many aspects of our understanding of individual genomes and of sets of genomes. The comparative genomics/proteomics component of Ensembl, known in house as "compara", aims to enhance the utility of each separate genome and to provide data for evolutionary analysis. This is especially true of comparisons with the human genome: scientists are keen to know, for example, which gene in a model organism such as mouse might serve as a useful model for a potential disease-causing gene in human. Comparative genomics has also been shown to be very useful in providing insight into gene function and structure, and in predicting possible cis-regulatory regions conserved between orthologous genes in different species. This rests on the premise that, over time, functional regions will be conserved between species whilst non-functional regions will not. The inter-species comparison of complete proteomes, whether gene by gene or by gene family, provides information on possible protein functions and the history of gene duplication, and helps to identify negative or positive selection.
At the time of writing, Ensembl (http://www.ensembl.org) displays and distributes genome annotations for nine metazoan species: Homo sapiens, Mus musculus, Rattus norvegicus, Fugu rubripes, Danio rerio, Drosophila melanogaster, Anopheles gambiae, Caenorhabditis elegans and Caenorhabditis briggsae. More genomes, such as chimpanzee, chicken and honeybee, are expected during 2004. We generate data at both the DNA and protein/gene levels. At the DNA level, whole-genome alignments are performed, from which regions of synteny are derived. At the protein level, putative orthologues and protein families are identified. These comparative genomics data are made available in a user-friendly manner through the Ensembl web pages. The comparative genomics links enable the user to navigate easily between species, for example from a gene in human to its orthologue in mouse, or between regions of synteny in mouse and rat. Flat data files, e.g. a list of human/mouse orthologues, can be obtained in different formats (txt, html, excel) through the Ensembl data-mining system EnsMart. As with all of Ensembl, the entire dataset is available, as is the code to access and manipulate it.