GENCODE Introductory Pages

The ENCODE Project

ENCODE (the Encyclopedia Of DNA Elements) was launched in September 2003 by the National Human Genome Research Institute. The goal was to identify all functional elements in the human genome sequence.

The pilot phase, currently underway, aims to analyse defined regions of the human genome sequence using existing methods, with close interaction between computational and experimental scientists.

Regions representing approximately 1 percent (30 Mb) of the human genome have been chosen and are currently being analysed by ENCODE consortium researchers. Fourteen regions were chosen because they were of special interest, and 30 more regions were chosen randomly from clusters of regions that were grouped according to non-exonic conservation and gene density.

The GENCODE Project

GENCODE is a sub-project of ENCODE that seeks to identify all protein-coding genes in the ENCODE selected regions. For each protein-coding gene this means the delineation of a complete mRNA sequence for at least one splice isoform, and often for a number of additional alternative splice forms.

Coding sequences for the 44 regions in the study have been ascertained by the Havana group. In total there are 1097 CDS sequences from the 44 selected regions of the human genome.

These Pages

The results on the pages linked in the right bar will be updated and added to as the project develops. The pages are the collected work of a number of groups in the Biosapiens Network, with the collaboration of the Havana Group in Cambridge.

The pages are coordinated at the Protein Design Group in Madrid and there are links from these pages to the work of the individual groups. Biosapiens groups are also providing data for the ENCODE DAS project.

Initial Results

Of the 1097 sequences, 663 are alternative splice variants. In addition, 111 of the sequences have the same length as, and are 100% identical to, another sequence.
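How the count of identical sequences was obtained is not described on this page. As a minimal sketch, assuming the sequences are available as a simple mapping from identifier to sequence string, exact duplicates can be found by grouping on the sequence itself. The function name and the toy identifiers below are hypothetical, and the toy sequences are not taken from the GENCODE set.

```python
from collections import defaultdict


def count_exact_duplicates(sequences):
    """Count exact duplicates in a mapping of identifier -> sequence.

    Returns a pair (n_in_groups, n_redundant):
      n_in_groups  sequences that are identical to at least one other
      n_redundant  copies that could be dropped to leave one per group
    """
    groups = defaultdict(list)
    for seq_id, seq in sequences.items():
        # Identical strings necessarily have the same length, so the
        # sequence itself is a sufficient grouping key.
        groups[seq.upper()].append(seq_id)

    shared = [ids for ids in groups.values() if len(ids) > 1]
    n_in_groups = sum(len(ids) for ids in shared)
    n_redundant = sum(len(ids) - 1 for ids in shared)
    return n_in_groups, n_redundant


# Toy data for illustration only (hypothetical identifiers and sequences).
toy = {
    "cds_a": "MKTAYIAK",
    "cds_b": "MKTAYIAK",
    "cds_c": "MKTAYIAKQR",
    "cds_d": "MSLLT",
    "cds_e": "MKTAYIAK",
}

in_groups, redundant = count_exact_duplicates(toy)
print(in_groups, redundant)
```

The two numbers differ because "identical to another sequence" can be read either way: as every member of a duplicated group, or only as the surplus copies. Which reading the figure of 111 corresponds to is not stated here.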

The division of the sequences shows that 661 come from the 14 regions that were selected manually and just 436 come from the 30 regions that were selected at random. Although there are fewer manually selected regions, the total amount of sequence selected is the same for the manually and "randomly" selected regions.
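The split between the two sets of regions can be checked with a little arithmetic. The sketch below uses only the counts given above, plus one assumption that is not a figure from this page: that the roughly 30 Mb of ENCODE sequence is divided equally between the manually and randomly selected regions, about 15 Mb each. The constant and function names are illustrative.

```python
TOTAL_CDS = 1097   # all CDS sequences in the 44 regions
MANUAL_CDS = 661   # from the manually selected regions
RANDOM_CDS = 436   # from the randomly selected regions

# Assumption: 30 Mb in total, split equally between the two sets.
ASSUMED_MB_PER_SET = 15.0


def share(part, whole):
    """Percentage of `whole` made up by `part`, to one decimal place."""
    return round(100.0 * part / whole, 1)


# The two subsets should account for every sequence.
assert MANUAL_CDS + RANDOM_CDS == TOTAL_CDS

manual_share = share(MANUAL_CDS, TOTAL_CDS)
random_share = share(RANDOM_CDS, TOTAL_CDS)

# CDS sequences per megabase under the equal-size assumption.
manual_density = round(MANUAL_CDS / ASSUMED_MB_PER_SET, 1)
random_density = round(RANDOM_CDS / ASSUMED_MB_PER_SET, 1)

print(f"manual: {manual_share}% of sequences, {manual_density} per Mb")
print(f"random: {random_share}% of sequences, {random_density} per Mb")
```

Under that assumption, the manually selected regions hold about three fifths of the sequences, so they carry roughly one and a half times as many annotated coding sequences per megabase as the randomly selected regions.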

BLAST finds human homologues for all but 22 of the CDS sequences. GO terms can be readily associated with 1003 of the sequences.

Thirteen sequences are covered over their whole length by a PDB structure, and BLAST finds at least one template structure for 587 sequences.

Of the 994 sequences with Pfam domains, 423 (42.6%) have at least one Pfam domain that is broken in two, either by insertions or by deletions.

 
The Results from Biosapiens Groups
Group Data Pages

- Biocomputing Roma Structural Models

- Bioinformatics Unit, UCL, Fold Predictions

- Bologna Group - 1D Features

- CBS ENCODE Predictions

- EBI Structure and Function Annotations

- EBI UniProtKB Annotations

- Institute of Enzymology ENCODE Predictions

- PDG Summary Pages


Collected Data Pages

- Central Results Pages

 

GENCODE Data Set
GENCODE Data Set at IMIM

- GENCODE Annotations and Sequence Sets

 

Other Pages

- The GENCODE Project

- The Havana Group

- The ENCODE Project

- Target Region Selection Process

- The ENCODE Project at UCSC

   

Michael Tress
Protein Design Group, Centro Nacional de Biotecnología (CNB-CSIC)
Calle Darwin, Cantoblanco, 28049 MADRID.
Tel: (+34) 91 585 4676   Fax: (+34) 91 585 4506

Thanks to Petr Sobola for the skater.
