Of the 1097 CDS sequences, 661 were from regions selected manually and 436 from "randomly" selected
regions. Thirty of the 44 regions included in the ENCODE experiment were selected randomly
from groups pre-classified by non-exonic conservation and gene density.
"Randomly" chosen regions contained fewer CDS sequences than those chosen for their biological
interest, though no inference can be drawn from this comparison because of the nature of the
"random" selection process.
Regions ENr112 (chromosome 2), ENr311 (chromosome 14) and ENr313 (chromosome 16) do not have
any CDS sequences, while regions ENr113, ENr114, ENr211, ENr213 and ENr312 have just one sequence
plus alternative splice variants.
In contrast, none of the manually chosen regions has fewer than 2 sequences plus splice isoforms.
Region ENm012 has the fewest (just 4 sequences), while ENm006 has 118 sequences.
"Protein binding" is also the most frequent term in the randomly chosen sequences, appearing 82
times. Second is "nucleus", which appears 71 times. "Membrane-associated" is the third most frequent
term (59), "regulation of transcription, DNA-dependent" is fourth with 44 mentions and "membrane"
fifth with 39. "Nucleotide-binding" appears 32 times, and three terms each have 25 mentions:
"ATP-binding", "Ca-binding" and "transporter activity".
In comparison with the manually chosen sequences, there are proportionally more
"nucleus" sequences and fewer membrane-related sequences. The terms "signal transduction"
(14 mentions), "receptor activity" (9 mentions) and "metal ion binding" (19 mentions) are much
less frequent in the random sample.
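Because the two samples differ in size (436 random versus 661 manual sequences), raw counts are easier to compare once expressed as rates. The following is a minimal Python sketch that converts the counts reported above for the random sample into mentions per 100 sequences. The dictionary and function names are illustrative only, and the sketch assumes each mention corresponds to a distinct sequence, which the text does not state.

```python
# Counts reported in the text for sequences from the randomly chosen regions.
RANDOM_TOTAL = 436  # number of CDS sequences from "randomly" selected regions

random_term_counts = {
    "protein binding": 82,
    "nucleus": 71,
    "membrane-associated": 59,
    "regulation of transcription, DNA-dependent": 44,
    "membrane": 39,
    "nucleotide-binding": 32,
    "ATP-binding": 25,
    "Ca-binding": 25,
    "transporter activity": 25,
    "metal ion binding": 19,
    "signal transduction": 14,
    "receptor activity": 9,
}


def mentions_per_100(counts, total):
    """Return mentions per 100 sequences for each term, highest rate first.

    Assumption: a term is counted at most once per sequence.
    """
    rates = {term: 100.0 * n / total for term, n in counts.items()}
    return dict(sorted(rates.items(), key=lambda kv: kv[1], reverse=True))


random_rates = mentions_per_100(random_term_counts, RANDOM_TOTAL)

for term, rate in random_rates.items():
    print(f"{term:45s} {rate:5.1f}")
```

The same function applied to the counts for the manually chosen sequences, with a total of 661, gives rates that can be compared directly with these.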
The sequences from the manually selected regions also have considerably more GO terms per
sequence. The mean number of GO terms per sequence is 5.51 for the manually chosen regions
but just 4.63 for the randomly chosen regions, and
11.7% of the randomly chosen sequences had no GO terms at all, compared with just 6.5%
of the sequences from the manually chosen regions.
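The two summary statistics quoted here can be computed as in the following sketch. It assumes that annotations are held as a mapping from sequence identifier to a list of GO identifiers, and that the mean is taken over all sequences, including those with no terms. Neither detail is stated in the text. The sequence names and annotations are toy values for illustration, not the ENCODE data.

```python
from statistics import mean


def go_summary(annotations):
    """Summarise GO annotation density for a set of sequences.

    annotations: dict mapping a sequence ID to a list of GO term IDs
                 (an empty list means the sequence has no GO terms).
    Assumption: the mean includes unannotated sequences as zeros.
    """
    # Count distinct terms per sequence so repeated IDs are not double counted.
    counts = [len(set(terms)) for terms in annotations.values()]
    unannotated = sum(1 for c in counts if c == 0)
    return {
        "mean_terms": mean(counts),
        "pct_unannotated": 100.0 * unannotated / len(counts),
    }


# Hypothetical toy input: four sequences, one of them without any GO term.
toy_annotations = {
    "seqA": ["GO:0005515", "GO:0005634"],
    "seqB": [],
    "seqC": ["GO:0016020"],
    "seqD": ["GO:0005515", "GO:0005524", "GO:0005634"],
}

summary = go_summary(toy_annotations)
print(summary)
```

If the mean were instead taken over annotated sequences only, the gap between the two groups would be smaller, so the choice of denominator affects the comparison.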