This is a summary of the analysis performed on the ENCODE CDS data set (1097
transcripts from 434 loci) using tools from CBS. According to our analysis,
variable transcripts occur in 258 of these loci, accounting for 921 of the
proteins. Many of CBS's protein annotation servers have been run on all the
peptides, including SignalP, TargetP, TMHMM, NetNGlyc, NetOGlyc, ProP,
NetAcet, NetNES and NetPhos.
A short summary of the analysis can be found at the CBS, along with a file
containing all annotation results and figures for easy viewing of the
annotations for each transcript. These show the variation in CBS predictions
among transcripts from the same locus.
Looking specifically at signal peptides and propeptides, 112 of the 434 genes have predicted signal peptides. Of the 258 variably transcribed genes, 31 have transcripts both with and without a signal peptide. This appears to be an example of transcriptional variation that produces fundamentally different gene products. The same is true for the 47 loci that display variation in propeptide prediction. Variation in both signal peptide and propeptide predictions is found in 6 cases.
It should be noted, though, that all of these results are raw predictions on
the protein sequences and are provided without any validation of the CDS. We
plan to do more work on the trends in the post-translational modification
(PTM) changes.
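
The per-locus comparison behind the signal peptide and propeptide counts can be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually used at CBS: the input layout (one record per transcript with a locus identifier and a yes/no prediction), the function name summarize, and the example identifiers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical input: one (locus_id, transcript_id, has_feature) record per
# transcript, where has_feature is a yes/no call such as "signal peptide
# predicted". These identifiers are made up and are not ENCODE data.
predictions = [
    ("locusA", "tA1", True),
    ("locusA", "tA2", False),
    ("locusB", "tB1", True),
    ("locusB", "tB2", True),
    ("locusC", "tC1", False),
]


def summarize(records):
    """Return (loci with the feature in any transcript,
    loci whose transcripts disagree on the feature)."""
    calls_by_locus = defaultdict(set)
    for locus, _transcript, has_feature in records:
        calls_by_locus[locus].add(has_feature)

    # A locus counts as having the feature if at least one transcript has it.
    with_feature = {l for l, calls in calls_by_locus.items() if True in calls}
    # A locus is variable if it has transcripts both with and without it.
    variable = {l for l, calls in calls_by_locus.items() if calls == {True, False}}
    return with_feature, variable


with_feature, variable = summarize(predictions)
print(sorted(with_feature))  # ['locusA', 'locusB']
print(sorted(variable))      # ['locusA']
```

Running the same function on a second feature (for example propeptide calls) and intersecting the two variable sets would give the loci that vary in both predictions.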
This work was carried out by Pall Olason of Soren Brunak's group at the CBS.