Go to the abstract in the NAR 2002 Database Issue.
Krause, A., Haas, S.A., Coward, E., Vingron, M.
MPI for Molecular Genetics, Computational Molecular Biology, Ihnestr. 73, 14195 Berlin, Germany
Contact krause_a@molgen.mpg.de
We have integrated the protein families from SYSTERS and the EST clusters from our database GeneNest with SpliceNest, a new database mapping EST contigs into genomic DNA.
The SYSTERS protein sequence cluster set provides an automatically generated classification of all sequences of the SWISS-PROT, TrEMBL, and PIR databases into disjoint protein family and superfamily clusters, annotated with sequence information from various other resources. For each cluster, an MView (database search or multiple alignment viewer) output is generated, and a majority consensus sequence is calculated from the resulting partial multiple alignment. Together, the consensus sequences form a searchable sequence database. The sequences in every cluster have been multiply aligned and annotated with known domains from the Pfam protein family database of alignments and HMMs.
GeneNest is a database and software package for producing and visualizing gene indices from ESTs and mRNAs. Currently, the database comprises gene indices of human (based on UniGene), mouse, A. thaliana, and zebrafish. All sequences are preprocessed to detect, annotate, and clip regions that contain vector sequence or repeats, or that are of low quality. The subsequent assembly step is done with the Staden package. For all contigs of a cluster, consensus sequences are generated and extracted to build a searchable sequence database. The visualization of a contig provides further information about the sequences, the represented gene, and open reading frames, as well as links to precomputed protein homologies detected in the SYSTERS database.
SpliceNest is a web-based graphical tool for exploring gene structure, based on a mapping of the EST consensus sequences from GeneNest to the complete human genome. Assuming that a cluster normally represents a single gene, every contig of a cluster is aligned separately to the corresponding genomic region, using a spliced alignment program. The alignments are visualized in a diagram showing the exon/intron structure, with all exons displayed simultaneously and mapped onto the common genomic sequence, and with candidates for alternative splicing highlighted automatically.
The integration of SYSTERS, GeneNest, and SpliceNest into one framework permits an overall exploration of the whole sequence space, covering protein, mRNA, and EST sequences as well as genomic DNA.
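As an illustration of the majority consensus step mentioned for the SYSTERS clusters, the following is a minimal Python sketch of a column-wise majority vote over an alignment. It is not the SYSTERS or MView implementation. The function name `majority_consensus`, the strict-majority threshold, the placeholder `X` for undecided columns, and the rule of dropping gap-dominated columns are all assumptions made for this example.

```python
from collections import Counter


def majority_consensus(aligned, gap="-", unknown="X"):
    """Column-wise majority consensus of equally long aligned sequences.

    Assumed rules (illustrative only, not taken from SYSTERS or MView):
    - a symbol wins a column only with a strict majority (> 50 %),
    - columns won by the gap symbol are dropped from the consensus,
    - columns without a strict majority are written as `unknown`.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    for column in zip(*aligned):
        symbol, count = Counter(column).most_common(1)[0]
        if count * 2 <= len(column):
            consensus.append(unknown)   # no strict majority in this column
        elif symbol == gap:
            continue                    # gap majority: column is skipped
        else:
            consensus.append(symbol)
    return "".join(consensus)


if __name__ == "__main__":
    # Hypothetical toy alignment of three short protein fragments.
    alignment = ["MKT-LV", "MKTALV", "MRT-LI"]
    print(majority_consensus(alignment))
```

Applied to every cluster alignment, consensus strings of this kind could be collected in a FASTA file to serve as the searchable consensus database described above.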
SYSTERS, GeneNest, and SpliceNest are now fully linked together, thus giving a suitable platform for the exploration of the whole sequence space.
Category Protein Sequence Motifs