ExPASy Home page Site Map Search ExPASy Contact us Swiss-Prot
Hosted by NCSC USMirror sites:Canada China Korea Switzerland Taiwan

Swiss-Prot Protein Knowledgebase
What's new?

Release 41.3 of 04-Apr-2003

Changes concerning keywords (KW line)
New keywords:
Wnt signaling pathway

Release 41.1 of 25-Mar-2003

New syntax of the CC line topic 'ALTERNATIVE PRODUCTS'
In Swiss-Prot release 41.1 (and in the accompanying TrEMBL release), a new format was introduced for "CC ALTERNATIVE PRODUCTS" lines. The new format is more structured than the previous one. These changes come with two related additions. Every named splice isoform now has a stable identifier in all entries that describe more than one splice isoform. Feature identifiers (FTIds), previously used only for human VARIANT and certain CARBOHYD features, now also cover VARSPLIC features in entries from all species.

The new format of the CC line topic ALTERNATIVE PRODUCTS is:

CC   -!- ALTERNATIVE PRODUCTS:
CC       Event=Alternative promoter;
CC         Comment=Free text;
CC       Event=Alternative splicing; Named isoforms=n;
CC         Comment=Optional free text;
CC       Name=Isoform_1; Synonyms=Synonym_1[, Synonym_n];
CC         IsoId=Isoform_identifier_1[, Isoform_identifier_n];
CC         Sequence=Displayed;
CC         Note=Free text;
CC       Name=Isoform_n; Synonyms=Synonym_1[, Synonym_n];
CC         IsoId=Isoform_identifier_1[, Isoform_identifier_n];
CC         Sequence=VSP_identifier_1 [, VSP_identifier_n];
CC         Note=Free text;
CC       Event=Alternative initiation;
CC         Comment=Free text;
The qualifiers are described below:

Event: Biological process that results in the production of the alternative forms (Alternative promoter, Alternative splicing, Alternative initiation).
Format: Event=controlled vocabulary;
Example: Event=Alternative splicing;

Named isoforms: Number of isoforms listed in the 'Name' topics. Currently used only with 'Event=Alternative splicing'.
Format: Named isoforms=number;
Example: Named isoforms=6;

Comment: Any comment concerning one or more isoforms. It is optional for 'Alternative splicing'. For 'Alternative promoter' and 'Alternative initiation', a free-text 'Comment' is always present and contains the relevant information on the isoforms.
Format: Comment=free text;
Example: Comment=Experimental confirmation may be lacking for some isoforms;

Name: A common name for an isoform, as used in the literature or assigned by Swiss-Prot. Currently only available for spliced isoforms.
Format: Name=common name;
Example: Name=Alpha;

Synonyms: Synonyms for an isoform as used in the literature. Optional, and currently only available for spliced isoforms.
Format: Synonyms=Synonym_1[, Synonym_n];
Example: Synonyms=B, KL5;

IsoId: Unique identifier for an isoform. It consists of the Swiss-Prot accession number followed by a dash and a number.
Format: IsoId=acc#-isoform_number[, acc#-isoform_number];
Example: IsoId=P05067-1;

Sequence: Information on the isoform sequence, as follows:
- 'Displayed' means the sequence is shown in the entry.
- A list of feature identifiers (VSP_#) means the isoform is annotated in the feature table. These FTIds let programs reconstruct the sequence of a splice variant.
- 'External' means the accession number in the IsoId does not correspond to the accession number of the current entry.
- 'Not described' means the sequence of the isoform is unknown.
Format: Sequence=VSP_#[, VSP_#]|Displayed|External|Not described;
Example: Sequence=Displayed;
Example: Sequence=VSP_000013, VSP_000014;
Example: Sequence=External;
Example: Sequence=Not described;

Note: Isoform-specific information. Optional.
Format: Note=Free text;
Example: Note=No experimental confirmation available;
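The following Python sketch shows how the qualifiers fit together by grouping the text of one ALTERNATIVE PRODUCTS block into events and isoforms. It makes three assumptions:
- The "CC" prefix has already been stripped from each line.
- Every value ends with a ';' that is followed either by the next qualifier name or by the end of the block. A semicolon inside free text is therefore tolerated.
- The function name parse_alternative_products is hypothetical and is not part of Swissknife.

```python
import re

# Qualifier names of the CC ALTERNATIVE PRODUCTS topic. 'Named isoforms' is
# listed before 'Name' so that the regex alternation prefers the longer name.
KEYS = ("Event", "Named isoforms", "Comment", "Name", "Synonyms",
        "IsoId", "Sequence", "Note")
_KEY = "|".join(re.escape(k) for k in KEYS)
# A value ends at the ';' that is followed by the next qualifier or the end.
_PAIR = re.compile(rf"({_KEY})=(.*?);(?=\s+(?:{_KEY})=|\s*$)")


def parse_alternative_products(text):
    """Group the qualifiers of one block into events, each with its isoforms."""
    # Join continuation lines so that a value may span several lines.
    flat = " ".join(line.strip() for line in text.splitlines() if line.strip())
    events = []
    for key, value in _PAIR.findall(flat):
        if key == "Event":
            events.append({"Event": value, "isoforms": []})
        elif key == "Name":
            events[-1]["isoforms"].append({"Name": value})
        elif key in ("Synonyms", "IsoId", "Sequence"):
            # These qualifiers hold comma-separated lists.
            events[-1]["isoforms"][-1][key] = [v.strip() for v in value.split(",")]
        elif key == "Note":
            events[-1]["isoforms"][-1]["Note"] = value
        else:
            # 'Named isoforms' and 'Comment' describe the event as a whole.
            events[-1][key] = value
    return events


example = """Event=Alternative splicing; Named isoforms=6;
Name=1;
  IsoId=Q15746-4; Sequence=Displayed;
Name=3A;
  IsoId=Q15746-6; Sequence=VSP_000041, VSP_000043;"""
events = parse_alternative_products(example)
print(events[0]["isoforms"][1]["Sequence"])  # ['VSP_000041', 'VSP_000043']
```

The parser keys on qualifier names rather than line positions. As a result, it accepts both layouts seen in this document: IsoId and Sequence on separate lines, or on the same line.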

Example of the CC lines and the corresponding FT lines for an entry with alternative splicing (Q15746):
...
CC   -!- ALTERNATIVE PRODUCTS:
CC       Event=Alternative splicing; Named isoforms=6;
CC       Name=1;
CC         IsoId=Q15746-4; Sequence=Displayed;
CC       Name=2;
CC         IsoId=Q15746-5; Sequence=VSP_000040;
CC       Name=3A;
CC         IsoId=Q15746-6; Sequence=VSP_000041, VSP_000043;
CC       Name=3B;
CC         IsoId=Q15746-7; Sequence=VSP_000040, VSP_000041, VSP_000042;
CC       Name=4;
CC         IsoId=Q15746-8; Sequence=VSP_000041, VSP_000042;
CC       Name=del-1790;
CC         IsoId=Q15746-9; Sequence=VSP_000044;
...
FT   VARSPLIC    437    506       VSGIPKPEVAWFLEGTPVRRQEGSIEVYEDAGSHYLCLLKA
FT                                RTRDSGTYSCTASNAQGQVSCSWTLQVER -> G (in
FT                                isoform 2 and isoform 3B).
FT                                /FTId=VSP_004791.
FT   VARSPLIC   1433   1439       DEVEVSD -> MKWRCQT (in isoform 3A,
FT                                isoform 3B and isoform 4).
FT                                /FTId=VSP_004792.
FT   VARSPLIC   1473   1545       Missing (in isoform 4).
FT                                /FTId=VSP_004793.
FT   VARSPLIC   1655   1705       Missing (in isoform 3A and isoform 3B).
FT                                /FTId=VSP_004794.
FT   VARSPLIC   1790   1790       Missing (in isoform Del-1790).
FT                                /FTId=VSP_004795.
 
...
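The FTIds let a program rebuild the sequence of a splice variant from the displayed sequence. A minimal Python sketch of that step follows. It makes three assumptions:
- The VARSPLIC features for one isoform have already been selected, for example via the VSP identifiers on its Sequence= line.
- The features are given as (start, end, description) tuples with 1-based inclusive positions, and they do not overlap.
- The function names are hypothetical, and the toy sequence in the usage line is invented rather than taken from Q15746.

```python
import re


def varsplic_replacement(description):
    """Return the residues that replace the range ('' for 'Missing')."""
    # Drop the trailing '(in isoform ...).' note, even if it spans lines.
    text = re.sub(r"\s*\(in isoform.*$", "", description, flags=re.DOTALL)
    if text.startswith("Missing"):
        return ""
    # 'OLD -> NEW': the residues after the arrow replace the range; remove
    # blanks introduced by wrapping the description over several FT lines.
    return "".join(text.split("->", 1)[1].split())


def apply_varsplic(sequence, features):
    """Build an isoform sequence from the displayed sequence.

    features: (start, end, description) tuples with 1-based, inclusive
    positions, assumed not to overlap.
    """
    isoform = sequence
    # Work from the C-terminus backwards so earlier positions stay valid.
    for start, end, description in sorted(features, key=lambda f: f[0], reverse=True):
        isoform = isoform[:start - 1] + varsplic_replacement(description) + isoform[end:]
    return isoform


toy = [(2, 3, "Missing (in isoform X)."), (8, 9, "KQ -> WW (in isoform X).")]
print(apply_varsplic("MKTAYIAKQR", toy))  # MAYIAWWR
```

Features are applied from the C-terminal end backwards. A deletion or a length-changing replacement near the N-terminus therefore cannot shift the coordinates of features that have not been applied yet.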
The corresponding modules of the Swiss-Prot parser Swissknife have been modified, and Release 1.31 of Swissknife can be downloaded.
Changes concerning cross-references (DR line)
We have added cross-references to the Gene Ontology (GO) database (available at http://www.geneontology.org/), which provides controlled vocabularies for the description of the molecular function, biological process and cellular component of gene products.

The identifiers of the appropriate DR line are:

Data bank identifier: GO
Primary identifier: GO's unique identifier for a GO term.
Secondary identifier: A 1-letter abbreviation for one of the 3 ontology aspects, separated from the GO term by a colon. If the term is longer than 46 characters, only the first 43 characters are shown, followed by 3 dots ('...'). The abbreviations for the 3 aspects of the ontology are P (biological Process), F (molecular Function), and C (cellular Component).
Tertiary identifier: GO evidence code (2 or 3 characters). The evidence codes are:
- IDA = inferred from direct assay
- IMP = inferred from mutant phenotype
- IGI = inferred from genetic interaction
- IPI = inferred from physical interaction
- IEP = inferred from expression pattern
- TAS = traceable author statement
- NAS = non-traceable author statement
- IC = inferred by curator
- ISS = inferred from sequence or structural similarity
Examples:
Q9XTD2:
DR   GO; GO:0008601; F:protein phosphatase type 2A, regulator acti...; IPI.
DR   GO; GO:0000080; P:G1 phase of mitotic cell cycle; IDA.
DR   GO; GO:0008285; P:negative regulation of cell proliferation; IDA.
DR   GO; GO:0006470; P:protein amino acid dephosphorylation; IDA.

P04406:
DR   GO; GO:0005737; C:cytoplasm; NAS.
DR   GO; GO:0004365; F:glyceraldehyde 3-phosphate dehydrogenase (p...; NAS.
DR   GO; GO:0006096; P:glycolysis; NAS.
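The layout rules above can be captured in a minimal Python sketch that formats a GO DR line: aspect prefix, truncation of terms longer than 46 characters to 43 characters plus '...', and a closing period. The function name go_dr_line is hypothetical. The untruncated term text used in the test of the truncation rule is an assumption, chosen so that its first 43 characters match the Q9XTD2 example.

```python
# Aspect letters and evidence codes as listed above.
ASPECTS = {"P": "biological Process", "F": "molecular Function",
           "C": "cellular Component"}
EVIDENCE = {"IDA", "IMP", "IGI", "IPI", "IEP", "TAS", "NAS", "IC", "ISS"}


def go_dr_line(go_id, aspect, term, evidence):
    """Format a GO cross-reference as a Swiss-Prot DR line."""
    if aspect not in ASPECTS:
        raise ValueError(f"unknown GO aspect: {aspect}")
    if evidence not in EVIDENCE:
        raise ValueError(f"unknown evidence code: {evidence}")
    if len(term) > 46:
        # Keep the first 43 characters and mark the cut with '...'.
        term = term[:43] + "..."
    return f"DR   GO; {go_id}; {aspect}:{term}; {evidence}."


print(go_dr_line("GO:0006096", "P", "glycolysis", "NAS"))
# DR   GO; GO:0006096; P:glycolysis; NAS.
```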


Changes concerning keywords (KW line)
New keywords:
Alternative promoter usage
Amphibian defense peptide

Deleted keyword:
Amphibian skin

Release 41.0 of 28-Feb-2003

Progress in the conversion of Swiss-Prot to mixed-case characters
We are gradually converting Swiss-Prot entries from all 'UPPER CASE' to 'MiXeD CaSe'. With this release the RC (Reference Comment) line topic STRAIN and the CC line topic 'CATALYTIC ACTIVITY' have been converted.

'Nucleomorph' added to the OrGanelle (OG) line
The OG (OrGanelle) line indicates the genome from which the gene for a protein originates. Until now, the defined terms in the OG line were 'Chloroplast', 'Cyanelle', 'Mitochondrion' and 'Plasmid'. The term 'Nucleomorph' has now been added. A nucleomorph is the residual nucleus of an algal endosymbiont that resides inside its host cell.
Multiple RP lines
Starting with release 41, there can be more than one RP (Reference Position) line per reference in a Swiss-Prot entry. The RP line describes the extent of the work carried out by the authors of the reference, e.g. the type of molecule that has been sequenced, protein characterization, PTM characterization, protein structure analysis, variation detection, etc.

As the number of experimental results per publication has increased over the years, a single RP line per reference could no longer hold all the information while keeping a consistent format. We therefore decided to permit multiple RP lines.

Example:

RP   SEQUENCE FROM N.A., SEQUENCE OF 23-42 AND 351-365, AND
RP   CHARACTERIZATION.
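A parser that assumed one RP line per reference would keep only the first line of the example above. The Python sketch below instead collects every RP line of a reference. It assumes that the lines of one reference have already been grouped together and that the two-letter line code plus its padding occupies the first five columns, as in the example. The function name reference_position is hypothetical.

```python
def reference_position(rp_lines):
    """Join all RP lines of one reference into a single statement."""
    # Skip other line types; drop the 'RP   ' line code (first 5 columns).
    parts = [line[5:].strip() for line in rp_lines if line.startswith("RP")]
    return " ".join(parts)


print(reference_position([
    "RP   SEQUENCE FROM N.A., SEQUENCE OF 23-42 AND 351-365, AND",
    "RP   CHARACTERIZATION.",
]))
# SEQUENCE FROM N.A., SEQUENCE OF 23-42 AND 351-365, AND CHARACTERIZATION.
```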

Changes concerning cross-references (DR line)
Schizosaccharomyces pombe GeneDB Prototype
We have added cross-references to the Schizosaccharomyces pombe GeneDB Prototype (available at http://www.genedb.org/genedb/pombe/index.jsp), which contains all known and predicted S. pombe protein-coding genes, pseudogenes and tRNAs. It is hosted by the Sanger Institute.

The identifiers of the appropriate DR line are:

Data bank identifier: GeneDB_SPombe
Primary identifier: GeneDB's unique identifier for an S. pombe gene.
Secondary identifier: None; a dash '-' is stored in that field.
Example:
DR   GeneDB_SPombe; SPAC9E9.12c; -.
Genew
We have added cross-references to the Human Gene Nomenclature Database Genew (available at http://www.gene.ucl.ac.uk/nomenclature/searchgenes.pl), which provides data for all human genes which have approved symbols. It is managed by the HUGO Gene Nomenclature Committee (HGNC).

The identifiers of the appropriate DR line are:

Data bank identifier: Genew
Primary identifier: HGNC's unique identifier for a human gene.
Secondary identifier: HGNC's approved gene symbol.
Example:
DR   Genew; HGNC:5217; HSD3B1.
Gramene
We have added cross-references to the Gramene database, a comparative mapping resource for grains (available at http://www.gramene.org/). The identifiers of the appropriate DR line are:

Data bank identifier: Gramene
Primary identifier: Unique identifier for a protein, which is identical to the Swiss-Prot primary AC number of that protein.
Secondary identifier: None; a dash '-' is stored in that field.
Example:
DR   Gramene; Q06967; -.
HAMAP
We have added cross-references to the collection of orthologous microbial protein families, generated manually by expert curators of the HAMAP (High-quality Automated and Manual Annotation of microbial Proteomes) project in the framework of the Swiss-Prot protein knowledgebase. The data is accessible at /sprot/hamap/families.html.

The identifiers of the appropriate DR line are:

Data bank identifier: HAMAP
Primary identifier: HAMAP's unique identifier for a microbial protein family.
Secondary identifier: One of '-', 'fused', 'atypical' or 'atypical/fused'.
- '-' is a placeholder for an empty field.
- 'fused' indicates that the family rule does not cover the entire protein.
- 'atypical' indicates that the protein is divergent in sequence or has mutated functional sites, and should not be included in family datasets.
- 'atypical/fused' indicates both.
Tertiary identifier: Number of domains found in the protein, generally '1', rarely '2' for the fusion of 2 identical domains.
Example:
DR   HAMAP; MF_00012; -; 1.
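The secondary and tertiary HAMAP identifiers can be read in a few lines. The following minimal Python sketch assumes the two fields have already been split out of the DR line, for example '-' and '1' from the line above. The function name interpret_hamap is hypothetical.

```python
def interpret_hamap(status, domains):
    """Interpret the secondary and tertiary identifiers of a HAMAP DR line."""
    # '-' is an empty placeholder; otherwise 'fused', 'atypical' or both.
    flags = set() if status == "-" else set(status.split("/"))
    if not flags <= {"fused", "atypical"}:
        raise ValueError(f"unexpected HAMAP status: {status}")
    return {
        "fused": "fused" in flags,        # family rule does not cover the whole protein
        "atypical": "atypical" in flags,  # should be left out of family datasets
        "domains": int(domains),          # generally 1, rarely 2
    }


print(interpret_hamap("-", "1"))
# {'fused': False, 'atypical': False, 'domains': 1}
```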
Phosphorylation Site Database
We have added cross-references to the Phosphorylation Site Database, PhosSite (available at http://vigen.biochem.vt.edu/xpd/xpd.htm). It provides access to information from the scientific literature on prokaryotic proteins that undergo covalent phosphorylation on the hydroxyl side chains of serine, threonine or tyrosine residues.

The identifiers of the appropriate DR line are:

Data bank identifier: PhosSite
Primary identifier: Unique identifier for a phosphoprotein, which is identical to the Swiss-Prot primary AC number of that protein.
Secondary identifier: None; a dash '-' is stored in that field.
Example:
DR   PhosSite; P00955; -.
TIGRFAMs
We have added cross-references to TIGRFAMs, a protein family database available at http://www.tigr.org/TIGRFAMs/.

The identifiers of the appropriate DR line are:

Data bank identifier: TIGRFAMs
Primary identifier: TIGRFAMs unique identifier for a protein family.
Secondary identifier: TIGRFAMs entry name for a protein family.
Tertiary identifier: Number of hits found in the sequence.
Example:
DR   TIGRFAMs; TIGR00630; uvra; 1.
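All of the DR line examples in this release share one shape: a data bank identifier followed by identifiers separated by "; ", with a closing period and '-' as an empty placeholder. A minimal Python sketch of splitting such a line follows. It assumes that no identifier itself contains "; ". The function name parse_dr is hypothetical.

```python
def parse_dr(line):
    """Split a DR line into the data bank identifier and its identifiers."""
    if not line.startswith("DR   "):
        raise ValueError("not a DR line")
    body = line[5:].rstrip()
    if body.endswith("."):
        body = body[:-1]  # drop only the closing period
    fields = [f.strip() for f in body.split("; ")]
    database, identifiers = fields[0], fields[1:]
    # '-' is a placeholder for an empty identifier field.
    return database, [None if i == "-" else i for i in identifiers]


print(parse_dr("DR   GeneDB_SPombe; SPAC9E9.12c; -."))
# ('GeneDB_SPombe', ['SPAC9E9.12c', None])
```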
CarbBank
We have removed the Swiss-Prot cross-references to CarbBank.
GCRDb
We have removed the Swiss-Prot cross-references to GCRDb.
Mendel
We have removed the Swiss-Prot cross-references to Mendel.
YEPD
We have removed the Swiss-Prot cross-references to the yeast electrophoresis protein database (YEPD).

Explicit links to dbSNP in FT VARIANT lines of human sequence entries
In human protein sequence entries we have introduced explicit links to the Single Nucleotide Polymorphism database (dbSNP) from the feature description of FT VARIANT keys. The format of such links is:
FT   VARIANT    from     to       description (IN dbSNP:accession_number).
FT                                /FTId=VAR_number.
Example:
FT   VARIANT      65     65       T -> I (IN dbSNP:1065419).
FT                                /FTId=VAR_012009.
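The explicit links make it straightforward to pair each variant with its dbSNP accession. The minimal Python sketch below walks the FT lines and maps each VARIANT FTId to the dbSNP number in its description. It assumes the usual flat-file layout, where columns 6-13 hold the feature key and are blank on continuation lines, and that the /FTId line closes each feature. The function name variant_dbsnp_links is hypothetical.

```python
import re

_DBSNP = re.compile(r"dbSNP:(\d+)")
_FTID = re.compile(r"/FTId=(\w+)")


def variant_dbsnp_links(ft_lines):
    """Map the FTId of each VARIANT feature to its dbSNP accession, if any."""
    links = {}
    key, text = None, []
    for line in ft_lines:
        if not line.startswith("FT"):
            continue
        # Columns 6-13 hold the feature key; they are blank on continuation lines.
        if line[5:13].strip():
            key, text = line[5:13].strip(), []
        text.append(line)
        ftid = _FTID.search(line)
        if ftid and key == "VARIANT":
            # Search every line of the feature, since the link may sit on any of them.
            snp = _DBSNP.search(" ".join(text))
            if snp:
                links[ftid.group(1)] = snp.group(1)
    return links


lines = ["FT   VARIANT      65     65       T -> I (IN dbSNP:1065419).",
         "FT                                /FTId=VAR_012009."]
print(variant_dbsnp_links(lines))  # {'VAR_012009': '1065419'}
```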
Feature key 'SIMILAR' became obsolete
The feature key 'SIMILAR' was used to describe the extent of a similarity with another protein sequence. Nowadays, most regions of similarity to other proteins correspond to domains described in domain and family databases. In Swiss-Prot these are annotated with the feature key 'DOMAIN' or 'REPEAT' and the comment (CC) line topic 'SIMILARITY'. The feature key 'SIMILAR' has therefore become obsolete and will no longer be used.
Version of Swiss-Prot in XML format
A distribution version of Swiss-Prot and TrEMBL in XML format is being developed. The first draft of the XML specification was released for public review on February 21, 2002.

For more information see http://www.ebi.ac.uk/swissprot/SP-ML/.

Please send comments and suggestions by electronic mail to sp-ml@ebi.ac.uk.