The PROSITE database of protein families and domains
Release 17, December 2001
2 Description of the changes made to PROSITE since release 16.0
3 Forthcoming changes
4 Status of the PROSITE files
5 FTP access to PROSITE
From release 17.0 onwards, the PROSITE database will be distributed separately from
the SWISS-PROT release. This release of PROSITE contains 1,108 documentation
entries describing 1,501 different patterns, rules or profiles/matrices.
Since release 16.0, 127 entries have been added and 250 entries have been updated.
The following table shows the growth of the database since its creation in 1989.
|Release||Date||Documentation entries||Patterns, rules and profiles||Comments|
|1.0||03/89||58||60||Only released in PC/Gene (Version 5.16)|
|2.0||03/89||129||132||Only released in PC/Gene (Version 6.00)|
|3.0||05/89||?||160|| |
|4.0||10/89||?||202||Printed release (EMBL Biocomputing document)|
|5.0||04/90||296||338|| |
|7.0||05/91||441||508|| |
|8.0||11/91||530||605|| |
|9.0||06/91||580||689|| |
|10.0||12/92||635||803|| |
|11.0||10/93||715||927|| |
|12.0||06/94||785||1029||First release to include profiles|
|15.0||06/98||1014||1352|| |
|16.0||07/99||1034||1374|| |
(2) Description of the changes made to PROSITE since release 16.0
2.1 Weekly update of PROSITE
We have introduced weekly updates of PROSITE, which are available for FTP download from the directory:
2.2 Distribution of a reference tool to scan PROSITE
We are now distributing a program (ps_scan) that scans a sequence against
all PROSITE patterns, profiles and rules.
ps_scan is a Perl program used to scan one or several patterns, rules and/or
profiles from PROSITE against one or several protein sequences in SWISS-PROT or
FASTA format. It requires two external compiled programs from the PFTOOLS
package: "pfscan" and "psa2msa".
2.3 Introduction of new CC qualifiers
We have introduced five new qualifiers in the CC line of PROSITE matrix entries.
2.3.1 The /MATRIX_TYPE qualifier
This qualifier describes the region in the protein identified by the profile. Example:
The matrix type can be: protein_domain, repeat_region,
localization_signal or composition.
|Protein_domain||Describes a profile directed against a conserved region of a protein.|
|Repeat_region||Describes a profile directed against a run of repeat units.|
|Localization_signal||Describes a profile directed against a region important for the localization of the protein in the cell.|
|Composition||Describes a profile directed against a region of low complexity or enriched in a given amino acid.|
2.3.2 The /SCALING_DB qualifier
This qualifier indicates which database was used to calibrate the profile. Example:
Scaling databases currently used are:
|reversed||Is a protein database, randomized by taking the reverse sequence of each individual entry.|
|window20||Is a protein database, locally shuffled in windows of 20 residues.|
|window20_shuffled||Is a small version of a window20 protein database.|
|db_global||Is a protein database, globally shuffled in windows of 20 residues.|
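The two simplest randomization schemes named above can be sketched in a few lines of Python. This is only an illustration, assuming that "reversed" means reading each sequence backwards and that "locally shuffled in windows of 20 residues" means permuting residues independently inside consecutive, non-overlapping 20-residue windows. The function names and the non-overlapping window layout are assumptions made for this sketch, not the procedure actually used to build the PROSITE scaling databases.

```python
import random
from typing import Optional


def reverse_sequence(seq: str) -> str:
    """Return the sequence read backwards (assumed meaning of 'reversed')."""
    return seq[::-1]


def window_shuffle(seq: str, window: int = 20,
                   rng: Optional[random.Random] = None) -> str:
    """Shuffle residues inside consecutive, non-overlapping windows.

    Local composition is preserved within each window while the residue
    order is destroyed. The window layout is an assumption of this sketch.
    """
    rng = rng or random.Random()
    pieces = []
    for start in range(0, len(seq), window):
        block = list(seq[start:start + window])
        rng.shuffle(block)  # in-place permutation of this window only
        pieces.append("".join(block))
    return "".join(pieces)
```

Both functions preserve the length and overall amino acid composition of the input; `window_shuffle` additionally preserves the composition of each window, which is the property that makes such a database useful as a locally realistic background.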
2.3.3 The /AUTHOR qualifier
This qualifier indicates the author who created or updated the profile. Example:
CC /AUTHOR=K_Hofmann, P_Bucher;
The first name is the author of the profile; the second is the author of the last update.
2.3.4 The /FT_KEY and /FT_DESC qualifiers
These qualifiers give a short, computer-readable description of the region identified by the profile.
They are based on the SWISS-PROT Feature Table key and Feature Table description currently used to define
that region. Example:
CC /FT_KEY=DOMAIN; /FT_DESC=KRINGLE.
FT_KEY can be NP_BIND, MOTIF, DOMAIN, REPEAT, DNA_BIND or ZN_FING.
More details on feature keys and feature descriptions can be found in the SWISS-PROT user manual.
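For readers who want to process the new qualifiers programmatically, the following Python sketch parses CC lines of the form shown in the examples above. It assumes that qualifiers are written as /NAME=value, separated by semicolons, and that the last one may end with a semicolon or a period, as in the two example lines. The function names, the case-insensitive check of the matrix type, and the regular expression are assumptions of this sketch and are not part of PROSITE or ps_scan.

```python
import re

# Allowed values as listed in these release notes.
FT_KEYS = {"NP_BIND", "MOTIF", "DOMAIN", "REPEAT", "DNA_BIND", "ZN_FING"}
MATRIX_TYPES = {"protein_domain", "repeat_region",
                "localization_signal", "composition"}

# Assumed layout: "/NAME=value" with the value running up to the next ';'.
_QUALIFIER = re.compile(r"/([A-Z_]+)=([^;]*)")


def parse_cc_qualifiers(line: str) -> dict:
    """Return {qualifier name: value} for one CC line."""
    if not line.startswith("CC"):
        raise ValueError("not a CC line: %r" % line)
    qualifiers = {}
    for name, value in _QUALIFIER.findall(line[2:]):
        # Drop surrounding blanks and a terminating period, if any.
        qualifiers[name] = value.strip().rstrip(".").strip()
    return qualifiers


def split_authors(value: str) -> list:
    """Split an /AUTHOR value into individual names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def check_qualifiers(qualifiers: dict) -> list:
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    key = qualifiers.get("FT_KEY")
    if key is not None and key not in FT_KEYS:
        problems.append("unknown FT_KEY: " + key)
    mtype = qualifiers.get("MATRIX_TYPE")
    # Case-insensitive comparison is an assumption of this sketch.
    if mtype is not None and mtype.lower() not in MATRIX_TYPES:
        problems.append("unknown MATRIX_TYPE: " + mtype)
    return problems
```

Applied to the example lines from sections 2.3.3 and 2.3.4, `parse_cc_qualifiers` yields one dictionary entry per qualifier, and `split_authors` separates the profile author from the author of the last update.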
(3) Forthcoming changes
3.1 Introduction of PDB accession numbers in the text of PDOC
We plan to introduce PDB accession numbers in the text of PROSITE documentation entries.
The format is indicated by the following example:
3.2 Extension of the DR line length to 76 characters
SWISS-PROT plans to lengthen the mnemonic code for the protein name from up
to 4 characters to up to 5 characters. For example, the mnemonic code for the
meiotic recombination protein rec10 is currently 'RE10'. After the
introduction of extended entry names it could be modified to the 5-letter
This SWISS-PROT modification will change the size of PROSITE DR lines.
As soon as SWISS-PROT introduces the 5-letter code in ID lines, we will extend PROSITE DR
lines to 76 characters.
(4) Status of the PROSITE files
PROSITE is distributed with different data and documentation files. The following table
lists the files that are currently available.
||Description of the profile syntax
||Release notes for the current release (17)
||Patterns, profiles and rules databases (updated weekly)
||Documentation database for each pattern and profile (updated weekly)
||List of documentation entries (updated weekly)
||List of on-line experts for PROSITE and SWISS-PROT (updated weekly)
||List of cited journals in PROSITE (updated weekly)
||Authors index (updated weekly)
||Announcement concerning PROSITE
Two files are no longer distributed:
- prosite.prg: As described above (see 2.2), we are distributing a
reference program to scan PROSITE patterns, profiles and rules.
- prosite.get: Instructions on how to obtain PROSITE are now
integrated in these release notes.
We have continued to include in some PROSITE documentation entries the
references of Web sites relevant to the subject under consideration. There
are now 62 documents that include such links.
(5) FTP access to PROSITE
PROSITE is available for download on the following anonymous FTP servers:
This release of PROSITE has been prepared by:
Amos Bairoch (1), Philipp Bucher (2), Laurent Falquet (2),
Elisabeth Gasteiger (1), Alain Gateau (1), Alexandre Gattiker (1),
Nicolas Hulo (1), Marco Pagni (2) and Christian Sigrist (1).
(1) Swiss Institute of Bioinformatics, Geneva, Switzerland;
(2) Swiss Institute of Bioinformatics, Lausanne, Switzerland.