Nucleic Acids Res. OUP

SCOP

http://scop.mrc-lmb.cam.ac.uk/scop

Lo Conte, L.1, Brenner, S.E.2, Hubbard, T.J.P.3, Chothia, C.1, Murzin, A.G.4

1MRC Laboratory of Molecular Biology, Structural Studies Division, Hills Road, Cambridge CB2 2QH, UK
2Berkeley Structural Genomics Center, Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA, and Department of Plant and Microbial Biology, University of California, Berkeley, CA, 94720-3102, USA
3Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
4MRC Centre for Protein Engineering, Hills Road, Cambridge CB2 2QH, UK

Contact   loredana@mrc-lmb.cam.ac.uk


Database Description

The SCOP (Structural Classification of Proteins) database [1] is a comprehensive ordering of all proteins of known structure according to their structural and evolutionary relationships. Its distinguishing feature is that, despite a notable amount of computer support, it embeds a theory of evolution as defined by a human expert, rather than the necessarily more limited set of rules implemented by a series of algorithms and automatic tools.

The first SCOP release in 1994 included 3,179 domains clustered into 498 families, 366 superfamilies, and 279 different folds. The first seven classes in the current SCOP release (1.55) comprise 30,403 domains, grouped into 605 different folds, 947 superfamilies, and 1,557 families. These domains correspond to 12,794 entries in the Protein Data Bank (PDB) [2] and 39 references from the literature for which the experimental coordinates are not available. The roughly ten-fold increase in the number of domains since 1994 has brought about twice as many folds, 2.5 times as many superfamilies, and three times as many families. The growth in content of the SCOP database closely tracks the corresponding growth in the number of deposited structures. In the last two years SCOP has been released every 4-6 months, and each release has included the classification of all proteins whose coordinates were available in the PDB at the time.

Recently, we introduced a set of new features in SCOP, with the aim of standardizing access to it and providing a solid basis to cope with the increasing number of protein structures expected to be determined in the coming years by the various structural genomics projects. These new features include:

1) A new set of identifiers which uniquely identify each entry in the SCOP hierarchy, from root to leaves.
2) A compact representation of a SCOP domain classification, including only the most relevant levels: class, fold, superfamily, and family.
3) A new set of parseable files, which fully describe all domains in SCOP, and the SCOP hierarchy itself.

The new identifiers provide an unambiguous way to identify and link to a SCOP entry, and to refer to SCOP in the research work based on it and in the literature. The identifiers, classification strings, and parseable files have been designed to accommodate future changes in SCOP without breaking software, provided they are properly used. A new set of genetic domain sequences, and a manually curated mapping between SEQRES and ATOM fields for all PDB chains in the first seven classes in SCOP, are also available at the ASTRAL [3] web site. Used together with the new identifiers and parseable files, these reference data sets give easy access to the information stored in SCOP, and should make comparison, linking, and integration of SCOP-based or related results straightforward. The purpose is to develop a common language that can be used without ambiguity when talking about a SCOP domain and its classification, and to avoid duplication of effort.

Finally, a new set of links to external resources has been added at the level of SCOP domains. For each domain in the first seven classes, there are links to extra information related to that domain in Pfam [4], SUPERFAMILY [5], PartsList [6], and, where one or more sequences are predicted to have that fold, PRESAGE [7]. The same mechanism can be used to integrate relevant biological information at all levels of the hierarchy, providing a structural point of view on the sequence world, together with attributes about function, protein-protein and protein-DNA interaction, metabolic pathways, and cellular role.
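The growth factors quoted in the description can be checked directly from the release counts given there. The short Python sketch below does only that arithmetic; the dictionary and variable names are illustrative and do not come from SCOP itself.

```python
# Counts quoted in the description for the first release (1994)
# and for the first seven classes of release 1.55.
first_release = {"domains": 3179, "folds": 279, "superfamilies": 366, "families": 498}
release_1_55 = {"domains": 30403, "folds": 605, "superfamilies": 947, "families": 1557}

# Growth factor per level: count in release 1.55 divided by the 1994 count.
growth = {level: release_1_55[level] / first_release[level] for level in first_release}

for level, ratio in growth.items():
    print(f"{level}: x{ratio:.2f}")
# domains: x9.56
# folds: x2.17
# superfamilies: x2.59
# families: x3.13
```

The ratios agree with the rounded figures in the text: close to ten-fold for domains, about two-fold for folds, 2.5-fold for superfamilies, and three-fold for families.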

Recent Developments

New features include: 1) A new set of identifiers which uniquely identify each entry in the SCOP hierarchy, from root to leaves. 2) A compact representation of a SCOP domain classification, including only the most relevant levels: class, fold, superfamily, and family. 3) A new set of parseable files, which fully describe all domains in SCOP, and the SCOP hierarchy itself.
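As an illustration of how the compact classification in item 2 might be consumed by software, here is a minimal Python sketch. It assumes the string is written as a lowercase class letter followed by fold, superfamily, and family numbers separated by dots (for example "a.1.1.2"). This summary does not spell out that layout, so treat the format, the `Classification` type, and both function names as assumptions made for the example, not as part of the SCOP distribution.

```python
from typing import NamedTuple


class Classification(NamedTuple):
    """The four levels named in the text: class, fold, superfamily, family."""
    sclass: str
    fold: int
    superfamily: int
    family: int


def parse_classification(text: str) -> Classification:
    """Parse a string such as 'a.1.1.2' (assumed layout) into its four levels."""
    parts = text.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dot-separated fields, got {len(parts)}: {text!r}")
    sclass, fold, superfamily, family = parts
    if not (len(sclass) == 1 and sclass.isalpha() and sclass.islower()):
        raise ValueError(f"class must be a single lowercase letter: {text!r}")
    # int() raises ValueError if a numeric field is malformed.
    return Classification(sclass, int(fold), int(superfamily), int(family))


def same_superfamily(a: Classification, b: Classification) -> bool:
    """Two domains share a superfamily if class, fold and superfamily all match."""
    return a[:3] == b[:3]


d1 = parse_classification("a.1.1.2")
d2 = parse_classification("a.1.1.3")
d3 = parse_classification("a.1.2.1")
print(same_superfamily(d1, d2))  # True
print(same_superfamily(d1, d3))  # False
```

Comparing prefixes of the parsed tuple is what makes a compact string convenient: membership in the same class, fold, or superfamily reduces to comparing the first one, two, or three fields.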

REFERENCES

  1. Murzin, A.G., Brenner, S.E., Hubbard, T., Chothia, C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247, 536-540.
  2. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Bourne, P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235-242.
  3. Brenner, S.E., Koehl, P., Levitt, M. (2000) The ASTRAL compendium for sequence and structure analysis. Nucleic Acids Res., 28, 254-256.
  4. Bateman, A., Birney, E., Durbin, R., Eddy, S.R., Howe, K.L., Sonnhammer, E.L. (2000) The Pfam Protein Families Database. Nucleic Acids Res., 28, 263-266.
  5. Gough, J., Karplus, K., Hughey, R., Chothia, C. (2001) Assignment of Homology to Genome Sequences using a Library of Hidden Markov Models that Represent all Proteins of Known Structure. J. Mol. Biol. In press.
  6. Qian, J., Stenger, B., Wilson, C.A., Lin, J., Jansen, R., Teichmann, S.A., Park, J., Krebs, W.G., Yu, H., Alexandrov, V., Echols, N., Gerstein, M. (2001) PartsList: a web-based system for dynamically ranking protein folds based on disparate attributes, including whole-genome expression and interaction information. Nucleic Acids Res., 29, 1750-1764.
  7. Brenner, S.E., Barken, D., Levitt, M. (1999) The PRESAGE database for structural genomics. Nucleic Acids Res., 27, 251-253.

Category   Structure

Go to the abstract in the NAR 2002 Database Issue.

 
