PDB-REPRDB

http://www.cbrc.jp/papia/

Noguchi, T.¹, Mikoshiba, M.², Takahashi, A.², Akiyama, Y.¹

¹Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-41-6 Aomi, Koto-ku, Tokyo 135-0064, JAPAN
²Information and Mathematical Science Laboratory, Inc., IKEBUKURO AOYAGI Bldg., 2-43-1 Ikebukuro, Toshima-ku, Tokyo 171-0014, JAPAN

Database Description

PDB-REPRDB is a database of representative protein chains from the Protein Data Bank (PDB). Representatives are selected according to three criteria: a) quality of the atomic co-ordinate data, b) sequence uniqueness, and c) conformation uniqueness. The system is designed so that the user can quickly obtain a selection of representative chains from PDB. Its operation is divided into two stages: 1) calculation of similarities between all pairs of protein chains, and 2) classification of those chains and selection of representative chains according to priorities specified by the user. The similarities are calculated beforehand, so the selection of representative chains can be configured dynamically to suit the user's requirements.
The WWW interface provides a large degree of freedom in setting parameters, such as the cut-off scores for sequence and structural similarity. Users can eliminate unnecessary chains from the PDB chain list by setting threshold values, and can also change the priority of nine factors: resolution, R-factor, number of chain breaks, ratio of non-standard amino acid residues, ratio of residues with only Cα co-ordinates, ratio of residues with only backbone co-ordinates, number of residues, whether the chain is mutant or wild type, and whether or not the chain is part of a complex. Moreover, an NMR flag lets users choose whether or not to include entries determined by NMR.
The system returns a representative list and classification data for the protein chains. The representative list includes information on the factors mentioned above, the EC number, and the compound given in PDB. The 'ID' sections are hyper-linked to data on the classified groups, and a graphic representation of the three-dimensional structure can be displayed with the RasMol program by clicking on '*'. Furthermore, the 'ECnumber' sections are hyper-linked to the Ligand chemical database for enzyme reactions (LIGAND).
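The second stage described above (classification and selection from precomputed similarities) can be illustrated with a minimal Python sketch. It is not the published PDB-REPRDB algorithm: the greedy grouping, the `Chain` fields, the function names, and the rule that two chains are grouped only when they are similar in both sequence and structure are all assumptions made for illustration. Only three of the nine priority factors are modeled, and the chain identifiers and scores are invented.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Chain:
    """Hypothetical record holding a few of the nine quality factors."""
    chain_id: str
    resolution: float   # in angstroms; lower is treated as better
    r_factor: float     # lower is treated as better
    chain_breaks: int   # fewer is treated as better


def priority_key(chain: Chain):
    # Assumed priority order: resolution, then R-factor, then chain breaks.
    # Reordering this tuple mimics a user changing the factor priorities.
    return (chain.resolution, chain.r_factor, chain.chain_breaks)


def select_representatives(
    chains: List[Chain],
    seq_identity: Dict[FrozenSet[str], float],  # precomputed, percent identity
    rmsd: Dict[FrozenSet[str], float],          # precomputed, in angstroms
    seq_cutoff: float,
    rmsd_cutoff: float,
) -> Dict[str, List[str]]:
    """Greedy grouping: visit chains from best to worst quality.

    A chain joins the first existing group whose representative it
    resembles in both sequence and structure; otherwise it becomes the
    representative of a new group. Returns {representative: members}.
    """
    groups: Dict[str, List[str]] = {}
    for chain in sorted(chains, key=priority_key):
        for rep_id in groups:
            pair = frozenset((rep_id, chain.chain_id))
            similar_seq = seq_identity.get(pair, 0.0) >= seq_cutoff
            similar_str = rmsd.get(pair, float("inf")) <= rmsd_cutoff
            if similar_seq and similar_str:
                groups[rep_id].append(chain.chain_id)
                break
        else:
            # No existing representative is similar enough.
            groups[chain.chain_id] = [chain.chain_id]
    return groups


# Invented example data: three chains and their pairwise scores.
chains = [
    Chain("x", 2.0, 0.20, 0),
    Chain("y", 1.5, 0.18, 0),
    Chain("z", 1.8, 0.19, 1),
]
seq_identity = {
    frozenset(("x", "y")): 95.0,
    frozenset(("y", "z")): 20.0,
    frozenset(("x", "z")): 25.0,
}
rmsd = {
    frozenset(("x", "y")): 0.5,
    frozenset(("y", "z")): 8.0,
    frozenset(("x", "z")): 7.5,
}
groups = select_representatives(chains, seq_identity, rmsd, 90.0, 1.0)
print(groups)  # {'y': ['y', 'x'], 'z': ['z']}
```

Because the pairwise scores are looked up rather than computed, changing the cut-offs or the priority order only repeats the inexpensive grouping step, which is the reason the text gives for calculating similarities beforehand.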
The current database (10 August 2001) includes 26,454 protein chains from 15,769 PDB entries. The following are excluded: (a) DNA and RNA data, (b) theoretically modeled data, (c) short chains (length l < 40 residues), and (d) chains in which every residue is a non-standard amino acid residue. The numbers of representative chains selected for several combinations of sequence and structural similarity parameters are shown in a table on the sample page of PDB-REPRDB. The system is available at the new PAPIA (Parallel Protein Information Analysis system) WWW server (http://www.cbrc.jp/papia/).
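The four exclusion rules above can be read as a simple eligibility filter. The sketch below is one such reading in Python; the `ChainRecord` fields and the `is_eligible` function are hypothetical names, not part of PDB-REPRDB, and rule (d) is interpreted as "every residue in the chain is non-standard".

```python
from dataclasses import dataclass

MIN_LENGTH = 40  # chains with fewer than 40 residues are excluded (rule c)


@dataclass
class ChainRecord:
    """Hypothetical per-chain summary used only for this illustration."""
    chain_id: str
    molecule_type: str          # assumed values: "protein", "dna", "rna"
    is_theoretical_model: bool
    n_residues: int
    n_nonstandard: int          # count of non-standard amino acid residues


def is_eligible(rec: ChainRecord) -> bool:
    """Return True if the chain passes all four exclusion rules."""
    if rec.molecule_type != "protein":        # (a) DNA and RNA data
        return False
    if rec.is_theoretical_model:              # (b) theoretically modeled data
        return False
    if rec.n_residues < MIN_LENGTH:           # (c) short chains
        return False
    if rec.n_nonstandard >= rec.n_residues:   # (d) all residues non-standard
        return False
    return True


# Invented records, one per exclusion rule plus one that passes.
records = [
    ChainRecord("ok", "protein", False, 120, 2),
    ChainRecord("dna", "dna", False, 60, 0),
    ChainRecord("model", "protein", True, 150, 0),
    ChainRecord("short", "protein", False, 39, 0),
    ChainRecord("allnonstd", "protein", False, 50, 50),
]
eligible = [r.chain_id for r in records if is_eligible(r)]
print(eligible)  # ['ok']
```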

Recent Developments

The system was moved from the Parallel Application TRC Lab., RWCP, to the Computational Biology Research Center (CBRC), AIST, in April 2001. PDB-REPRDB will be updated once per month, because the PC cluster with 1,024 CPUs that became available with the move greatly shortens the processing time for calculating similarities between protein chains.

Acknowledgements

We thank Dr. Susumu Goto and Prof. Minoru Kanehisa at the Institute for Chemical Research, Kyoto University, for their support.

References

Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235-242.
Noguchi,T., Onizuka,K., Ando,M., Matsuda,H. and Akiyama,Y. (2000) Quick selection of representative protein chain sets based on customizable requirements. Bioinformatics, 16, 520-526.
Noguchi,T., Matsuda,H. and Akiyama,Y. (2001) PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB). Nucleic Acids Res., 29, 219-220.
Goto,S., Nishioka,T. and Kanehisa,M. (1999) LIGAND database for enzymes, compounds and reactions. Nucleic Acids Res., 27, 377-379.

Category Structure

Go to the abstract in the NAR 2001 Database Issue.