Nucleic Acids Res. OUP
Compilation Paper
Categories List
Alphabetical List
Search Summary Papers

SUPFAM

http://pauling.mbu.iisc.ernet.in/~supfam

Pandit, S.B.1, Gosar, D.1, Abhiman, S.2, Sujatha, S.1, Dixit, S.S.3, Mhatre, N.S.1, Sowdhamini, R.2, Srinivasan, N.1

1Molecular Biophysics Unit Indian Institute of Science Bangalore 560 012 India
2National Centre for Biological Sciences UAS-GKVK campus Bangalore 560 065 India
3Biotechnology Centre Indian Institute of Technology - Bombay Powai Mumbai 400 076 India

Contact   ns@mbu.iisc.ernet.in


Database Description

Members of a protein superfamily can arise from divergent evolution of homologues whose amino acid sequences retain no significant similarity. A superfamily relationship is commonly detected only after the three-dimensional structures of the proteins concerned have been determined by X-ray analysis or NMR. The SUPFAM database described here relates pairs of homologous protein families in a multiple sequence alignment database, whether of known or unknown structure. The present release (1.1), which is the first version of the SUPFAM database, has been derived by analysing Pfam, one of the commonly used databases of multiple sequence alignments of homologous proteins. The first step in establishing SUPFAM is to relate Pfam families to the families in PALI, an alignment database of homologous proteins of known structure that is derived largely from SCOP. The second step relates to one another the Pfam families that could not be associated reliably with a protein superfamily of known structure. The profile matching procedure IMPALA has been used in both steps. The first step identified 1280 Pfam families (47% of 2697) that are related to a SCOP family, either through a close homologous connection or through a distant relationship that potentially forms a new superfamily connection. For the remaining 1417 Pfam families, which apparently have no structural information, an all-against-all comparison based on sequence-profile matching with IMPALA clustered 67 homologous Pfam families into 28 potential new superfamilies. Expanding groups of related proteins of as yet unknown structure, as proposed in SUPFAM, should help in identifying 'priority proteins' for structure determination in structural genomics initiatives, and so extend the coverage of structural information in protein sequence space.
For example, we could assign 858 distinct Pfam domains in 2203 of the gene products in the genome of Mycobacterium tuberculosis. Of these Pfam families, 51 of unknown structure could be clustered into 17 potentially new superfamilies, which form good targets for structural genomics. The SUPFAM database can be accessed at http://pauling.mbu.iisc.ernet.in/~supfam.
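
The clustering of families in the second step can be illustrated with a minimal sketch. It assumes that the all-against-all comparison has already been reduced to a list of (query family, target family, E-value) hits and that families are grouped by single linkage, i.e. any significant hit joins two families into the same cluster. The function name, the family identifiers, the E-value cutoff and the linkage rule are all hypothetical choices made for illustration; the description above does not state the actual criteria used in SUPFAM.

```python
from collections import defaultdict


def cluster_families(hits, e_value_cutoff=1e-3):
    """Group families connected by significant hits (single linkage).

    hits: iterable of (query_family, target_family, e_value) tuples.
    e_value_cutoff: hypothetical significance threshold, not taken from SUPFAM.
    Returns a sorted list of clusters, each a sorted list of family names.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for query, target, e_value in hits:
        # Ignore self-hits and hits weaker than the cutoff.
        if query != target and e_value <= e_value_cutoff:
            union(query, target)

    groups = defaultdict(set)
    for family in list(parent):
        groups[find(family)].add(family)

    # Keep only clusters that actually link two or more families.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Made-up hits between placeholder family names (not real Pfam entries).
example_hits = [
    ("FAM_A", "FAM_B", 1e-10),
    ("FAM_B", "FAM_C", 1e-5),
    ("FAM_D", "FAM_E", 1e-8),
    ("FAM_F", "FAM_G", 0.5),   # too weak, stays unclustered
    ("FAM_A", "FAM_A", 0.0),   # self-hit, ignored
]
clusters = cluster_families(example_hits)
print(clusters)
```

Single linkage is the most permissive grouping rule: one chain of pairwise hits is enough to merge families, so a stricter scheme (for example, requiring reciprocal hits) would give smaller clusters.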

Acknowledgements

This work is supported by the Wellcome Trust, London in the form of a Senior Fellowship to NS.

REFERENCES

  1. Brenner, S.E. and Levitt, M. (2000) Expectations of structural genomics. Protein Sci., 9, 197-200.
  2. Murzin, A.G., Brenner, S.E., Hubbard, T. and Chothia, C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247, 536-540.
  3. Bateman, A., Birney, E., Durbin, R., Eddy, S.R., Howe, K.L. and Sonnhammer, E.L.L. (2000) The Pfam protein families database. Nucleic Acids Res., 28, 263-266.
  4. Schaffer, A.A., Wolf, Y.I., Ponting, C.P., Koonin, E.V., Aravind, L. and Altschul, S.F. (1999) IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics, 15, 1000-1011.

Category   Protein Sequence Motifs

Go to the abstract in the NAR 2002 Database Issue.

 
