PALI

http://pauling.mbu.iisc.ernet.in/~pali

Balaji, S.¹, Sujatha, S.¹, Aruna, S.², Mhatre, S.N.¹, Srinivasan, N.¹

¹Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India
²Centre for Biotechnology, Anna University, Chennai 600 025, India

Contact ns@mbu.iisc.ernet.in

Database Description

PALI (Release 1.3) (http://pauling.mbu.iisc.ernet.in/~pali) contains three-dimensional (3-D) structure-dependent sequence alignments and structure-based phylogenetic trees of homologous proteins in various families. The data set of homologous protein structures has been derived by consulting the SCOP database. The present release (1.3) comprises 614 families of homologous proteins involving 3050 protein domain structures, with each family made up of at least two members, and nearly 17000 structural alignments. This is a substantial increase over the previous release of PALI, which contained about 9000 alignments.
Every member of a family has been structurally aligned with every other member of the same family (pairwise alignment), and all the members of a family have also been aligned using simultaneous superposition (multiple alignment). The structural alignments are performed using the program STAMP in a semi-automated way. Every family is also associated with two dendrograms, calculated using PHYLIP: one based on a structural dissimilarity metric defined for every pairwise alignment, and the other based on the similarity of topologically equivalenced residues. The present release also includes the structural distance metric for each pair as defined by Levitt and Gerstein (1998).
Ready-made alignments, with details of structural and sequence similarity, superposed coordinate sets and dendrograms, can be accessed family-wise. The database can also be queried for protein pairs whose sequence or structural similarity falls within a specified range, which gives access to the corresponding families, alignments and dendrograms. PALI is thus a useful resource for analysing the relationship between sequence and structure variation at a given level of sequence similarity. PALI also contains about 650 "orphans" (single-member families).
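As an illustration of the all-against-all pairing and the range query described above, the following minimal Python sketch enumerates the pairs in a family and filters pairwise records by sequence identity. The record layout, field names and values are hypothetical and are not taken from PALI's actual schema or data.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class PairRecord:
    """One pairwise alignment; field names are hypothetical, not PALI's schema."""
    family: str
    member_a: str
    member_b: str
    seq_identity: float  # percent identity (assumed measure of similarity)


def all_pairs(members):
    """Every unordered pair in a family: n * (n - 1) / 2 pairwise alignments."""
    return list(combinations(members, 2))


def pairs_in_range(records, low, high):
    """Keep the pairs whose sequence identity lies in [low, high]."""
    return [r for r in records if low <= r.seq_identity <= high]


# Toy records, invented purely for illustration.
records = [
    PairRecord("fam1", "d1", "d2", 35.0),
    PairRecord("fam1", "d1", "d3", 18.5),
    PairRecord("fam1", "d2", "d3", 22.0),
    PairRecord("fam2", "d4", "d5", 61.0),
]

hits = pairs_in_range(records, 20.0, 40.0)
# Families reachable from the matching pairs.
families = sorted({r.family for r in hits})
```

A family of four members yields six pairs under this scheme, which is why the number of alignments grows much faster than the number of domains.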
Using a web interface involving PSI-BLAST and KITSCH, it is possible to associate the sequence of a new protein with one of the families in PALI and automatically generate a phylogenetic tree combining the query sequence and proteins of known 3-D structure. Another new feature available with the present release is an interface to the IMPALA program, which matches the query sequence against the profiles (position-specific score matrices, PSSMs) of families in PALI.
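KITSCH is one of the distance-matrix programs in the PHYLIP package, so a tree that combines a query with family members needs pairwise distances written in PHYLIP's distance-matrix format. The sketch below formats a square matrix in that layout. How the PALI web interface actually derives its distances is not stated here; the labels and distance values are invented for illustration.

```python
def phylip_distance_matrix(names, dist):
    """Format a symmetric distance table as a square PHYLIP distance matrix.

    names: list of labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    """
    # First line: the number of entries in the matrix.
    lines = [f"{len(names):5d}"]
    for a in names:
        row = [0.0 if a == b else dist[frozenset((a, b))] for b in names]
        # PHYLIP reads the label from the first 10 characters of each row,
        # so longer labels are truncated here and may collide.
        label = a[:10].ljust(10)
        lines.append(label + " ".join(f"{d:8.4f}" for d in row))
    return "\n".join(lines) + "\n"


# Hypothetical labels and distances, for illustration only.
names = ["query", "domA", "domB"]
dist = {
    frozenset(("query", "domA")): 0.42,
    frozenset(("query", "domB")): 0.70,
    frozenset(("domA", "domB")): 0.55,
}
matrix_text = phylip_distance_matrix(names, dist)
```

The resulting text can be saved to a file and supplied to a PHYLIP distance program as its input.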

Recent Developments

1. Significant increase in the size of the database in terms of the number of protein domains and the number of structure-based alignments.
2. Availability of a structural divergence measure, in the form of the metric proposed by Levitt and Gerstein, for all pairs of proteins in PALI.
3. Enhanced capability to match a query sequence with the profiles of PALI families using the profile-matching technique IMPALA.
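To give a feel for what matching a query sequence against a family profile involves, here is a deliberately simplified Python sketch that slides a PSSM along a query without gaps and reports the best-scoring offset. This is not IMPALA's algorithm, whose alignment and significance statistics are more elaborate; the four-letter alphabet and the scores are invented for illustration.

```python
def best_ungapped_score(query, pssm, alphabet):
    """Slide the profile along the query; return (best score, start offset).

    pssm[i][j] is the score of residue alphabet[j] at profile position i.
    Returns None when the query is shorter than the profile.
    """
    col = {aa: j for j, aa in enumerate(alphabet)}
    width = len(pssm)
    best = None
    for start in range(len(query) - width + 1):
        window = query[start:start + width]
        score = sum(pssm[i][col[aa]] for i, aa in enumerate(window))
        if best is None or score > best[0]:
            best = (score, start)
    return best


# Toy alphabet and profile, invented for illustration only.
alphabet = "ACDE"
pssm = [
    [2, -1, -1, 0],   # profile position 0 favours A
    [-1, 3, 0, -1],   # profile position 1 favours C
    [0, -1, 2, 1],    # profile position 2 favours D
]
result = best_ungapped_score("EACDA", pssm, alphabet)
```

In a family-assignment setting, a score of this kind would be computed against every family profile and the families ranked by it.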

REFERENCES

Levitt, M. and Gerstein, M. (1998) A unified statistical framework for sequence comparison and structure comparison. Proc. Natl. Acad. Sci. USA, 95, 5913-5920.
Schaffer, A.A., Wolf, Y.I., Ponting, C.P., Koonin, E.V., Aravind, L. and Altschul, S.F. (1999) IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics, 15, 1000-1011.

Category Structure

Go to the abstract in the NAR 2001 Database Issue.