Lorenzo Segovia and Alejandro Sánchez Flores

Departamento de Ingeniería Celular y Biocatálisis, Instituto de Biotecnología, UNAM, Ave. Universidad 2001, Col. Chamilpa, Morelos, 62210.
lorenzo@ibt.unam.mx

Genomics has produced a vast number of new protein sequences of unknown structure, generating demand for rapid and accurate techniques to infer their probable fold. Some sequences share similar folds but have no detectable primary sequence similarity. One goal of fold recognition methods is to identify the correct fold under these circumstances, or to identify sequences that have novel folds. There are many approaches to predicting tertiary structure from the primary protein sequence, based on sequence-to-sequence or sequence-to-structure comparison. However, developing more accurate prediction methods and fully automating these techniques remains a pressing challenge. We have developed a new method, which we call FASE, to identify and group protein families sharing the same fold, based on the comparison of entropy profiles derived from multiple alignments of homologous proteins. An important difference between FASE and other profile-based methods is that it does not use sequence profiles per se but only the entropy values derived from them; it can therefore recognize a shared fold between families with no sequence similarity at all, where other profile-based methods would not. Because FASE is itself independent of any structural information, it allows us to relate families of homologous proteins of unknown structure, and it can thus also be used as a tool to cluster sequences with putative new folds. We have searched for new fold families in the E. coli proteome using sequence-to-sequence searches with PSI-BLAST, fold recognition by threading with Threader 3, and fold recognition with FASE. We will present the results obtained with each strategy and compare them.
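To make the idea of an entropy profile concrete, the following is a minimal sketch and not the FASE implementation. The abstract does not specify the entropy formula or the profile comparison measure, so this example assumes per-column Shannon entropy (gaps ignored) and a Pearson correlation between two profiles of equal length. The function names and the toy alignments are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gaps ('-')."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    # p * log2(1/p), summed over the residues observed in this column
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def entropy_profile(alignment):
    """Per-column entropies for a list of equal-length aligned sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy(col) for col in zip(*alignment)]


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumed measure)."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no signal to correlate
    return cov / math.sqrt(var_x * var_y)


# Two toy families with no residues in common but the same conservation pattern.
family_a = ["ACDA", "ACEA", "ACFA", "ACGA"]
family_b = ["WKLW", "WKMW", "WKNW", "WKPW"]

profile_a = entropy_profile(family_a)
profile_b = entropy_profile(family_b)
similarity = pearson(profile_a, profile_b)
```

In this toy case the two families share no residues, yet their entropy profiles coincide, because only the pattern of conserved and variable positions is compared. Real families differ in length, so a practical comparison would also need to align the profiles themselves; that step is omitted here.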