EVA home   EVA e-mail   EVA mirrors   -   Secondary structure   Comparative modelling   Threading   Contacts
 
Version
Sep 6, 2000
 

EVA: Contact Prediction. Evaluation Criteria.

The distance between two residues will be calculated as the distance in Å between their Cβ atoms (Cα for Gly). For a given set of predicted contacts, three main parameters will be calculated.
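The distance rule above can be sketched as follows. This is only an illustration: the residue representation (a dictionary with a `name` and an `atoms` map of PDB-style atom names to coordinates) and the function names are assumptions, not part of EVA.

```python
import math

def reference_atom(residue):
    """Return the coordinates used for distances: CB, or CA for glycine."""
    # Hypothetical residue layout: {"name": "ALA", "atoms": {"CA": (x, y, z), ...}}
    atom_name = "CA" if residue["name"] == "GLY" else "CB"
    return residue["atoms"][atom_name]

def residue_distance(res_a, res_b):
    """Euclidean distance (in Å) between the reference atoms of two residues."""
    return math.dist(reference_atom(res_a), reference_atom(res_b))

# Made-up coordinates, chosen only to give a round number.
gly = {"name": "GLY", "atoms": {"CA": (0.0, 0.0, 0.0)}}
ala = {"name": "ALA", "atoms": {"CA": (1.5, 0.0, 0.0), "CB": (3.0, 4.0, 0.0)}}

d = residue_distance(gly, ala)  # uses CA of Gly and CB of Ala
```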

  • Contact evaluation. Two residues are considered to be in contact when the distance between their Cβ atoms (Cα for Gly) is <= 8.0 Å.

    • Accuracy (Acc). The ratio of the number of correctly predicted contacts to the total number of predicted contacts.

      Acc= nt/n

      Where nt is the number of correctly predicted contacts and n is the total number of predicted contacts.

    • Improvement over random (Imp). The ratio of the accuracy to the accuracy of a random prediction, which equals the accuracy obtained by predicting all the pairs in the protein as contacting.

      Imp= Acc/(C/N)

      Where N is the total number of residue pairs in the protein, excluding those close in the sequence (see below), and C is the observed number of contacts among those N pairs.

  • Distance distribution of the predicted contacts (Xd). The weighted harmonic average of the difference between the distance distribution of the predicted contacts and the distance distribution of all pairs.

    Xd= SUM{i=1,15}((Pip-Pia)/(di*15))

    Where the sum runs over all the distance bins. There are 15 distance bins covering the range from 0 to 60 Å. di is the distance representing bin i, taken as its upper limit (normalised to 60). Pip is the percentage of predicted pairs whose distance falls in bin i, and Pia is the same percentage for all the pairs. Defined this way, Xd>0 indicates the favourable case in which the distance distribution of the predicted contacts is shifted towards shorter distances than that of all pairs (see J Mol Biol (1997), 271:511-523).
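As an illustration, the three parameters defined above could be computed from two lists of distances: one for the predicted pairs and one for all eligible pairs in the protein. This is a minimal sketch and not the EVA implementation. It makes several assumptions: the function names are hypothetical, bins are half-open 4 Å intervals, distances beyond 60 Å are counted in the last bin, and di is taken as the bin's upper limit in Å (dividing di by 60 instead would only rescale Xd by a constant factor).

```python
CONTACT_CUTOFF = 8.0            # Å, from the contact definition
N_BINS = 15                     # number of distance bins
MAX_DIST = 60.0                 # Å, upper end of the binned range
BIN_WIDTH = MAX_DIST / N_BINS   # 4 Å per bin

def accuracy(pred_dists):
    """Acc = nt / n, given the observed distances of the predicted pairs."""
    n_true = sum(1 for d in pred_dists if d <= CONTACT_CUTOFF)
    return n_true / len(pred_dists)

def improvement(pred_dists, all_dists):
    """Imp = Acc / (C / N); all_dists holds the distances of all N eligible pairs."""
    c = sum(1 for d in all_dists if d <= CONTACT_CUTOFF)
    return accuracy(pred_dists) / (c / len(all_dists))

def bin_percentages(dists):
    """Percentage of pairs falling in each of the 15 distance bins."""
    counts = [0] * N_BINS
    for d in dists:
        # Assumption: half-open bins, with anything beyond 60 Å in the last bin.
        idx = min(int(d // BIN_WIDTH), N_BINS - 1)
        counts[idx] += 1
    return [100.0 * c / len(dists) for c in counts]

def xd(pred_dists, all_dists):
    """Xd = SUM over bins of (Pip - Pia) / (di * 15)."""
    p_pred = bin_percentages(pred_dists)
    p_all = bin_percentages(all_dists)
    total = 0.0
    for i in range(N_BINS):
        d_i = (i + 1) * BIN_WIDTH  # upper limit of bin i, in Å (assumption)
        total += (p_pred[i] - p_all[i]) / (d_i * N_BINS)
    return total

# Made-up distances (Å) for a toy protein.
all_dists = [5.0, 7.0, 10.0, 20.0, 30.0, 50.0, 9.0, 15.0]
pred_dists = [5.0, 10.0]

acc = accuracy(pred_dists)
imp = improvement(pred_dists, all_dists)
xd_value = xd(pred_dists, all_dists)
```

In this toy case one of the two predicted pairs is a true contact, and two of the eight eligible pairs are contacts, so the prediction is twice as accurate as random. Both functions expect non-empty lists, and `improvement` expects at least one observed contact.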

For the calculation of the three parameters, both the predicted pairs of residues and all the pairs in the protein are split into three sets according to the separation of the two residues of the pair in the linear sequence of the protein (the number of residues between them): seqsep>=6, seqsep>=12 and seqsep>=24. Acc, Imp and Xd are evaluated for each of these three sets.

Predictors can either submit the residue pairs they predict to be in contact or send all the pairs in the protein with an associated score for each pair (see file format). In the first case, the coverage of the prediction is also calculated as the ratio of the number of predicted pairs to the total number of possible pairs. In the second case, the list is sorted by score and evaluations are made taking different numbers of top-scoring pairs as a function of the protein length: the first 2L, L, L/2, L/5 and L/10 pairs are taken (L: length of the protein). All these calculations are performed for the three subsets of pairs explained above (seqsep>=6, 12 and 24).
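The selection of top-scoring pairs and the coverage calculation might look like the sketch below. It rests on assumptions the text does not settle: sequence separation is taken as |i - j| (the text could also be read as |i - j| - 1), a higher score means a more confident prediction, pairs are filtered by separation before ranking, and L/2, L/5 and L/10 are rounded down. All function names are hypothetical.

```python
def seq_sep(i, j):
    """Sequence separation of residues i and j (assumption: |i - j|)."""
    return abs(i - j)

def top_pairs(scored_pairs, length, min_sep):
    """Return the 2L, L, L/2, L/5 and L/10 best pairs with seqsep >= min_sep.

    scored_pairs is a list of (i, j, score) tuples.
    """
    eligible = [p for p in scored_pairs if seq_sep(p[0], p[1]) >= min_sep]
    # Assumption: higher score = more confident prediction.
    ranked = sorted(eligible, key=lambda p: p[2], reverse=True)
    cutoffs = {
        "2L": 2 * length,
        "L": length,
        "L/2": length // 2,    # rounded down (assumption)
        "L/5": length // 5,
        "L/10": length // 10,
    }
    return {label: [(i, j) for i, j, _ in ranked[:k]] for label, k in cutoffs.items()}

def n_possible_pairs(length, min_sep):
    """Number of residue pairs with |i - j| >= min_sep in a chain of given length."""
    m = max(length - min_sep, 0)
    return m * (m + 1) // 2

def coverage(n_predicted, length, min_sep):
    """Ratio of the number of predicted pairs to the number of possible pairs."""
    return n_predicted / n_possible_pairs(length, min_sep)

# Toy input: every pair of a 20-residue chain, scored so that
# pairs closer in sequence rank higher (made-up scores).
L = 20
scored = [(i, j, 1.0 / (j - i)) for i in range(1, L + 1) for j in range(i + 1, L + 1)]

subsets = {sep: top_pairs(scored, L, sep) for sep in (6, 12, 24)}
cov = coverage(10, L, 6)  # coverage of a hypothetical 10-pair submission
```

For a 20-residue chain no pair reaches seqsep>=24, so that subset is empty in this example; real targets are normally long enough for all three subsets to contain pairs.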

Currently, the results shown in EVA_con correspond to contacts at high sequence separation (seqsep>=24), taking the L/2, L/5 and L/10 best contacts.
To express the accuracy of a given method for a given protein as a single value, the average of the Xd values of the L/2, L/5 and L/10 sets of contacts is calculated (seqsep>=24):

AvXd= (Xd_L2 + Xd_L5 + Xd_L10) / 3

