CASP6 Contact Prediction Evaluation


Evaluation Rules


The distance between two residues is calculated as the distance in Angstroms between their Cb carbons (Ca for Gly). For a given set of predicted contacts, five main parameters are calculated.


  • Accuracy (Acc). The ratio of the number of correctly predicted contacts to the total number of predicted contacts.
                Acc= correctly predicted contacts / predicted contacts

  • Improvement over random (Imp). The ratio of the accuracy to the accuracy of a random prediction (equivalent to predicting all the pairs in the protein as contacting).
                Imp= Acc / AccRand
                AccRand= C / N

Where N is the total number of residue pairs in the protein, excluding the ones close in the sequence (see below), and C is the observed number of contacts among those N pairs.


  • Coverage (Cov). The ratio of the number of correctly predicted contacts to the number of observed contacts in the experimental structure.
                Cov= correctly predicted contacts / experimental contacts
 
  • Xd. The difference between the distance distribution of the predicted pairs and the distance distribution of all the pairs in the protein.
                Xd= SUM {i=1,15}((Pip-Pia) / (di * 15))

Where the sum runs over all the distance bins. There are 15 distance bins covering the range from 0 to 60 A. di is the distance representing bin i: its upper limit, normalised to 60. Pip is the percentage of predicted pairs whose distance falls in bin i, and Pia is the same percentage for all the pairs. Defined this way, Xd>0 indicates the favourable case in which the distances of the predicted contacts are shifted towards lower values (see J. Mol. Biol. (1997), 271:511-523).
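
As an illustration, the parameters above can be sketched in Python. This is a minimal sketch under stated assumptions, not the code used for the CASP6 evaluation: the function names, the representation of contacts as sets of (i, j) residue index pairs, and the decision to leave distances of 60 A or more out of every bin are all choices made for the example. di is read here as the upper limit of bin i divided by 60.

```python
def contact_metrics(predicted, observed, n_pairs):
    """Return (Acc, Imp, Cov).

    predicted, observed: non-empty sets of (i, j) residue pairs (hypothetical
    representation); n_pairs: N, the total number of pairs considered.
    """
    correct = len(predicted & observed)
    acc = correct / len(predicted)        # Acc = correct / predicted
    cov = correct / len(observed)         # Cov = correct / experimental
    acc_rand = len(observed) / n_pairs    # AccRand = C / N
    imp = acc / acc_rand                  # Imp = Acc / AccRand
    return acc, imp, cov


def xd(pred_distances, all_distances, n_bins=15, max_dist=60.0):
    """Return Xd for two non-empty lists of distances in Angstroms."""
    width = max_dist / n_bins             # 4 A per bin for 15 bins over 0-60 A

    def percentages(distances):
        counts = [0] * n_bins
        for d in distances:
            b = int(d // width)
            if 0 <= b < n_bins:           # assumption: >= 60 A falls in no bin
                counts[b] += 1
        return [100.0 * c / len(distances) for c in counts]

    pip = percentages(pred_distances)     # predicted pairs
    pia = percentages(all_distances)      # all pairs
    total = 0.0
    for i in range(1, n_bins + 1):
        d_i = (i * width) / max_dist      # upper limit of bin i, normalised to 60
        total += (pip[i - 1] - pia[i - 1]) / (d_i * n_bins)
    return total
```

For example, contact_metrics(predicted, observed, n_pairs) with 4 predicted pairs, 3 observed contacts, 2 of them in common, and N = 30 gives Acc = 0.5, Cov = 2/3 and Imp = 5.0. Calling xd with the same list for both arguments gives 0, since the two distributions are identical.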


 
For the calculation of these parameters, both the predicted pairs of residues and all the pairs in the protein are split into three sets according to the separation of the two residues of the pair in the linear sequence of the protein (the number of residues between them): seqsep>=6, seqsep>=12 and seqsep>=24. Acc, Imp, Cov and Xd are evaluated for each of these three sets. The main evaluation is the one at the highest sequence separation (seqsep>=24).
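
The three sequence-separation sets can be sketched as follows. This is only an illustration: the function names are hypothetical, and seqsep is taken here as |i - j| for residue indices i and j, which is an assumption (reading "the number of residues between them" literally would give |i - j| - 1 instead). Because the thresholds are lower bounds, the sets are nested: a pair with seqsep>=24 also belongs to the other two.

```python
SEQSEP_THRESHOLDS = (6, 12, 24)


def seqsep(i, j):
    """Sequence separation of residues i and j (assumed to be |i - j|)."""
    return abs(i - j)


def split_by_seqsep(pairs, thresholds=SEQSEP_THRESHOLDS):
    """Map each threshold t to the subset of pairs with seqsep >= t."""
    return {t: {p for p in pairs if seqsep(*p) >= t} for t in thresholds}


def n_pairs(length, min_sep):
    """N: number of residue pairs (i < j) with j - i >= min_sep."""
    m = max(0, length - min_sep)
    return m * (m + 1) // 2
```

With these definitions, a protein of 30 residues has n_pairs(30, 24) = 21 pairs in the seqsep>=24 set, which is the N used in AccRand for that set.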


The list of predicted contacts is sorted by the assigned score, and evaluations are made taking different numbers of top pairs as a function of the protein length: the first L/10, L/5, L/2, L and 2L pairs are taken (L or Len: number of residues of the protein). For 3D prediction servers that do not have reliability values associated with each pairwise distance, we sample an equivalent number of predicted pairs. All these calculations are performed for the three subsets of pairs (seqsep>=6, 12 and 24).
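
The selection of top pairs described above can be sketched like this. The function names and the (pair, score) input format are hypothetical, and two details are assumptions that the text does not specify: fractions of L are rounded down, and at least one pair is always kept. Higher scores are assumed to mean more reliable predictions.

```python
def top_pair_cutoffs(length):
    """Number of top pairs to evaluate for a protein of the given length."""
    return {
        "L/10": max(1, length // 10),   # assumption: floor, minimum of 1
        "L/5": max(1, length // 5),
        "L/2": max(1, length // 2),
        "L": length,
        "2L": 2 * length,
    }


def top_pairs(scored_pairs, length):
    """Return, per cutoff label, the highest-scoring pairs.

    scored_pairs: iterable of ((i, j), score) tuples.
    """
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    return {
        label: [pair for pair, _score in ranked[:n]]
        for label, n in top_pair_cutoffs(length).items()
    }
```

Each of the resulting lists would then be evaluated separately, once for each of the three sequence-separation subsets. If a prediction contains fewer pairs than a cutoff asks for, this sketch simply returns all of them.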