Contact Prediction Evaluation for CASP4/CAFASP2. Help




Evaluation Criteria

The distance between two residues will be calculated as the distance, in Å, between their Cβ carbons (Cα for Gly). For a given set of predicted contacts, three main parameters will be calculated: Acc, Imp and Xd.
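The atom choice described above can be sketched as follows. This is a minimal illustration, not the server's code: the residue representation (a dict with a three-letter `name` and an `atoms` mapping from PDB atom names to coordinates) and both function names are assumptions made for the example.

```python
import math


def reference_atom(residue_name):
    """Return the PDB atom name used for the distance: CB, or CA for glycine."""
    return "CA" if residue_name.upper() == "GLY" else "CB"


def residue_distance(res_a, res_b):
    """Distance in angstroms between the reference atoms of two residues.

    Each residue is a hypothetical dict such as
    {"name": "ALA", "atoms": {"CA": (x, y, z), "CB": (x, y, z)}}.
    """
    xyz_a = res_a["atoms"][reference_atom(res_a["name"])]
    xyz_b = res_b["atoms"][reference_atom(res_b["name"])]
    return math.dist(xyz_a, xyz_b)
```

For a glycine the CA coordinates are used automatically, so the caller does not need to special-case it.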

For the calculation of the three parameters, both the predicted pairs of residues and all the pairs in the protein are split into three sets according to the separation of the two residues of the pair in the linear sequence of the protein (the number of residues between them): seqsep>=6, seqsep>=12 and seqsep>=24. Acc, Imp and Xd are evaluated for each of these three sets.
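A small sketch of this split is given below. It assumes that the sequence separation of a pair is the absolute difference of the two residue numbers; the text does not spell out the exact convention, so treat `seqsep` as an assumption. The function names are illustrative only. Note that the three sets are nested: every pair in the seqsep>=24 set also belongs to the other two.

```python
SEQSEP_THRESHOLDS = (6, 12, 24)


def seqsep(i, j):
    """Sequence separation of a pair (assumed here to be |i - j|)."""
    return abs(i - j)


def split_by_seqsep(pairs, thresholds=SEQSEP_THRESHOLDS):
    """Map each threshold to the list of (i, j) pairs with seqsep >= threshold."""
    return {t: [(i, j) for (i, j) in pairs if seqsep(i, j) >= t]
            for t in thresholds}
```

The same function can be applied both to the predicted pairs and to the list of all pairs in the protein.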

Predictors can either submit a limited number of residue pairs predicted to be in contact, or send all the pairs in the protein with an associated score for each pair (see file format). In the first case, the coverage of the prediction is also calculated as the ratio between the number of predicted pairs and the total number of possible pairs. In the second case, the list will be sorted by score and evaluations will be made taking different numbers of top pairs as a function of the protein length: the first 2L, L, L/2, L/5 and L/10 pairs will be taken (L: length of the protein). All these calculations are performed for the three subsets of pairs explained above (seqsep>=6, 12 and 24).
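The two cases can be illustrated with the sketch below. Several details are assumptions, not taken from the evaluation itself: fractional cutoffs are rounded down with integer division, the sort is descending because a higher score is taken to mean a stronger prediction, and the "total number of possible pairs" is counted within one sequence-separation subset (pairs with |i - j| >= `min_seqsep`). All names are hypothetical.

```python
def top_pairs_by_length(scored_pairs, length):
    """Return the top 2L, L, L/2, L/5 and L/10 pairs, ranked by score.

    scored_pairs is a list of (i, j, score) tuples; length is L.
    """
    ranked = sorted(scored_pairs, key=lambda p: p[2], reverse=True)
    cutoffs = {
        "2L": 2 * length,
        "L": length,
        "L/2": length // 2,   # assumption: round down
        "L/5": length // 5,
        "L/10": length // 10,
    }
    return {label: ranked[:n] for label, n in cutoffs.items()}


def coverage(n_predicted, length, min_seqsep=6):
    """Ratio of predicted pairs to all possible pairs with |i - j| >= min_seqsep."""
    n = length - min_seqsep
    # A chain of length L has (L - d) pairs at separation d; summing over
    # d = min_seqsep .. L - 1 gives n * (n + 1) / 2 with n = L - min_seqsep.
    total = n * (n + 1) // 2 if n > 0 else 0
    return n_predicted / total if total else 0.0
```

In a full evaluation, the selection would be repeated after restricting the list to each of the three sequence-separation subsets.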

Targets for Contact Prediction will be split into different sets according to their sequence length.

The fundamental parameter for the evaluation will be Xd at high sequence separation (seqsep>=24).


File format for submissions

Any line starting with a hash (#) will be ignored when parsing the file. The sequence of the protein should be written in lines starting with the SEQUNC label. Valid symbols here are the 20 amino acid names (1-letter code) plus 'X'.

Four lines starting with the labels AUTHNM, AUTHID, TARGID and PREDN should be included anywhere in the file. They contain, respectively, the name of the predictor, his/her registration code, the target ID and the prediction number of that author for that target. Each predictor can send more than one prediction for a given target using different files with different PREDN numbers. If two files contain the same values for AUTHID, TARGID and PREDN, the second file will overwrite the first one.

Each pair of residues is written on a separate line of the file. The line should contain, in this order, the sequence numbers of the two residues (numbered as in the CASP template PDB file, with the lower sequence number first), their respective amino acid types (in columns 13 and 15) and, optionally, a score proportional to the strength/confidence of that prediction. The format of such a line, in C/Perl string formatting code, is:

%5d %5d %c %c %8.3f\n
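As an illustration, the same format string can be used from Python, whose `%` operator accepts these C-style conversions. The helper name `format_contact_line` is hypothetical, and reordering the pair so that the lower residue number comes first is a convenience added for the sketch.

```python
def format_contact_line(i, j, aa_i, aa_j, score):
    """Format one predicted contact using the C/Perl-style format string."""
    if i > j:
        # Put the residue with the lower sequence number first.
        i, j, aa_i, aa_j = j, i, aa_j, aa_i
    return "%5d %5d %c %c %8.3f\n" % (i, j, aa_i, aa_j, score)
```

With this format the two amino acid letters land in columns 13 and 15 (counting from 1), as the description requires.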

Example:

################################
#                              #
#  Contact Prediction example  #
#                              #
################################
SEQUNC AVAFILSTENDVGPSQGSYS
SEQUNC DLRVVGSLDGQSIYGLTEEV
SEQUNC SVHVRPVILKRNSSAQYSVQ
SEQUNC STHAMDNLPFVYNTGYLKRN
SEQUNC TAMSGNSWENVFSGWCVGND
SEQUNC NIGYQHANVW
#
AUTHNM John Smith
AUTHID XXXXXXXX
TARGID T00XX
PREDN  1
#
    3     6 A L      0.212
  100   106 D H      0.934
   23   100 R D      1.054
    3   100 A D      0.435

Testing the server


Contact us

Florencio Pazos & Alfonso Valencia
Protein Design Group
National Center for Biotechnology (Spanish Research Council)
Campus Universidad Autonoma.
Cantoblanco. 28049 Madrid.
Tel: +34-91-5854669. Fax: +34-91-5854506.