[Evaluation Criteria] [File Format] [Testing the Server] [Automatic HTTP calls] [CASP4] [CAFASP2] [Protein Design Group @ CNB-CSIC] [Contact]
The distance between two residues will be calculated as the distance, in Å, between their Cβ atoms (Cα for Gly, which has no Cβ). For a given set of predicted contacts, three main parameters will be calculated.
Contact definition. Two residues are considered to be in contact when the distance between their Cβ atoms (Cα for Gly) is <= 8.0 Å.
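Here is a minimal Python sketch of the distance and contact definitions above. It is not the evaluation server's own code. The per-residue coordinate dictionaries, the three-letter residue names and the function names (`representative_atom`, `residue_distance`, `in_contact`) are assumptions made for illustration. The coordinates are the MET 1 and GLU 3 atoms from the PDB example on this page.

```python
import math

# Hypothetical layout: one residue's atoms as {atom_name: (x, y, z)}, coordinates in Å.
Atoms = dict[str, tuple[float, float, float]]

CONTACT_CUTOFF = 8.0  # Å, contact threshold stated above


def representative_atom(res_name: str, atoms: Atoms) -> tuple[float, float, float]:
    """Return the Cβ position, or the Cα position for glycine (which has no Cβ)."""
    return atoms["CA"] if res_name == "GLY" else atoms["CB"]


def residue_distance(res_a: str, atoms_a: Atoms, res_b: str, atoms_b: Atoms) -> float:
    """Euclidean distance between the representative atoms of two residues."""
    return math.dist(representative_atom(res_a, atoms_a),
                     representative_atom(res_b, atoms_b))


def in_contact(res_a: str, atoms_a: Atoms, res_b: str, atoms_b: Atoms,
               cutoff: float = CONTACT_CUTOFF) -> bool:
    """Two residues are in contact when their Cβ (Cα for Gly) distance is <= cutoff."""
    return residue_distance(res_a, atoms_a, res_b, atoms_b) <= cutoff


# Usage with the MET 1 and GLU 3 atoms from the PDB example on this page.
met1 = {"CA": (-5.921, 32.127, -6.684), "CB": (-4.673, 33.058, -6.502)}
glu3 = {"CA": (-4.504, 29.730, -1.135), "CB": (-5.655, 30.402, -0.432)}
print(round(residue_distance("MET", met1, "GLU", glu3), 2))  # 6.7
```

Because the comparison uses `<=`, a pair at exactly 8.0 Å counts as a contact, which matches the definition above.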
Accuracy (Acc). The ratio between the number of correctly predicted contacts and the total number of predicted contacts:
$$\mathrm{Acc} = \frac{n_t}{n}$$
where $n_t$ is the number of correctly predicted contacts (predicted pairs that are actually in contact) and $n$ is the total number of predicted contacts.
Improvement over random (Imp). The ratio between the accuracy of the prediction and the accuracy of a random prediction. The random accuracy is the one obtained by predicting every pair in the protein as a contact, which is $C/N$:
$$\mathrm{Imp} = \frac{\mathrm{Acc}}{C/N}$$
where $N$ is the total number of residue pairs in the protein, excluding pairs that are close in the sequence (see below), and $C$ is the number of observed contacts among those $N$ pairs.
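The two ratios can be written directly as small functions. This Python sketch uses hypothetical function names and made-up counts. It assumes that $N$ and $C$ have already been restricted to one sequence-separation set.

```python
def accuracy(n_true: int, n_predicted: int) -> float:
    """Acc = n_t / n: fraction of predicted contacts that are real contacts."""
    return n_true / n_predicted


def random_accuracy(n_contacts: int, n_pairs: int) -> float:
    """Accuracy of predicting all N pairs as contacts: C / N."""
    return n_contacts / n_pairs


def improvement_over_random(acc: float, n_contacts: int, n_pairs: int) -> float:
    """Imp = Acc / (C / N)."""
    return acc / random_accuracy(n_contacts, n_pairs)


# Hypothetical numbers: 25 of 100 predicted pairs are real contacts, and the
# protein has 100 contacts among 1600 pairs in the sequence-separation set.
acc = accuracy(25, 100)                         # 0.25
imp = improvement_over_random(acc, 100, 1600)   # 0.25 / 0.0625
print(acc, imp)  # 0.25 4.0
```

Predicting every pair as a contact gives Acc = C/N, so a random prediction scores Imp = 1. Values above 1 indicate a better-than-random predictor.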
Distance distribution of the predicted contacts (Xd). A weighted harmonic average of the difference between the distance distribution of the predicted contacts and the distance distribution of all pairs:
$$X_d = \sum_{i=1}^{15} \frac{P_{ip} - P_{ia}}{15\, d_i}$$
The sum runs over all the distance bins. There are 15 distance bins covering the range from 0 to 60 Å. $d_i$ is the distance representing bin $i$: its upper limit, normalised to 60 Å. $P_{ip}$ is the percentage of predicted pairs whose distance falls in bin $i$, and $P_{ia}$ is the same percentage for all pairs. With this definition, $X_d > 0$ indicates a positive result. In that case the distances of the predicted contacts are shifted towards smaller values than those of all pairs (see J Mol Biol (1997), 271:511-523).
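The Xd formula can be sketched in Python as follows. The source leaves several details open, and this sketch fills them with assumptions:

- the 15 bins are equal-width, 4 Å each;
- a distance lying exactly on a bin boundary goes to the upper bin;
- distances of 60 Å or more are counted in the last bin.

The function names are hypothetical. Percentages are on a 0 to 100 scale, as in the text.

```python
BIN_COUNT = 15
MAX_DIST = 60.0                    # Å, range covered by the bins
BIN_WIDTH = MAX_DIST / BIN_COUNT   # 4 Å, assuming equal-width bins


def bin_percentages(distances: list[float]) -> list[float]:
    """Percentage of pairs per distance bin (assumption: >= 60 Å goes in the last bin)."""
    counts = [0] * BIN_COUNT
    for d in distances:
        counts[min(int(d // BIN_WIDTH), BIN_COUNT - 1)] += 1
    return [100.0 * c / len(distances) for c in counts]


def xd(predicted_distances: list[float], all_distances: list[float]) -> float:
    """Xd = sum_i (P_ip - P_ia) / (15 * d_i), with d_i the bin's upper limit / 60 Å."""
    p_pred = bin_percentages(predicted_distances)
    p_all = bin_percentages(all_distances)
    total = 0.0
    for i in range(BIN_COUNT):
        d_i = (i + 1) * BIN_WIDTH / MAX_DIST  # normalised upper limit of bin i + 1
        total += (p_pred[i] - p_all[i]) / (BIN_COUNT * d_i)
    return total


# Toy data: both predicted pairs are at short distances, while half of all
# pairs are at 30 Å, so the predicted distribution is shifted to lower distances.
print(xd([2.0, 3.0], [2.0, 30.0]))  # approximately 43.75
```

Under the equal-width assumption, 15·d_i reduces to the bin index i. Bin i is therefore weighted by 1/i, so differences at short distances dominate the score.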
To calculate the three parameters, both the predicted pairs and all the pairs in the protein are split into three sets. The split follows the sequence separation of the two residues in each pair, that is, the number of residues between them in the linear sequence of the protein: seqsep>=6, seqsep>=12 and seqsep>=24. Acc, Imp and Xd are evaluated for each of these three sets.
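Here is a Python sketch of the sequence-separation split. The function names are hypothetical. The sketch computes the separation as |i - j|. The server may instead use the literal count of intervening residues, |i - j| - 1; which reading applies is an assumption.

```python
SEQSEP_THRESHOLDS = (6, 12, 24)


def seq_sep(i: int, j: int) -> int:
    # Assumption: separation taken as |i - j|. The text defines it as "the number of
    # residues between them", which could also be read as |i - j| - 1.
    return abs(i - j)


def all_pairs(length: int) -> list[tuple[int, int]]:
    """Every residue pair (i, j) with i < j, residues numbered 1..length."""
    return [(i, j) for i in range(1, length + 1) for j in range(i + 1, length + 1)]


def split_by_seqsep(pairs, thresholds=SEQSEP_THRESHOLDS):
    """Map each threshold t to the pairs whose sequence separation is >= t."""
    return {t: [p for p in pairs if seq_sep(*p) >= t] for t in thresholds}


predicted = [(3, 6), (100, 106), (23, 100), (3, 100)]  # pairs from the file-format example
sets = split_by_seqsep(predicted)
print({t: len(ps) for t, ps in sets.items()})  # {6: 3, 12: 2, 24: 2}
```

The three sets are nested: every pair with seqsep>=24 also belongs to the seqsep>=12 and seqsep>=6 sets. The same function applied to `all_pairs(L)` gives the N pairs used by Imp and Xd.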
Predictors can submit a list of residue pairs predicted to be in contact, or they can send all the pairs in the protein with a score for each pair (see File Format). In the first case, the coverage of the prediction is also calculated, as the ratio between the number of predicted pairs and the total number of possible pairs. In the second case, the list will be sorted by score and evaluated several times, each time taking a different number of top-ranked pairs as a function of the protein length L: the first 2L, L, L/2, L/5 and L/10 pairs. All these calculations are performed for each of the three subsets of pairs described above (seqsep>=6, 12 and 24).
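Both evaluation modes can be sketched in Python. The function names are hypothetical. The sketch makes three assumptions:

- a higher score means a more confident contact, since the file format describes the score as proportional to strength/confidence;
- fractional cut-offs such as L/5 are rounded down;
- the ranking would be applied separately to each sequence-separation subset.

```python
def coverage(n_predicted: int, n_possible: int) -> float:
    """Coverage of a plain contact list: predicted pairs / possible pairs."""
    return n_predicted / n_possible


def top_pair_sets(scored_pairs, length: int) -> dict:
    """For a fully scored list, keep the top 2L, L, L/2, L/5 and L/10 pairs.

    scored_pairs: (i, j, score) tuples; a higher score means a more confident
    contact. Assumption: fractional cut-offs are rounded down.
    """
    ranked = sorted(scored_pairs, key=lambda p: p[2], reverse=True)
    cutoffs = {"2L": 2 * length, "L": length, "L/2": length // 2,
               "L/5": length // 5, "L/10": length // 10}
    return {label: ranked[:k] for label, k in cutoffs.items()}


# Scores from the file-format example, with a hypothetical length of 10 so
# that the smaller cut-offs actually drop pairs.
scored = [(3, 6, 0.212), (100, 106, 0.934), (23, 100, 1.054), (3, 100, 0.435)]
for label, pairs in top_pair_sets(scored, length=10).items():
    print(label, [(i, j) for i, j, _ in pairs])
```

For the 110-residue example sequence, the cut-offs would be 220, 110, 55, 22 and 11 pairs.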
Targets for contact prediction will be split into different sets according to their sequence length.
The fundamental parameter for the evaluation will be Xd at high sequence separation (seqsep>=24).
Any line starting with a hash (#) will be ignored when the file is parsed. The sequence of the protein should be written in lines starting with the SEQUNC label. Valid symbols here are the 20 amino acid names (one-letter code) plus 'X'.

Four lines starting with the labels AUTHNM, AUTHID, TARGID and PREDN should be included anywhere in the file. They give, respectively, the name of the predictor, his/her registration code, the target ID and that author's prediction number for that target. Each predictor can send more than one prediction for a given target by using different files with different PREDN numbers. If two files contain the same values for AUTHID, TARGID and PREDN, the second file will overwrite the first one.

Each pair of residues is written on a separate line. The line should contain, in this order:
- the sequence numbers of the two residues, numbered as in the CASP template PDB file, with the lower number first;
- their respective amino acid types, in columns 13 and 15;
- optionally, a score proportional to the strength/confidence of that prediction.

In C/Perl string formatting code, the format of such a line is:
%5d %5d %c %c %8.3f\n
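Python's `%` operator accepts the same conversion codes as this C/Perl format, so a writer for contact lines can reuse the string verbatim. The following is a hedged sketch. `format_contact` is a hypothetical helper. The layout used when the optional score is left out (the same line without the final field) is an assumption.

```python
from typing import Optional

# printf-style format from the specification above (Python's % operator uses the same codes)
CONTACT_LINE = "%5d %5d %c %c %8.3f\n"
CONTACT_LINE_NO_SCORE = "%5d %5d %c %c\n"  # assumption: layout when the optional score is omitted


def format_contact(i: int, j: int, aa_i: str, aa_j: str, score: Optional[float] = None) -> str:
    """Write one residue pair, putting the lower sequence number first."""
    if i > j:
        i, j, aa_i, aa_j = j, i, aa_j, aa_i
    if score is None:
        return CONTACT_LINE_NO_SCORE % (i, j, aa_i, aa_j)
    return CONTACT_LINE % (i, j, aa_i, aa_j, score)


line = format_contact(100, 23, "D", "R", 1.054)
print(repr(line))  # '   23   100 R D    1.054\n'
```

With the two five-character number fields and their separating spaces, the amino acid types land at character positions 13 and 15 (`line[12]` and `line[14]`), as the specification requires.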
Example:
################################
#                              #
#  Contact Prediction example  #
#                              #
################################
SEQUNC AVAFILSTENDVGPSQGSYS
SEQUNC DLRVVGSLDGQSIYGLTEEV
SEQUNC SVHVRPVILKRNSSAQYSVQ
SEQUNC STHAMDNLPFVYNTGYLKRN
SEQUNC TAMSGNSWENVFSGWCVGND
SEQUNC NIGYQHANVW
#
AUTHNM John Smith
AUTHID XXXXXXXX
TARGID T00XX
PREDN 1
#
    3     6 A L    0.212
  100   106 D H    0.934
   23   100 R D    1.054
    3   100 A D    0.435
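A minimal parser for files in this format might look like the following Python sketch. It is not the server's parser, and the class and function names are hypothetical. Lines beginning with `ATOM` are simply collected, so that files with an included PDB also parse. `residue_types_match` assumes that residue numbers index the SEQUNC sequence from 1. That holds for this example, but it need not hold when numbering follows a CASP template PDB file.

```python
from dataclasses import dataclass, field
from typing import Optional

VALID_SEQUENCE_SYMBOLS = set("ACDEFGHIKLMNPQRSTVWY") | {"X"}  # 20 amino acids plus X
HEADER_LABELS = {"AUTHNM", "AUTHID", "TARGID", "PREDN"}


@dataclass
class Contact:
    i: int
    j: int
    aa_i: str
    aa_j: str
    score: Optional[float] = None


@dataclass
class Prediction:
    sequence: str = ""
    header: dict = field(default_factory=dict)
    contacts: list = field(default_factory=list)
    pdb_lines: list = field(default_factory=list)  # included ATOM records, kept verbatim


def parse_prediction(text: str) -> Prediction:
    pred = Prediction()
    for raw in text.splitlines():
        if not raw.strip() or raw.startswith("#"):
            continue  # blank lines and comment lines are ignored
        fields = raw.split()
        label = fields[0]
        if label == "SEQUNC":
            chunk = "".join(fields[1:])
            bad = set(chunk) - VALID_SEQUENCE_SYMBOLS
            if bad:
                raise ValueError(f"invalid sequence symbols: {sorted(bad)}")
            pred.sequence += chunk
        elif label in HEADER_LABELS:
            pred.header[label] = raw.split(None, 1)[1].strip() if len(fields) > 1 else ""
        elif label == "ATOM":
            pred.pdb_lines.append(raw)
        else:
            if len(fields) < 4:
                raise ValueError(f"malformed contact line: {raw!r}")
            i, j = int(fields[0]), int(fields[1])
            if i >= j:
                raise ValueError(f"lower residue number must come first: {raw!r}")
            score = float(fields[4]) if len(fields) > 4 else None
            pred.contacts.append(Contact(i, j, fields[2], fields[3], score))
    missing = HEADER_LABELS.difference(pred.header)
    if missing:
        raise ValueError(f"missing header lines: {sorted(missing)}")
    return pred


def residue_types_match(pred: Prediction) -> bool:
    """Check contact residue types against SEQUNC (assumes numbering starts at 1)."""
    seq = pred.sequence
    return all(1 <= c.i and c.j <= len(seq)
               and seq[c.i - 1] == c.aa_i and seq[c.j - 1] == c.aa_j
               for c in pred.contacts)
```

Applied to the example above, the parser gives a 110-residue sequence and four contacts. All four residue types agree with the sequence (for instance, residue 23 is R and residue 100 is D).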
TARGID 5p21 TARGID 2hbeA
TARGID [Test_] TARGID [AnotherTestChainA]
################################
#                              #
#  Contact Prediction example  #
#      with included PDB       #
#                              #
################################
SEQUNC AVAFILSTENDVGPSQGSYS
SEQUNC DLRVVGSLDGQSIYGLTEEV
SEQUNC SVHVRPVILKRNSSAQYSVQ
SEQUNC STHAMDNLPFVYNTGYLKRN
SEQUNC TAMSGNSWENVFSGWCVGND
SEQUNC NIGYQHANVW
#
AUTHNM John Smith
AUTHID XXXXXXXX
TARGID [MyProtein_]
PREDN 1
#
    3     6 A L    0.212
  100   106 D H    0.934
   23   100 R D    1.054
    3   100 A D    0.435
ATOM      1  N   MET     1      -7.186  32.862  -6.632  1.00 13.92
ATOM      2  CA  MET     1      -5.921  32.127  -6.684  1.00 17.10
ATOM      3  C   MET     1      -5.845  31.045  -5.621  1.00 16.74
ATOM      4  O   MET     1      -6.600  31.083  -4.636  1.00 16.60
ATOM      5  CB  MET     1      -4.673  33.058  -6.502  1.00 17.00
ATOM      6  CG  MET     1      -4.633  33.804  -5.188  1.00 18.08
ATOM      7  SD  MET     1      -3.115  34.766  -4.908  1.00 19.73
ATOM      8  CE  MET     1      -3.019  35.851  -6.310  1.00 20.41
ATOM      9  N   THR     2      -4.910  30.095  -5.820  1.00 14.70
ATOM     10  CA  THR     2      -4.661  29.030  -4.859  1.00 16.92
ATOM     11  C   THR     2      -4.230  29.655  -3.547  1.00 14.78
ATOM     12  O   THR     2      -3.338  30.501  -3.537  1.00 13.93
ATOM     13  CB  THR     2      -3.557  28.107  -5.401  1.00 18.03
ATOM     14  OG1 THR     2      -4.059  27.565  -6.629  1.00 20.42
ATOM     15  CG2 THR     2      -3.193  26.983  -4.443  1.00 18.51
ATOM     16  N   GLU     3      -4.911  29.290  -2.472  1.00 14.80
ATOM     17  CA  GLU     3      -4.504  29.730  -1.135  1.00 17.71
ATOM     18  C   GLU     3      -4.012  28.553  -0.278  1.00 17.33
ATOM     19  O   GLU     3      -4.520  27.427  -0.395  1.00 17.54
ATOM     20  CB  GLU     3      -5.655  30.402  -0.432  1.00 20.67
....................
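When the structure is included in the file, the Cβ (Cα for Gly) coordinates needed for the distance calculation can be read from the ATOM records by their fixed PDB column positions. This Python sketch shows one possible way to process such records; it is an assumption, not the server's implementation. It ignores chain IDs, alternate locations and insertion codes, and the function names are hypothetical.

```python
import math


def representative_coords(pdb_lines: list[str]) -> dict[int, tuple[float, float, float]]:
    """Residue number -> Cβ coordinates (Cα for Gly), read from PDB ATOM records.

    Uses the fixed PDB column layout: atom name in columns 13-16, residue name
    in 18-20, residue number in 23-26 and x, y, z in 31-38, 39-46 and 47-54.
    """
    coords = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        atom_name = line[12:16].strip()
        res_name = line[17:20]
        res_num = int(line[22:26])
        wanted = "CA" if res_name == "GLY" else "CB"
        if atom_name == wanted:
            coords[res_num] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return coords


def pair_distance(coords, i: int, j: int) -> float:
    """Distance in Å between the representative atoms of residues i and j."""
    return math.dist(coords[i], coords[j])


atom_lines = [
    "ATOM      5  CB  MET     1      -4.673  33.058  -6.502  1.00 17.00",
    "ATOM     13  CB  THR     2      -3.557  28.107  -5.401  1.00 18.03",
    "ATOM     20  CB  GLU     3      -5.655  30.402  -0.432  1.00 20.67",
]
coords = representative_coords(atom_lines)
print(round(pair_distance(coords, 1, 3), 2))  # 6.7
```

Reading by column rather than splitting on whitespace matters here, because PDB fields can run together, for example when a coordinate such as -100.123 fills its whole field.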
Florencio Pazos & Alfonso Valencia
Protein Design Group
National Center for Biotechnology (Spanish Research Council)
Campus Universidad Autonoma. Cantoblanco. 28049 Madrid.
Tlf: +34-91-5854669. Fax: +34-91-5854506.