The following information has been received by the server:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

________________________________________________________________________________

reference predict_h2098 (Fri May 24 11:41:41 MDT 1996)
from pazos@gredos.cnb.uam.es
password(###)
resp MAIL
orig HTML
prediction of: -secondary structure-solvent accessibility-
return no aligment
# msf format


Please check that the conversion from MSF to HSSP format of the
alignment looks reasonable.


The alignment that has been used as input to the network is:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

________________________________________________________________________________

--- ------------------------------------------------------------
--- MAXHOM multiple sequence alignment
--- ------------------------------------------------------------
---
--- MAXHOM ALIGNMENT HEADER: ABBREVIATIONS FOR SUMMARY
--- ID           : identifier of aligned (homologous) protein
--- STRID        : PDB identifier (only for known structures)
--- IDE (PIDE)   : percentage of pairwise sequence identity
--- WSIM         : percentage of weighted similarity
--- LALI         : number of residues aligned
--- NGAP         : number of insertions and deletions (indels)
--- LGAP         : number of residues in all indels
--- LEN2 (LSEQ2) : length of aligned sequence
--- ACCNUM       : SwissProt accession number
--- NAME         : one-line description of aligned protein
---
--- MAXHOM ALIGNMENT HEADER: SUMMARY
ID         STRID  IDE WSIM LALI NGAP LGAP LEN2 ACCNUM NAME
S38883             82    0   84    0    0   84
PNP_HAEIN          67    0   83    0    0   83
RS1H_BACSU         57    0   78    1    1   77
BS29668_2          52    0   83    0    0   83
CEEEED8_6          49    0   77    1    1   76
YABR_BACSU         48    0   84    0    0   84
HS055891_1         46    0   83    0    0   83
RS1_RHIME          43    0   83    0    0   83
SA19858_1          43    0   81    0    0   81
HSHRH1_1           43    0   80    1    1   79
RS1_MYCLE          43    0   74    0    0   74
1222505    1222    42    0   78    0    0   78
RR1_SPIOL          42    0   78    1    1   77
NMCPSPS_1          41    0   78    0    0   78
YHGF_ECOLI         40    0   84    0    0   84
PR22_YEAST         41    0   76    0    0   76
RS1_SYNP6          38    0   79    1    1   78
RS1_PROSP          38    0   83    1    1   82
1221346    1221    35    0   83    1    1   82
HSKIAA25_1         35    0   74    0    0   74
RPOE_SULAC         34    0   72    1    1   71
SC9959_11          31    0   83    0    0   83
RS1_CHLTR          32    0   81    1    5   76
---
--- MAXHOM ALIGNMENT: IN MSF FORMAT
MSF of: /home/phd/server/work/predict_h2098_1615.hssp from:    1 to:   84
 /home/phd/server/work/predict_h2098_1615.ret_msf  MSF:   84  Type: P 24-May-96  11:41:4  Check: 6000  ..


 Name: PNS1        Len:    84  Check: 4930  Weight:  1.00
 Name: S38883      Len:    84  Check: 4045  Weight:  1.00
 Name: PNP_HAEIN   Len:    84  Check: 4384  Weight:  1.00
 Name: RS1H_BACSU  Len:    84  Check: 9612  Weight:  1.00
 Name: BS29668_2   Len:    84  Check: 2115  Weight:  1.00
 Name: CEEEED8_6   Len:    84  Check: 2346  Weight:  1.00
 Name: YABR_BACSU  Len:    84  Check: 3816  Weight:  1.00
 Name: HS055891_1  Len:    84  Check: 4280  Weight:  1.00
 Name: RS1_RHIME   Len:    84  Check: 5063  Weight:  1.00
 Name: SA19858_1   Len:    84  Check: 5029  Weight:  1.00
 Name: HSHRH1_1    Len:    84  Check: 3390  Weight:  1.00
 Name: RS1_MYCLE   Len:    84  Check: 6902  Weight:  1.00
 Name: 1222505     Len:    84  Check: 1566  Weight:  1.00
 Name: RR1_SPIOL   Len:    84  Check:  373  Weight:  1.00
 Name: NMCPSPS_1   Len:    84  Check:  152  Weight:  1.00
 Name: YHGF_ECOLI  Len:    84  Check: 4396  Weight:  1.00
 Name: PR22_YEAST  Len:    84  Check: 2065  Weight:  1.00
 Name: RS1_SYNP6   Len:    84  Check: 9505  Weight:  1.00
 Name: RS1_PROSP   Len:    84  Check: 2941  Weight:  1.00
 Name: 1221346     Len:    84  Check: 3567  Weight:  1.00
 Name: HSKIAA25_1  Len:    84  Check: 8406  Weight:  1.00
 Name: RPOE_SULAC  Len:    84  Check: 9904  Weight:  1.00
 Name: SC9959_11   Len:    84  Check: 1446  Weight:  1.00
 Name: RS1_CHLTR   Len:    84  Check: 5767  Weight:  1.00

//


           1                                                   50
PNS1       AEIEVGRVYT GKVTRIVDFG AFVAIGGGKE GLVHISQIAD KRVEKVTDYL
S38883     AEIEVGRIYA GKVTRIVDFG AFVAIGGGKE GLVHISQIAD KRVEKVADYL
PNP_HAEIN  AEVEAGVIYK GKVTRLADFG AFVAIVGNKE GLVHISQIAE ERVEKVSDYL
RS1H_BACSU QSLEVGSVLD GKVQRLTDFG AFVDIGG.ID GLVHISQLSH SHVEKPSDVV
BS29668_2  .EVEVGQLYL GKVKRIEKFG AFVEIFSGKD GLVHISELAL ERVGKVEDVV
CEEEED8_6  ...EIGKIYD GRVNSIQSFG AFITLEGFQE GLVHISQIRN ERVQTVADVL
YABR_BACSU MSIEVGSKLQ GKITGITNFG AFVELPGGST GLVHISEVAD NYVKDINDHL
HS055891_1 DQIAAGSVLE GTVKRVKDFG AFVEILPGIE GLVHVSQISN KRIENPSEVL
RS1_RHIME  AKYPVGKKIS GTVTNITDYG AFVELEPGIE GLIHISEMST KKNVHPGKIL
SA19858_1  ...EVGERIL GSVVKTTTFG AFVSLLPGKD GLLHISQIRK KRVENVEDVL
HSHRH1_1   EEPTIGDIYN GKVTSIMQFG CFVQLEGLRE GLVHISELRR ERVANVADVV
RS1_MYCLE  .THAIGQIVP GKVTKLVPFG AFVRVEEGIE GLVHISELAE RHVEVPDQVV
1222505    TDLKSGMILE GTVTNVTNFG AFVDIGVHQD GLVHISSLSD KFVEDPHQVV
RR1_SPIOL  AQLGIGSVVT GTVQSLKPYG AFIDIGG.IN GLLHVSQISH DRVSDIATVL
NMCPSPS_1  SDLQVGMILE GVVSNVANFG AFVDIGVHQD GLVHISALSN KFVQDPREVV
YHGF_ECOLI NDLQPGMILE GAVTNVTNFG AFVDIGVHQD GLVHISSLSN KFVEDPHTVV
PR22_YEAST ....LHKVYE GKVRNITTFG CFVQIFGTRD GLVHISEMSD QRTLDPHDVV
RS1_SYNP6  NRLEVGEVVV GAVRGIKPYG AFIDIGG.VS GLLHISEISH DHIETPHSVF
RS1_PROSP  ENLQEGMEVK GIVKNLTDYG AFVDLGG.VD GLLHITDMAW KRVKHPSEIV
1221346    ENLVEGSEVK GVVKNLTEYG AFVDLGG.VD GLLHITDMAW KRVKHPSEIV
HSKIAA25_1 SEIHPGMLLI GFVKSIKDYG VFIQFPSGLS GLAPKAIMSD KFVTSTSDHF
RPOE_SULAC ....IHEVIE GEVSQVDNYG VYVNMGP.VD GLVHISQITD DNLEKSKKSI
SC9959_11  SDIKAGDVFE GTIKSVTDFG VFVKLDNTVT GLAHITEIAD KKPEDLSALF
RS1_CHLTR  SEVQPGAILK GTVVDISKDF VVVDVGLKSE GVIPMSEFID S.....SEGL

           51                                 84
PNS1       QMGQEVPVKV LEVDRQGRIR LSIKEATEQS QPAA
S38883     QVGQETSVKV LEIDRQGRVR LSIKEATAGT AVEE
PNP_HAEIN  QVGQEVNVKV VEIDRQGRIR LTMKDLAPKQ ETE.
RS1H_BACSU EEGQEVKVKV LSVDRDERIS LSIKDTLP.. ....
BS29668_2  KIGDEILVKV TEIDKQGRVN LSRKAVLREE KEKE
CEEEED8_6  KRGENVKVKV NKIEN.GKIS LSMKEVDQNS ....
YABR_BACSU KVGDQVEVKV INVEKDGKIG LSIKKAKDRP QARP
HS055891_1 KSGDKVQVKV LDIKPEERIS LSMKALEEKP ERE.
RS1_RHIME  STSQEVDVVV LEVDPTRRIS LGLKQTLENP WQA.
SA19858_1  GVGQKVQVEI AEIDSRGKLS LIPVIEGEEA ASDE
HSHRH1_1   SKGQRVKVKV LSFTG.TKTS LSMKDVDQET ....
RS1_MYCLE  AVGDDAMVKV IDIDLERRIS LSLKA..... ....
1222505    KTGNIVKVKV LEVDVRKRIA LTMRLDES.. ....
RR1_SPIOL  QPGDTLKVMI LSHDREGRVS LSTKKLEP.. ....
NMCPSPS_1  KAGDVVKVKV LEVDARKRIA LTMRLDDE.. ....
YHGF_ECOLI KAGDIVKVKV LEVDLQKRIA LTMRLDEQPG ETNA
PR22_YEAST RQGQHIFVEV IKIQNNGKIS LSMKNIDQHS ....
RS1_SYNP6  NVNDEVKVMI IDLDAEGRIS LSTKQLEPE. ....
RS1_PROSP  NVGDEITVKV LKFDRETRVS LGLKQLGEDP WVA.
1221346    NVGDEVTVKV LKFDKDTRVS LGLKQLGQDP WAA.
HSKIAA25_1 VEGQTVAAKV TNVDEEQRML LSLR...... ....
RPOE_SULAC TKGDRVRAMI ISSGRLPRIA LTMKQP.... ....
SC9959_11  GVGDRVKAIV LKTNPEKQIS LSLKASHFSK EAE.
RS1_CHLTR  SVGAEVEVYL DQEDEEGKVV LSREKATRQR Q...

________________________________________________________________________________


PredictProtein@EMBL-Heidelberg.DE
PHD: Profile fed neural network systems from HeiDelberg

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Prediction of secondary structure by PHDsec
Prediction of solvent accessibility by PHDacc		
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Author:             Burkhard Rost		
EMBL, Heidelberg, FRG
Meyerhofstrasse 1, 69 117 Heidelberg
Internet: Predict-Help@EMBL-Heidelberg.DE

All rights reserved.


Please quote								
~~~~~~~~~~~			

The PredictProtein mail server is described in:
B Rost: PHD: predicting one-dimensional protein structure by
	profile based neural networks. Meth. in Enzym., 1996, 266, 525-539.

In publications using PHDsec output, please also quote:
B Rost & C Sander: Prediction of protein structure at better than
	70% accuracy.  J. Mol. Biol., 1993, 232, 584-599.		

The latest improvements (up to 72% accuracy) are explained in:
B Rost & C Sander: Combining evolutionary information and neural	
	networks to predict protein secondary structure. Proteins, 1994,
	19, 55-72.							

In publications using PHDacc output, please also quote:
B Rost & C Sander: Conservation and prediction of solvent
	accessibility in protein families.  Proteins, 1994, 20, 216-226.


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Prediction of secondary structure by PHDsec
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


About the input to the network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The prediction is performed by a system of neural networks.
The input is a multiple sequence alignment, taken from an HSSP file
produced by the program MaxHom (Sander, Chris & Schneider, Reinhard:
Database of Homology-Derived Structures and the Structural Meaning of
Sequence Alignment. Proteins, 1991, 9, 56-68).

For optimal results the alignment should contain sequences with varying
degrees of sequence similarity relative to the input protein.
The following is an ideal situation:

+-----------------+----------------------+
|   sequence:     |  sequence identity   |
+-----------------+----------------------+
| target sequence |  100 %               |
| aligned seq. 1  |   90 %               |
| aligned seq. 2  |   80 %               |
|      ...        |   ...                |
| aligned seq. 7  |   30 %               |
+-----------------+----------------------+


Estimated Accuracy of Prediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A careful cross validation test on some 250 protein chains (in total
about 55,000 residues) with less than 25% pairwise sequence identity
gave the following results:

++================++-----------------------------------------+
|| Qtotal = 72.1% ||      ("overall three state accuracy")   |
++================++-----------------------------------------+

+----------------------------+-----------------------------+
| Qhelix (% of observed)=70% | Qhelix (% of predicted)=77% |
| Qstrand(% of observed)=62% | Qstrand(% of predicted)=64% |
| Qloop  (% of observed)=79% | Qloop  (% of predicted)=72% |
+----------------------------+-----------------------------+
..........................................................................

These percentages are defined by:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

|                    number of correctly predicted residues
|Qtotal =            ---------------------------------------      (*100)
|                          number of all residues
|
|                    no of res correctly predicted to be in helix
|Qhelix (% of obs) = -------------------------------------------- (*100)
|                    no of all res observed to be in helix
|
|
|                    no of res correctly predicted to be in helix
|Qhelix (% of pred)= -------------------------------------------- (*100)
|                    no of all residues predicted to be in helix

..........................................................................

Averaging over single chains
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The most reasonable way to compute the overall accuracies is the above
quoted percentage of correctly predicted residues.  However, since the
user is mainly interested in the expected performance of the prediction
for a particular protein, the mean value when averaging over protein
chains might be of help as well.  Computing first the three state
accuracy for each protein chain, and then averaging over 250 chains
yields the following average:

+-------------------------------====--+
| Qtotal/averaged over chains = 72.2% |
+-------------------------------====--+
| standard deviation          =  9.3% |
+-------------------------------------+
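The difference between pooling residues and averaging over chains can be illustrated with a small sketch. The per-chain counts below are made-up numbers, not data from the cross-validation test, and the use of the population standard deviation is an assumption (the text does not say which variant was used).

```python
from statistics import mean, pstdev

# Hypothetical (correct, total) residue counts for three chains.
# These numbers are illustrative only, not taken from the test set.
CHAINS = [(80, 100), (150, 200), (30, 60)]


def pooled_accuracy(chains):
    """Per-residue accuracy in %: pool all residues, then divide once."""
    correct = sum(c for c, _ in chains)
    total = sum(n for _, n in chains)
    return 100.0 * correct / total


def per_chain_accuracy(chains):
    """Mean and population standard deviation of per-chain accuracies in %."""
    per_chain = [100.0 * c / n for c, n in chains]
    return mean(per_chain), pstdev(per_chain)


print(f"pooled: {pooled_accuracy(CHAINS):.1f}%")
avg, sd = per_chain_accuracy(CHAINS)
print(f"per chain: {avg:.1f}% +/- {sd:.1f}%")
```

With these numbers the pooled value is 72.2%, while the per-chain mean is only 68.3%, because the short, poorly predicted third chain counts as much as each of the longer ones.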

..........................................................................

Further measures of performance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Matthews correlation coefficient:

+---------------------------------------------+
| Chelix = 0.63, Cstrand = 0.53, Cloop = 0.52 |
+---------------------------------------------+
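Assuming the usual one-vs-rest definition of the Matthews coefficient for each state, the three values can be recomputed from the H/E/L accuracy matrix given further down in this section. The sketch below is plain Python, not PHD code; with that matrix it gives 0.63, 0.53 and 0.52.

```python
import math

# H/E/L accuracy matrix of this section: rows = observed (DSSP),
# columns = predicted (network), class order H, E, L.
MATRIX = [
    [12447, 1255, 3990],
    [949, 7493, 3750],
    [2604, 2875, 19962],
]


def matthews(matrix, k):
    """One-vs-rest Matthews correlation coefficient for class index k."""
    total = sum(map(sum, matrix))
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                 # observed k, predicted other
    fp = sum(row[k] for row in matrix) - tp  # predicted k, observed other
    tn = total - tp - fn - fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


for k, name in enumerate("HEL"):
    print(f"C{name} = {matthews(MATRIX, k):.2f}")
```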
..........................................................................

Average length of predicted secondary structure segments:

.           +------------+----------+
.           |  predicted | observed |
+-----------+------------+----------+
| Lhelix  = |    10.3    |    9.3   |
| Lstrand = |     5.0    |    5.3   |
| Lloop   = |     7.2    |    5.9   |
+-----------+------------+----------+
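As a sketch of how such averages can be obtained, the function below counts uninterrupted runs of each state in a prediction string and averages their lengths per state. It works on any symbols, including the blank that PHD uses for loop; whether the published statistics treat chain ends or very short segments specially is not stated here, so the function name and details are illustrative.

```python
from itertools import groupby


def mean_segment_lengths(sec):
    """Average length of uninterrupted runs of each state symbol."""
    runs = {}
    for state, group in groupby(sec):
        runs.setdefault(state, []).append(len(list(group)))
    return {state: sum(lengths) / len(lengths) for state, lengths in runs.items()}


# Hypothetical string; 'L' is written for loop to keep it readable.
print(mean_segment_lengths("LLHHHHLLLEEEELLEEL"))
```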
..........................................................................

The accuracy matrix in detail:

+---------------------------------------+
|    number of residues with H, E, L    |
+---------+------+------+------+--------+
|         |net H |net E |net L |sum obs |
+---------+------+------+------+--------+
| obs H   |12447 | 1255 | 3990 |  17692 |
| obs E   |  949 | 7493 | 3750 |  12192 |
| obs L   | 2604 | 2875 |19962 |  25441 |
+---------+------+------+------+--------+
| sum Net |16000 |11623 |27702 |  55325 |
+---------+------+------+------+--------+

Note: This table is to be read in the following manner: of all
residues predicted to be in a helix, 12447 were observed to be in a
helix, whereas 949 belong to observed strands and 2604 to observed
loop regions.  The term "observed" refers to the DSSP assignment of
secondary structure calculated from 3D coordinates of experimentally
determined structures (Dictionary of Secondary Structure of Proteins:
Kabsch & Sander (1983) Biopolymers, 22, 2577-2637).
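The definitions of Qtotal, Q(% of observed) and Q(% of predicted) can be applied directly to this accuracy matrix. The short sketch below (plain Python with illustrative function names, not PHD code) recovers Qtotal = 72.1% and, for example, 70.4% of observed and 77.8% of predicted helix residues.

```python
# Accuracy matrix above: rows = observed (DSSP), columns = predicted
# (network), class order H, E, L.
MATRIX = [
    [12447, 1255, 3990],
    [949, 7493, 3750],
    [2604, 2875, 19962],
]
STATES = "HEL"


def q_total(m):
    """Percentage of all residues whose state is predicted correctly."""
    correct = sum(m[i][i] for i in range(len(m)))
    return 100.0 * correct / sum(map(sum, m))


def q_obs(m, k):
    """% of residues observed in state k that are also predicted in k."""
    return 100.0 * m[k][k] / sum(m[k])


def q_pred(m, k):
    """% of residues predicted in state k that are also observed in k."""
    return 100.0 * m[k][k] / sum(row[k] for row in m)


print(f"Qtotal = {q_total(MATRIX):.1f}%")
for k, s in enumerate(STATES):
    print(f"Q{s}: {q_obs(MATRIX, k):.1f}% of observed, "
          f"{q_pred(MATRIX, k):.1f}% of predicted")
```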


Position-specific reliability index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The network predicts the three secondary structure types using real
numbers from the output units. The prediction is assigned by choosing
the maximal unit ("winner takes all").  However, the real numbers
contain additional information.
E.g. the difference between the maximal and the second largest output
unit can be used to derive a "reliability index".  This index is given
for each residue along with the prediction.  The index is scaled to
have values between 0 (lowest reliability), and 9 (highest).
The accuracies (Qtot) to be expected for residues with index values at
or above a particular value are given below, together with the fraction
of such residues (%res):

+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| index|  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |
| %res |100.0| 99.2| 90.4| 80.9| 71.6| 62.5| 52.8| 42.3| 29.8| 14.1|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|      |     |     |     |     |     |     |     |     |     |     |
| Qtot | 72.1| 72.3| 74.8| 77.7| 80.3| 82.9| 85.7| 88.5| 91.1| 94.2|
|      |     |     |     |     |     |     |     |     |     |     |
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| H%obs| 70.4| 70.6| 73.7| 77.1| 80.1| 83.1| 86.0| 89.3| 92.5| 96.4|
| E%obs| 61.5| 61.7| 63.7| 66.6| 69.1| 71.7| 74.6| 77.0| 77.8| 68.1|
|      |     |     |     |     |     |     |     |     |     |     |
| H%prd| 77.8| 78.0| 80.0| 82.6| 84.7| 86.9| 89.2| 91.3| 93.1| 95.4|
| E%prd| 64.5| 64.7| 67.8| 71.0| 74.2| 77.6| 81.4| 85.1| 89.8| 93.5|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+

The above table gives the cumulative results, e.g. 62.5% of all
residues have a reliability of at least 5.  The overall three-state
accuracy for this subset of almost two thirds of all residues is 82.9%.
For this subset, e.g., 83.1% of the observed helices are correctly
predicted, and 86.9% of all residues predicted to be in helix are
correct.
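The text does not specify how the difference between the two largest outputs is scaled to the 0-9 index. As a rough sketch only, the function below applies winner-takes-all to three output values and, as an assumption, maps that difference linearly onto 0-9 (int(10 * diff), capped at 9). The function name and the scaling are hypothetical, not PHD's actual code.

```python
def predict_with_reliability(outputs, states="HEL"):
    """Winner-takes-all prediction plus a 0-9 reliability index.

    ASSUMPTION: index = int(10 * (largest - second largest output)),
    clipped to 0-9; PHD's real scaling is not given in this text.
    """
    ranked = sorted(range(len(outputs)), key=lambda i: outputs[i], reverse=True)
    best, second = ranked[0], ranked[1]
    diff = outputs[best] - outputs[second]
    reliability = min(9, max(0, int(10 * diff)))
    return states[best], reliability


print(predict_with_reliability([0.82, 0.10, 0.15]))  # clear helix call
print(predict_with_reliability([0.40, 0.35, 0.30]))  # ambiguous call
```

Residues could then be filtered with a cut-off such as reliability >= 5 to obtain the kind of subset described in the cumulative table.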

..........................................................................

The following table gives the non-cumulative quantities, i.e. the
values per reliability index range.  These numbers answer the question:
how reliable is the prediction for all residues labeled with the
particular index i.

+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| index|  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |
| %res |  8.8|  9.5|  9.3|  9.1|  9.7| 10.5| 12.5| 15.7| 14.1|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|      |     |     |     |     |     |     |     |     |     |
| Qtot | 46.6| 50.6| 57.7| 62.6| 67.9| 74.2| 82.2| 88.3| 94.2|
|      |     |     |     |     |     |     |     |     |     |
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| H%obs| 36.8| 42.3| 49.5| 55.2| 61.7| 69.9| 78.8| 87.4| 96.4|
| E%obs| 44.7| 44.5| 52.1| 55.4| 60.9| 68.0| 75.9| 81.0| 68.1|
|      |     |     |     |     |     |     |     |     |     |
| H%prd| 49.9| 52.5| 60.3| 64.2| 69.2| 77.5| 85.4| 89.9| 95.4|
| E%prd| 41.7| 47.1| 53.6| 57.0| 64.0| 71.6| 78.8| 88.8| 93.5|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+

For example, for residues with a reliability index of 5, 64% of all
residues predicted to be in beta-strands are correctly predicted.
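The two tables are linked: the cumulative value for index i is the residue-weighted average of the per-index values for all indices >= i. A quick consistency check in Python, using the %res and Qtot rows of the non-cumulative table, reproduces the cumulative rows to within about 0.1 percentage points (the inputs are themselves rounded), e.g. 29.8% of residues with Qtot = 91.1% for indices >= 8. Index 0 (about 0.8% of residues) is not listed in the non-cumulative table, so the check starts at 1.

```python
# Non-cumulative values for reliability index 1..9 (table above).
PCT_RES = [8.8, 9.5, 9.3, 9.1, 9.7, 10.5, 12.5, 15.7, 14.1]
QTOT = [46.6, 50.6, 57.7, 62.6, 67.9, 74.2, 82.2, 88.3, 94.2]


def cumulative(pct_res, q, first_index=1):
    """Map each index i to (%res with index >= i, residue-weighted Qtot)."""
    result = {}
    for start in range(len(pct_res)):
        weights = pct_res[start:]
        total = sum(weights)
        q_avg = sum(w * qi for w, qi in zip(weights, q[start:])) / total
        result[first_index + start] = (total, q_avg)
    return result


CUM = cumulative(PCT_RES, QTOT)
for ri, (res, acc) in CUM.items():
    print(f"index >= {ri}: {res:5.1f}% of residues, Qtot = {acc:.1f}%")
```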


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Prediction of solvent accessibility by PHDacc		
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Definition of accessibility
~~~~~~~~~~~~~~~~~~~~~~~~~~

For training on residue solvent accessibility, the values of accessible
surface area from DSSP (Dictionary of Secondary Structure of Proteins;
Kabsch & Sander (1983) Biopolymers, 22, 2577-2637) have been used.  The
prediction provides values for the relative solvent accessibility,
normalised as follows:

|                           ACCESSIBILITY (from DSSP, in square Angstrom)
|RELATIVE_ACCESSIBILITY =   --------------------------------------------- * 100
|                               MAXIMAL_ACC (amino acid type i)

where MAXIMAL_ACC (i) is the maximal accessibility of amino acid type i.
The maximal values are:

+----+----+----+----+----+----+----+----+----+----+----+----+
|  A |  B |  C |  D |  E |  F |  G |  H |  I |  K |  L |  M |
| 106| 160| 135| 163| 194| 197|  84| 184| 169| 205| 164| 188|
+----+----+----+----+----+----+----+----+----+----+----+----+
|  N |  P |  Q |  R |  S |  T |  V |  W |  X |  Y |  Z |
| 157| 136| 198| 248| 130| 142| 142| 227| 180| 222| 196|
+----+----+----+----+----+----+----+----+----+----+----+

Notation: one letter code for amino acid, B stands for D or N; Z stands
for E or Q; and X stands for undetermined.
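The normalisation can be written as a small lookup. The sketch below copies the maximal values from the table above; the function name is illustrative, and no clipping is applied to values above the maximum since the text does not say how such cases are handled.

```python
# Maximal accessibilities per residue type, copied from the table above.
MAX_ACC = {
    "A": 106, "B": 160, "C": 135, "D": 163, "E": 194, "F": 197, "G": 84,
    "H": 184, "I": 169, "K": 205, "L": 164, "M": 188, "N": 157, "P": 136,
    "Q": 198, "R": 248, "S": 130, "T": 142, "V": 142, "W": 227, "X": 180,
    "Y": 222, "Z": 196,
}


def relative_accessibility(residue, acc):
    """Relative accessibility in % from a DSSP accessibility value."""
    return 100.0 * acc / MAX_ACC[residue.upper()]


print(relative_accessibility("A", 53))   # half of the alanine maximum
print(relative_accessibility("g", 84))   # glycine at its maximum
```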

The solvent accessibility (in square Angstrom) can be used to estimate
the number of water molecules (W) in contact with the residue:

W = ACCESSIBILITY / 10

The prediction is given in 10 states for relative accessibility, with

RELATIVE_ACCESSIBILITY = (PREDICTED_ACC * PREDICTED_ACC)

where PREDICTED_ACC = 0 - 9.
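The 10-state encoding and the water estimate combine as sketched below. The forward mapping (state n means n*n %) follows the text; the inverse, relative_to_state, is a hypothetical binning (largest n with n*n <= value, capped at 9) chosen to agree with the later note that observed state 9 means a relative accessibility of at least 81%.

```python
import math


def state_to_relative(state):
    """Relative accessibility (%) encoded by a 10-state value: n -> n*n."""
    if not 0 <= state <= 9:
        raise ValueError("state must be between 0 and 9")
    return state * state


def relative_to_state(rel_acc):
    """Hypothetical inverse: largest n with n*n <= rel_acc, capped at 9."""
    if rel_acc < 0:
        raise ValueError("relative accessibility cannot be negative")
    return min(9, math.isqrt(int(rel_acc)))


def waters_in_contact(acc):
    """Rough number of contacting waters, W = ACCESSIBILITY / 10 (in A^2)."""
    return acc / 10.0


# Predicted state 5 for an alanine (maximal accessibility 106, see table).
rel = state_to_relative(5)      # 25 %
acc = rel / 100.0 * 106         # absolute accessibility in square Angstrom
print(rel, acc, waters_in_contact(acc))
```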


Estimated Accuracy of Prediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A careful cross validation test on some 238 protein chains (in total
about 62,000 residues) with less than 25% pairwise sequence identity
gave the following results:


Correlation
...........

The correlation between observed and predicted solvent accessibility
is:

-----------
corr = 0.53
-----------

This value ought to be compared to the worst- and best-case prediction
scenarios: random prediction (corr = 0.0) and homology modelling
(corr = 0.66).  (Note: homology modelling yields a relatively accurate
prediction in 3D if, and only if, a sequence with significant identity
to the target has a known 3D structure.)
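The text does not spell out which correlation coefficient is meant. Assuming the standard Pearson product-moment coefficient between observed and predicted relative accessibility, a minimal computation looks like this; the values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical observed and predicted relative accessibilities (%).
observed = [0, 4, 16, 36, 81]
predicted = [1, 9, 9, 49, 64]
print(round(pearson(observed, predicted), 3))
```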


3-state accuracy
................

Often the relative accessibility is projected onto, e.g., 3 states:
b  = buried       (here defined as < 9% relative accessibility),
i  = intermediate ( 9% <= rel. acc. < 36% ),
e  = exposed      ( rel. acc. >= 36% ).

A projection onto 3 states or 2 states (buried/exposed) makes it possible
to compute 3- and 2-state prediction accuracies.  PHD reaches an
overall 3-state accuracy of:
Q3 = 57.5%
(compared to 35% for random prediction and 70% for homology modelling).

In detail:

+-----------------------------------+-------------------------+
| Qburied       (% of observed)=77% | Qb (% of predicted)=60% |
| Qintermediate (% of observed)= 9% | Qi (% of predicted)=44% |
| Qexposed      (% of observed)=78% | Qe (% of predicted)=56% |
+-----------------------------------+-------------------------+
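The projection and the resulting Q3 can be sketched as follows, using the 9% and 36% cut-offs defined above. The function names and the example values are illustrative only.

```python
def three_state(rel_acc):
    """Project relative accessibility (%) onto b / i / e (9% and 36% cut-offs)."""
    if rel_acc < 9:
        return "b"
    if rel_acc < 36:
        return "i"
    return "e"


def q3(observed, predicted):
    """Percentage of residues whose 3-state class is predicted correctly."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(three_state(o) == three_state(p)
               for o, p in zip(observed, predicted))
    return 100.0 * hits / len(observed)


obs = [0, 4, 16, 36, 81]   # hypothetical observed values (%)
prd = [1, 9, 9, 49, 64]    # hypothetical predicted values (%)
print(q3(obs, prd))
```

Here four of the five residues fall into the same class, so Q3 is 80%; the second residue is buried (4%) but predicted intermediate (9%).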


10-state accuracy
.................

The network predicts relative solvent accessibility in 10 states, with
state i (i = 0-9) corresponding to a relative solvent accessibility of
i*i %.  The 10-state accuracy of the network is:

Q10 = 24.5%

..........................................................................

These percentages are defined by:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

|                     number of correctly predicted residues
|Q3                 = ---------------------------------------      (*100)
|                           number of all residues
|
|                     no of res. correctly predicted to be buried
|Qburied (% of obs) = ------------------------------------------- (*100)
|                     no of all res. observed to be buried
|
|
|                     no of res. correctly predicted to be buried
|Qburied (% of pred)= ------------------------------------------- (*100)
|                     no of all residues predicted to be buried

..........................................................................

Averaging over single chains
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The most reasonable way to compute the overall accuracies is the above
quoted percentage of correctly predicted residues.  However, since the
user is mainly interested in the expected performance of the prediction
for a particular protein, the mean value when averaging over protein
chains might be of help as well.  Computing first the correlation
between observed and predicted accessibility for each protein chain, and
then averaging over all 238 chains yields the following average:

+-------------------------------====--+
| corr/averaged over chains   = 0.53  |
+-------------------------------====--+
| standard deviation          = 0.11  |
+-------------------------------------+

..........................................................................

Further details of performance accuracy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The accuracy matrix in detail:
..............................

-------+----------------------------------------------------+-----------
\ PHD |    0    1   2   3    4    5     6     7    8    9  |  SUM  %obs
-------+----------------------------------------------------+-----------
OBS  0 | 8611  140   8  44   82  169   772   334   27    0  | 10187 16.6
OBS  1 | 4367  164   0  50  106  231   738   346   44    3  |  6049  9.8
OBS  2 | 3194  168   1  68  125  303   951   513   42    7  |  5372  8.7
OBS  3 | 2760  159   8  80  136  327  1246   746   58   19  |  5539  9.0
OBS  4 | 2312  144   2  72  166  396  1615  1245  124   19  |  6095  9.9
OBS  5 | 1873   96   3  84  138  425  1979  1834  187   27  |  6646 10.8
OBS  6 | 1387   67   1  60   80  278  2237  2627  231   51  |  7019 11.4
OBS  7 | 1082   35   0  32   56  225  1871  3107  302   60  |  6770 11.0
OBS  8 |  660   25   0  27   43  136  1206  2374  325   87  |  4883  7.9
OBS  9 |  325   20   2  27   29   74   648  1159  366  214  |  2864  4.7
-------+----------------------------------------------------+-----------
SUM    |26571 1018  25 544  961 2564 13263 14285 1706  487  |
%pred  | 43.3  1.7 0.0 0.9  1.6  4.2  21.6  23.3  2.8  0.8  |
-------+----------------------------------------------------+-----------

Note: This table is to be read in the following manner: 8611 of all
residues predicted to have 0% relative accessibility were observed
with 0% relative accessibility.  However, 325 of all residues
predicted to have 0% are observed as completely exposed (obs = 9 ->
rel. acc. >= 81%).  The term "observed" refers to the DSSP
compilation of solvent accessible surface area calculated from 3D
coordinates of experimentally determined structures (Dictionary of
Secondary Structure of Proteins: Kabsch & Sander (1983)
Biopolymers, 22, 2577-2637).
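Because state n stands for n*n % relative accessibility, the 9% and 36% cut-offs put states 0-2 in "buried", 3-5 in "intermediate" and 6-9 in "exposed". Collapsing the 10-state accuracy matrix above accordingly recovers the quoted Q3 = 57.5%; the sketch below is plain Python with illustrative names.

```python
# 10-state accuracy matrix above: rows = observed state 0-9,
# columns = predicted state 0-9.
M10 = [
    [8611, 140, 8, 44, 82, 169, 772, 334, 27, 0],
    [4367, 164, 0, 50, 106, 231, 738, 346, 44, 3],
    [3194, 168, 1, 68, 125, 303, 951, 513, 42, 7],
    [2760, 159, 8, 80, 136, 327, 1246, 746, 58, 19],
    [2312, 144, 2, 72, 166, 396, 1615, 1245, 124, 19],
    [1873, 96, 3, 84, 138, 425, 1979, 1834, 187, 27],
    [1387, 67, 1, 60, 80, 278, 2237, 2627, 231, 51],
    [1082, 35, 0, 32, 56, 225, 1871, 3107, 302, 60],
    [660, 25, 0, 27, 43, 136, 1206, 2374, 325, 87],
    [325, 20, 2, 27, 29, 74, 648, 1159, 366, 214],
]

# States 0-2 -> buried (< 9%), 3-5 -> intermediate (< 36%), 6-9 -> exposed.
GROUP = {n: "b" if n <= 2 else "i" if n <= 5 else "e" for n in range(10)}


def collapse(m10):
    """Sum a 10x10 matrix into a 3x3 one keyed by (observed, predicted)."""
    m3 = {(o, p): 0 for o in "bie" for p in "bie"}
    for obs, row in enumerate(m10):
        for prd, count in enumerate(row):
            m3[GROUP[obs], GROUP[prd]] += count
    return m3


m3 = collapse(M10)
total = sum(m3.values())
q3 = 100.0 * sum(m3[s, s] for s in "bie") / total
print(f"Q3 = {q3:.1f}% over {total} residues")
```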


Accuracy for each amino acid:
.............................

+---+------------------------------+-----+-------+------+
|AA |   Q3 b%o b%p i%o i%p e%o e%p | Q10 |  corr |    N |
+---+------------------------------+-----+-------+------+
| A | 59.0  87  60   2  38  66  57 |  31 | 0.530 | 5054 |
| C | 62.0  91  67   5  39  25  21 |  34 | 0.244 |  893 |
| D | 56.5  21  45   6  49  94  57 |  20 | 0.321 | 3536 |
| E | 60.8   9  40   3  41  98  61 |  21 | 0.347 | 3743 |
| F | 63.3  94  67   9  46  29  37 |  27 | 0.366 | 2436 |
| G | 52.1  75  51   1  31  67  53 |  22 | 0.405 | 4787 |
| H | 50.9  63  53  23  45  71  50 |  18 | 0.442 | 1366 |
| I | 64.9  95  68   6  41  30  38 |  34 | 0.360 | 3437 |
| K | 66.6   2  11   2  37  98  67 |  23 | 0.267 | 3652 |
| L | 61.6  93  65   8  44  31  40 |  31 | 0.368 | 5016 |
| M | 60.1  92  64   5  39  45  44 |  29 | 0.452 | 1371 |
| N | 55.5  45  45   8  38  87  59 |  17 | 0.410 | 2923 |
| P | 53.0  48  48   9  39  83  56 |  18 | 0.364 | 2920 |
| Q | 54.3  27  44   7  44  92  56 |  20 | 0.344 | 2225 |
| R | 49.9  15  47  36  47  76  51 |  18 | 0.372 | 2765 |
| S | 55.6  69  53   3  51  81  56 |  22 | 0.464 | 3981 |
| T | 51.8  61  51   8  38  78  53 |  21 | 0.432 | 3740 |
| V | 61.1  93  65   5  40  39  42 |  34 | 0.418 | 4156 |
| W | 56.2  85  62  20  49  29  27 |  21 | 0.318 |  891 |
| Y | 49.7  73  52  33  49  36  38 |  19 | 0.359 | 2301 |
+---+------------------------------+-----+-------+------+

Abbreviations:

AA:   amino acid in one-letter code
Q3:   3-state accuracy (buried/intermediate/exposed), see above
b%o, i%o, e%o:   = Qburied, Qintermediate, Qexposed (% of observed),
i.e. percentage of correct prediction in each state, see above
b%p, i%p, e%p:   = Qburied, Qintermediate, Qexposed (% of predicted),
i.e. probability of correct prediction in each state, see above
Q10:  percentage of residues predicted in the correct one of the 10
states of relative accessibility (10-state accuracy)
corr: correlation between predicted and observed rel. acc.
N:    number of residues in data set


Accuracy for different secondary structure:
...........................................

+--------+------------------------------+----+-------+-------+
| type   |   Q3 b%o b%p i%o i%p e%o e%p |Q10 |  corr |     N |
+--------+------------------------------+----+-------+-------+
| helix  | 59.5  79  64   8  44  80  56 | 27 | 0.574 | 20100 |
| strand | 61.3  84  73   9  46  69  37 | 35 | 0.524 | 13356 |
| loop   | 54.4  64  43  11  44  78  61 | 18 | 0.442 | 27968 |
+--------+------------------------------+----+-------+-------+

Abbreviations as before.


Position-specific reliability index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The network predicts the 10 states for relative accessibility using real
numbers from the output units. The prediction is assigned by choosing
the maximal unit ("winner takes all").  However, the real numbers
contain additional information.
E.g. the difference between the maximal and the second largest output
unit (with the constraint that the second largest output is compiled
among all units at least 2 positions off the maximal unit) can be used
to derive a "reliability index".  This index is given for each residue
along with the prediction.  The index is scaled to have values between
0 (lowest reliability), and 9 (highest).
The accuracies (Q3, corr, etc.) to be expected for residues with index
values at or above a particular value are given below, together with
the fraction of such residues (%res):

+---+------------------------------+----+-------+-------+
|RI |   Q3 b%o b%p i%o i%p e%o e%p |Q10 |  corr |  %res |
+---+------------------------------+----+-------+-------+
| 0 | 57.5  77  60   9  44  78  56 | 24 | 0.535 | 100.0 |
| 1 | 59.1  76  63   9  45  82  57 | 25 | 0.560 |  91.2 |
| 2 | 61.7  79  66   4  47  87  58 | 27 | 0.594 |  77.1 |
| 3 | 66.6  87  70   1  51  89  63 | 30 | 0.650 |  57.1 |
| 4 | 70.0  89  72   0  83  91  67 | 32 | 0.686 |  45.8 |
| 5 | 72.9  92  75   0   0  93  70 | 34 | 0.722 |  35.6 |
| 6 | 76.3  95  77   0   0  93  75 | 36 | 0.769 |  24.7 |
| 7 | 79.0  97  79   0   0  93  78 | 39 | 0.803 |  16.0 |
| 8 | 80.9  98  80   0   0  91  81 | 43 | 0.824 |   9.6 |
| 9 | 81.2  99  80   0   0  88  83 | 45 | 0.828 |   5.9 |
+---+------------------------------+----+-------+-------+

Abbreviations as before.

The above table gives the cumulative results, e.g. 45.8% of all
residues have a reliability of at least 4.  The correlation for this
most reliably predicted half of the residues is 0.686, i.e. a value
comparable to what could be expected if homology modelling were
possible.  For this subset of 45.8% of all residues, 89% of the buried
residues are correctly predicted, and 72% of all residues predicted to
be buried are correct.

..........................................................................

The following table gives the non-cumulative quantities, i.e. the
values per reliability index range.  These numbers answer the question:
how reliable is the prediction for all residues labeled with the
particular index i.

+---+------------------------------+----+-------+-------+
|RI |   Q3 b%o b%p i%o i%p e%o e%p |Q10 |  corr |  %res |
+---+------------------------------+----+-------+-------+
| 0 | 40.9  79  40  16  41  21  40 | 14 | 0.175 |   8.8 |
| 1 | 45.4  61  46  28  44  48  44 | 17 | 0.278 |  14.1 |
| 2 | 47.4  53  52  10  46  80  44 | 19 | 0.343 |  19.9 |
| 3 | 52.9  75  59   4  50  77  47 | 23 | 0.439 |  11.4 |
| 4 | 60.0  81  63   0  83  84  56 | 25 | 0.547 |  10.1 |
| 5 | 65.2  82  70   0   0  93  62 | 28 | 0.607 |  10.9 |
| 6 | 71.3  90  72   0   0  94  70 | 31 | 0.692 |   8.8 |
| 7 | 76.0  94  76   0   0  95  75 | 34 | 0.762 |   6.3 |
| 8 | 80.5  97  81   0   0  94  79 | 39 | 0.808 |   3.8 |
| 9 | 81.2  99  80   0   0  88  83 | 45 | 0.828 |   5.9 |
+---+------------------------------+----+-------+-------+

For example, for residues with RI = 4, 83% of all residues predicted
to be intermediate are correctly predicted as such.


The resulting network (PHD) prediction is:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

________________________________________________________________________________



Prediction of:			
	- secondary structure,   		by PHDsec		
	- solvent accessibility, 		by PHDacc		
	- and helical transmembrane regions, 	by PHDhtm		



The network systems are described in:   		
	PHDsec: B Rost & C Sander, JMB, 1993, 232, 584-599.		
	      	B Rost & C Sander, Proteins, 1994, 19, 55-72.		
	PHDacc: B Rost & C Sander, Proteins, 1994, 20, 216-226.		
	PHDhtm: B Rost et al., Prot. Science, 1995, 4, 521-533.


Some statistics
~~~~~~~~~~~~~~

Percentage of amino acids:
+--------------+--------+--------+--------+--------+--------+
| AA:          |    V   |    G   |    E   |    I   |    A   |
| % of AA:     |   14.3 |   10.7 |    9.5 |    8.3 |    8.3 |
+--------------+--------+--------+--------+--------+--------+
| AA:          |    R   |    Q   |    K   |    T   |    L   |
| % of AA:     |    7.1 |    7.1 |    7.1 |    4.8 |    4.8 |
+--------------+--------+--------+--------+--------+--------+
| AA:          |    D   |    S   |    Y   |    P   |    F   |
| % of AA:     |    4.8 |    3.6 |    2.4 |    2.4 |    2.4 |
+--------------+--------+--------+--------+--------+--------+
| AA:          |    M   |    H   |
| % of AA:     |    1.2 |    1.2 |
+--------------+--------+--------+

Percentage of secondary structure predicted:
+--------------+--------+--------+--------+
| SecStr:      |    H   |    E   |    L   |
| % Predicted: |    0.0 |   59.5 |   40.5 |
+--------------+--------+--------+--------+

According to the following classes:
all-alpha:   %H>45 and %E< 5; all-beta : %H<5 and %E>45
alpha-beta : %H>30 and %E>20; mixed:    rest,
this means that the predicted class is:           all-beta
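The class rule can be written as a small function; the order of the checks follows the listing above, and the name protein_class is illustrative.

```python
def protein_class(pct_helix, pct_strand):
    """Structural class from predicted %H and %E, using the rules above."""
    if pct_helix > 45 and pct_strand < 5:
        return "all-alpha"
    if pct_helix < 5 and pct_strand > 45:
        return "all-beta"
    if pct_helix > 30 and pct_strand > 20:
        return "alpha-beta"
    return "mixed"


print(protein_class(0.0, 59.5))  # values predicted for PNS1
```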


PHD output for your protein
~~~~~~~~~~~~~~~~~~~~~~~~~~

Fri May 24 11:44:47 1996
Jury on:       10    different architectures (version   5.94_317 ).
Note: differently trained architectures (i.e., different versions) can
result in different predictions.
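
The report does not say how the 10 architectures are combined. A common jury
scheme is to average each output node over all networks and predict the
state with the highest average; the sketch below shows that scheme. It is an
assumption, not necessarily PHD's exact rule, and the names are hypothetical.

```python
from __future__ import annotations

from statistics import fmean

STATES = "HEL"  # order of the three output nodes: helix, strand, loop


def jury(outputs: list[tuple[float, float, float]]) -> tuple[str, tuple[float, ...]]:
    """Combine the (helix, strand, loop) outputs of several networks for one residue.

    HYPOTHETICAL jury rule: arithmetic mean of each output node over all
    networks, then winner-take-all on the averaged outputs.
    """
    averaged = tuple(fmean(node) for node in zip(*outputs))
    best = max(range(len(STATES)), key=lambda i: averaged[i])
    return STATES[best], averaged


state, averaged = jury([(0.1, 0.7, 0.2), (0.1, 0.5, 0.4)])
print(state, averaged)
```

With ten architectures, `outputs` would simply hold ten tuples per residue.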


About the protein
~~~~~~~~~~~~~~~~

HEADER
COMPND
SOURCE
AUTHOR
SEQLENGTH    84
NCHAIN        1 chain(s) in PNS1 data set
NALIGN       23
(= number of aligned sequences in the HSSP file)


Abbreviations: PHDsec
~~~~~~~~~~~~~~~~~~~~

sequence:
AA : amino acid sequence
secondary structure:
HEL: H=helix, E=extended (sheet), blank=other (loop)
PHD: Profile network prediction HeiDelberg
Rel: Reliability index of prediction (0-9)
detail:
prH: 'probability' for assigning helix
prE: 'probability' for assigning strand
prL: 'probability' for assigning loop
note: the 'probabilities' are scaled to the interval 0-9, e.g.,
prH=5 means that the output of the first (helix) node is 0.5-0.6
subset:
SUB: a subset of the prediction, for all residues with an expected
average accuracy > 82% (tables in header)
note: for this subset the following symbols are used:
L: loop (shown as a blank " " in the PHD row above)
".": no prediction is made for this residue, as the
reliability is: Rel < 5
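
The 0-9 scaling and the SUB sec rule can be sketched as follows. The
reliability index formula (10 x the difference between the two highest
outputs, truncated) is our reading of Rost & Sander (1993) and should be
treated as an assumption; all function names are hypothetical.

```python
from __future__ import annotations


def scale_0_9(output: float) -> int:
    """Map a network output in [0, 1] onto the 0-9 scale used for prH/prE/prL.

    An output between 0.5 and 0.6 becomes 5; 1.0 is capped at 9.
    """
    return min(9, max(0, int(10 * output)))


def reliability_index(outputs: tuple[float, float, float]) -> int:
    """ASSUMED definition: 10 x (highest output - second-highest output), truncated."""
    first, second = sorted(outputs, reverse=True)[:2]
    return scale_0_9(first - second)


def sec_subset(phd_sec: str, rel_sec: str, threshold: int = 5) -> str:
    """Build a SUB sec row from the PHD sec and Rel sec rows.

    Residues with Rel < threshold become '.'; confident loop residues
    (blank in the PHD row) are written as 'L'.
    """
    out = []
    for state, rel in zip(phd_sec, rel_sec):
        if int(rel) < threshold:
            out.append(".")
        else:
            out.append("L" if state == " " else state)
    return "".join(out)
```

Applied to the second output block of PNS1 (PHD sec "EEE    EEEEEEE          ",
Rel sec "873477269998633578757889"), sec_subset reproduces the SUB sec row
"EE..LL.EEEEEE..LLLLLLLLL".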

Abbreviations: PHDacc
~~~~~~~~~~~~~~~~~~~~

solvent accessibility:
3st: relative solvent accessibility (acc) in 3 states:
b = 0-9%, i = 9-36%, e = 36-100%.
PHD: Profile network prediction HeiDelberg
Rel: Reliability index of prediction (0-9)
P_3: predicted relative accessibility in 3 states
note: for convenience, a blank is used for intermediate (i).
10st: relative accessibility in 10 states:
a value of n corresponds to a relative acc. of n*n %
subset:
SUB: a subset of the prediction, for all residues with an expected
average correlation > 0.69 (tables in header)
note: for this subset the following symbols are used:
"I": intermediate (shown as a blank " " in the P_3 row above)
".": no prediction is made for this residue, as the
reliability is: Rel < 4
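
The 10-state values decode to percentages via n*n, and the 3-state symbols
follow from the percentage thresholds. Because the ranges above overlap at 9%
and 36%, the boundary handling in this sketch (b < 9 <= i < 36 <= e) is an
assumption; it does agree with the PNS1 rows, where n=6 (36%) is printed as
"e". Function names are hypothetical.

```python
from __future__ import annotations


def acc_percent(n: int) -> int:
    """10-state value n (0-9) -> relative solvent accessibility of n*n percent."""
    return n * n


def three_state(rel_acc: float) -> str:
    """Relative accessibility (%) -> P_3 symbol (ASSUMED edge handling)."""
    if rel_acc < 9:
        return "b"
    if rel_acc < 36:
        return " "  # intermediate is printed as a blank in the P_3 row
    return "e"


def acc_subset(p3: str, rel_acc: str, threshold: int = 4) -> str:
    """Build a SUB acc row: '.' where Rel < threshold, 'I' for intermediate."""
    out = []
    for state, rel in zip(p3, rel_acc):
        if int(rel) < threshold:
            out.append(".")
        else:
            out.append("I" if state == " " else state)
    return "".join(out)


phd_acc = "060677660000076077777799"  # 10st row of the second PNS1 block
p3 = "".join(three_state(acc_percent(int(c))) for c in phd_acc)
print(p3)
print(acc_subset(p3, "412135024042351145444349"))
```

For the second PNS1 block this reproduces the P_3 row
"bebeeeeebbbbbeebeeeeeeee" and the SUB acc row "b....e..b.b..e..eeeee.ee".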


protein:       PNS1           length       84

                  ....,....1....,....2....,....3....,....4....,....5....,....6
         AA      |AEIEVGRVYTGKVTRIVDFGAFVAIGGGKEGLVHISQIADKRVEKVTDYLQMGQEVPVKV|
         PHD sec |  EEEEEEEEEEEEEEE  EEEEEEE   EEEEEEEEE          EEE   EEEEEE|
         Rel sec |963754676667686531317999714640589998522344346320143384599999|
 detail:
         prH sec |000000000000000000000000000000000000001222101233311000000000|
         prE sec |026776787777787654357998843134689998653211321112455312789999|
         prL sec |973122212221111235541000146764200000235565566553222586200000|
 subset: SUB sec |LL.EE.EEEEEEEEEE....EEEEE..L..EEEEEEE.......L.......L.EEEEEE|

 ACCESSIBILITY
 3st:    P_3 acc |eebebbbbbebebebbbebbbbbbbbebbebbbbbbebbeeebeebeebbebeeebebeb|
 10st:   PHD acc |980700000706060006000000007017000000600776077077007077706060|
         Rel acc |841442013411420601226670503005454284132451243224443023452727|
 subset: SUB acc |ee.eb....e..b..b....bbb.b....ebbb.bb...ee..e...ebb....eb.b.b|
                  ....,....7....,....8....,....9....,....10...,....11...,....12
         AA      |LEVDRQGRIRLSIKEATEQSQPAA|
         PHD sec |EEE    EEEEEEE          |
         Rel sec |873477269998633578757889|
 detail:
         prH sec |000000000000000000111000|
         prE sec |876211478998753211110000|
         prL sec |013688510001135688767889|
 subset: SUB sec |EE..LL.EEEEEE..LLLLLLLLL|

 ACCESSIBILITY
 3st:    P_3 acc |bebeeeeebbbbbeebeeeeeeee|
 10st:   PHD acc |060677660000076077777799|
         Rel acc |412135024042351145444349|
 subset: SUB acc |b....e..b.b..e..eeeee.ee|
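
To work with the prediction programmatically, each row can be stitched back
together across the 60-residue blocks. A minimal parser sketch, assuming every
data row keeps the "label |data|" layout shown above; the function name is
hypothetical, and the label is taken as the last two words before the first
"|" (so " subset: SUB sec" becomes "SUB sec").

```python
from __future__ import annotations

import re

# One data row: a label, then the per-residue field between '|' characters.
ROW = re.compile(r"^\s*(?P<label>[^|]*?)\s*\|(?P<data>[^|]*)\|\s*$")


def parse_phd_rows(text: str) -> dict[str, str]:
    """Concatenate the |...| fields of each labelled row across all blocks."""
    rows: dict[str, str] = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # rulers, headings and blank lines have no |...| field
        label = " ".join(match.group("label").split()[-2:])
        rows[label] = rows.get(label, "") + match.group("data")
    return rows
```

Run on the full output above, rows["AA"] should hold all 84 residues, and
each other row (e.g., rows["PHD sec"], rows["Rel acc"]) an 84-character
string aligned with it.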
________________________________________________________________________________

-----------------------------------------------------------------------------
---   PredictProtein: NEWS from March, 1996                               ---
---                                                                       ---
---   PredictProtein is available interactively via WWW:                  ---
---    http://www.embl-heidelberg.de/predictprotein/predictprotein.html   ---
---                                                                       ---
---   The error rate in falsely predicting transmembrane helices for      ---
---   globular proteins has been reduced by a new program to below 2%.    ---
---                                                                       ---
---   The following option is now available upon request:                 ---
---                                                                       ---
---   1.  "predict htm topology"                                          ---
---       Usage:  add the words "predict htm topology" in any line        ---
---               before the one beginning with a hash (#), i.e., the     ---
---               line with the sequence name.                            ---
---       Result: a refined prediction of transmembrane helices and       ---
---               topology (PHDtopology) is returned.                     ---
---                                                                       ---
-----------------------------------------------------------------------------
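
The usage rule in the box amounts to a small text transformation: the option
words go on their own line somewhere before the line that starts with "#".
A sketch with a hypothetical helper name and a made-up request; nothing else
about the request format is assumed.

```python
from __future__ import annotations


def add_option(request: str, option: str = "predict htm topology") -> str:
    """Insert `option` on its own line just before the first line starting with '#'."""
    lines = request.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("#"):
            return "\n".join(lines[:i] + [option] + lines[i:]) + "\n"
    raise ValueError("request has no line beginning with '#'")


request = "resp MAIL\n# my protein\nAEIEVGRVYTGKVTRIVDFG\n"
print(add_option(request))
```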