ExPASy Home page Site Map Search ExPASy Contact us Swiss-Prot
Hosted by NCSC USMirror sites:Canada China Korea Switzerland Taiwan

HPI  Human Proteomics Initiative

Version of January 2003

In the year 2000, the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI) announced a major effort to annotate, describe and distribute to the life science community a large amount of highly curated information concerning human protein sequences. This initiative, hereafter known as the Human Proteomics Initiative (HPI), is tightly linked to an appeal to the user community to participate actively in this effort at various levels.

In 1857, a group of English lexicographers and philologists met and decided to undertake a major effort of collecting information concerning the meaning and usage of all words in the English language. This collective effort spanned a number of decades and resulted in one of the most impressive monuments of knowledge on any given language: the Oxford English Dictionary (OED). To create and maintain the dictionary they locally built up a team of highly qualified linguistic experts and complemented this classical approach with what was at that time an innovative concept. They made an appeal to English speakers around the world to send them citations illustrating the use of particular words and how that use evolved over time. Today we could use their original appeal as well as the description of their goal almost verbatim, only replacing the "English language" with the "human proteome"!

In 2001, the combined efforts of a number of sequencing centers and companies produced a first draft of the human genome sequence. Such an endeavor was only a very preliminary step in the understanding of human biological processes. The first pitfall to overcome is the detection of all coding regions on the genomic sequence. Current algorithms, while very powerful, are not capable of detecting all exons with certainty, are not well equipped to distinguish different splice variants and are unable to detect small proteins (which are numerous and crucial to many biological processes).

Even when all potential coding regions have been predicted, the user community will have at its disposal the sequences of 20'000 to 35'000 "naked" proteins (the precise number of human genes is hotly debated!). We call these proteins "naked" because genomic information does not allow the efficient prediction of all the post-translational modifications (PTMs) of which the majority of proteins are the target. Proteins, once synthesized on the ribosomes, are subject to a multitude of modification steps. They are cleaved (thus eliminating signal sequences, transit peptides or propeptides, and initiator methionines); many simple chemical groups can be attached to them (for example acetyl, methyl or phosphoryl groups) as well as some more complex molecules, such as sugars and lipids. Finally, they can be internally or externally cross-linked (for example by disulfide bonds). More than a hundred different types of PTM are currently known and many more are yet to be discovered. The complexity due to all these modifications is compounded by the high level of diversity that alternative splicing can produce at the level of sequence. Thus the number of different protein molecules expressed by the human genome is probably closer to a million than to the hundred thousand generally considered by genome scientists.

Another factor of complexity to take into account is the amount of polymorphism at the protein sequence level. While some of these polymorphisms are linked to disease states, most are not, yet have in many cases a direct or indirect effect on the activities of the proteins.

We therefore initiated a major project to annotate all known human sequences according to the quality standards of Swiss-Prot. This means providing, for each known protein, a wealth of information that includes the description of its function, domain structure, subcellular location, post-translational modifications, variants, similarities to other proteins, etc.

There are currently 9'000 annotated human sequences in Swiss-Prot. These entries are associated with about 23'200 literature references, 22'600 experimental or predicted PTMs, 2'800 splice variants and 15'100 polymorphisms (most of which are linked with disease states).
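To put these totals in perspective, the short Python sketch below divides each of the counts quoted above by the number of annotated entries. It is only an illustration: the variable and function names are hypothetical, the input figures are the rounded values given in the text, and the results are simple means that say nothing about how annotations are distributed across individual entries.

```python
# Rounded counts quoted in the text above (Swiss-Prot human entries, January 2003).
ANNOTATED_ENTRIES = 9_000

# Hypothetical labels; the totals are the ones stated in the paragraph above.
annotation_counts = {
    "literature references": 23_200,
    "PTMs (experimental or predicted)": 22_600,
    "splice variants": 2_800,
    "polymorphisms": 15_100,
}


def per_entry_averages(counts, entries):
    """Return the mean number of annotations of each kind per entry."""
    return {kind: total / entries for kind, total in counts.items()}


averages = per_entry_averages(annotation_counts, ANNOTATED_ENTRIES)

if __name__ == "__main__":
    for kind, mean in averages.items():
        print(f"{kind}: {mean:.2f} per entry")
```

Under these figures, an entry carries on average about 2.6 literature references, 2.5 PTMs, 0.3 splice variants and 1.7 polymorphisms.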

The HPI project contains a number of sub-components, which are briefly described below:

For all aspects of the HPI project, we would appreciate the help and collaboration of the scientific community. Information concerning the human proteome is highly critical to a large section of the life science community. We therefore appeal to the user community to participate fully in this initiative by providing all the information needed to help and to speed up the comprehensive annotation of the human proteome.

The HPI project is a long-term challenge: it will take years to annotate and periodically re-annotate all human proteins in such a way as to obtain a full and useful compendium describing the function, and more specifically the role, of these crucial actors, which are involved in most, if not all, biological processes.

It should also be noted that the goals of the HPI project will not be achieved by the Swiss-Prot groups at SIB and EBI without the financial means now being provided by the yearly license fees paid by industrial companies for access to Swiss-Prot and related databases.

In ancient times, the Chinese are said to have used the sentence "May you live in interesting times!" as a form of curse. There is no doubt that the life science community is living in interesting times, but we need to make sure that this is not a curse but a benediction.

For more information on the HPI project you can consult the following Web pages:

http://www.expasy.org/sprot/hpi/
http://www.ebi.ac.uk/swissprot/hpi/hpi.html

You can also download various non-redundant sets of human protein sequence entries from Swiss-Prot and TrEMBL from the following Web page:

http://www.ebi.ac.uk/proteome/HUMAN/

Human protein sequences from Swiss-Prot are integrated in the International Protein Index (IPI) available at:

http://www.ebi.ac.uk/IPI/IPIhelp.html

A short description of HPI has been published in:

O'Donovan C., Apweiler R., Bairoch A. The human proteomics initiative (HPI). Trends Biotechnol. 19:178-181(2001).

If you have any questions or if you want to provide any relevant information, please send us an email at: hpi@isb-sib.ch
