Robert Hoffmann, PhD
This is the former webpage of Robert Hoffmann at the Protein Design Group in Madrid, Spain. Robert is continuing his research at the Memorial Sloan-Kettering Cancer Center in New York.
Selected works
A gene network for navigating the literature
A network of genes and proteins extends through the scientific literature, touching on phenotypes, pathologies and gene function. We report the development of an information system that provides this network as a natural way of accessing the more than ten million abstracts in PubMed. By using genes and proteins as hyperlinks between sentences and abstracts, we convert the information in PubMed into one navigable resource and bring all the advantages of the internet to scientific literature investigation.
Moreover, this literature network can be superimposed on experimental interaction data (e.g. yeast two-hybrid data from Drosophila melanogaster and Caenorhabditis elegans) to make possible a simultaneous analysis of new and existing knowledge. The network, called Information Hyperlinked over Proteins (iHOP), contains half a million sentences and 30,000 different genes from humans, mice, D. melanogaster, C. elegans, zebrafish, Arabidopsis thaliana, yeast and Escherichia coli.
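The idea of using genes as hyperlinks between sentences can be illustrated with a small sketch. This is not the iHOP implementation. It assumes gene names have already been recognised in each sentence (a hard problem in its own right), and the sentences, gene symbols and function names below are hypothetical placeholders.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy corpus: (sentence text, gene symbols recognised in it).
# Gene-name recognition itself is assumed to have happened upstream.
SENTENCES = [
    ("geneA binds geneB.", {"geneA", "geneB"}),
    ("geneB regulates geneC.", {"geneB", "geneC"}),
    ("geneC is expressed in muscle.", {"geneC"}),
]


def build_gene_index(sentences):
    """Map each gene to the indices of the sentences that mention it."""
    index = defaultdict(list)
    for i, (_, genes) in enumerate(sentences):
        for gene in sorted(genes):
            index[gene].append(i)
    return dict(index)


def build_gene_network(sentences):
    """Link two genes whenever the same sentence mentions both.

    Keys are alphabetically sorted gene pairs; values are the indices
    of the supporting sentences.
    """
    edges = defaultdict(set)
    for i, (_, genes) in enumerate(sentences):
        for a, b in combinations(sorted(genes), 2):
            edges[(a, b)].add(i)
    return dict(edges)


def neighbours(gene, network):
    """Genes reachable from `gene` by following one sentence-level link."""
    linked = set()
    for a, b in network:
        if a == gene:
            linked.add(b)
        elif b == gene:
            linked.add(a)
    return linked


def overlay(network, experimental_pairs):
    """Split gene pairs by whether literature, experiment, or both support them."""
    literature = set(network)
    experimental = {tuple(sorted(pair)) for pair in experimental_pairs}
    return {
        "both": literature & experimental,
        "literature_only": literature - experimental,
        "experimental_only": experimental - literature,
    }
```

Following a gene from one sentence to the next then amounts to looking it up in the index, and superimposing experimental data reduces to a set comparison over gene pairs.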
The iHOP server is publicly accessible here.
Hoffmann, R., Valencia, A. A gene network for navigating the literature. Nature Genetics 36, 664 (2004). PubMed.
Text mining for metabolic pathways, signaling cascades, and protein networks
The complexity of the information stored in databases and publications on metabolic and signaling pathways, the high throughput of experimental data, and the growing number of publications make it imperative to provide systems to help the researcher navigate through these interrelated information resources. Text-mining methods have started to play a key role in the creation and maintenance of links between the information stored in biological databases and its original sources in the literature. These links will be extremely useful for database updating and curation, especially if a number of technical problems can be solved satisfactorily, including the identification of protein and gene names (entities in general) and the characterization of their types of interactions. The first generation of openly accessible text-mining systems, such as iHOP (Information Hyperlinked over Proteins), provides additional functions to facilitate the reconstruction of protein interaction networks, combine database and text information, and support the scientist in the formulation of novel hypotheses. The next challenge is the generation of comprehensive information regarding the general function of signaling pathways and protein interaction networks.
Hoffmann, R., Krallinger, M., Andres, E., Tamames, J., Blaschke, C., Valencia, A. Text mining for metabolic pathways, signaling cascades, and protein networks. Sci STKE 283, 21 (2005). PubMed.
Life cycles of successful genes
By exploring time-series data from MEDLINE abstracts, we observe that only a few genes have been quoted with increasing frequency during the past 25 years. This is probably the result of selective pressure by the scientific community. Over the years, this selection has produced an extreme power law distribution of the information available for individual genes. Interestingly, those genes that are successfully selected are not necessarily the most important genes to the cell. To stress the implication of this finding, we show that there is no correlation between a gene's impact in the scientific literature and its centrality in protein-interaction networks.
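One way to test for a relationship between literature impact and network centrality is a rank correlation, which suits heavily skewed, power-law-like counts. The sketch below is an illustration only: the abstract does not state which statistic was used, and the function names are hypothetical. It computes Spearman's coefficient as the Pearson correlation of tie-averaged ranks.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Here `x` would hold, for each gene, the number of abstracts mentioning it, and `y` the number of interaction partners of its protein. A coefficient near zero would be consistent with the absence of correlation reported above.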
Hoffmann, R., Valencia, A. Life cycles of successful genes. Trends Genet. 19(2), 79-81. (2003). PubMed.
Protein interaction: Same network, different hubs
Recently, large-scale experiments have provided new insights into the complex protein interaction network in yeast. However, previous analyses show that the number of interacting pairs that are common to different methods is extremely low and therefore less informative than expected. We show that comparing connectivities of individual proteins can reveal a common tendency between methods that has been missed by the pairwise comparison of interactions. We find significant correlations between experimental methods and also between various in silico methods. Exceptionally, a computational method, gene neighbourhood, correlates with both in silico and experimental approaches.
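The contrast between comparing interaction pairs and comparing per-protein connectivities can be made concrete with a toy example. The sketch below is an assumption about how such a comparison might be set up, not the analysis from the paper: the overlap measure (Jaccard index), the use of a Pearson correlation, and all names and data are hypothetical.

```python
from collections import Counter
from math import sqrt


def normalise(pairs):
    """Treat interactions as unordered pairs and drop self-interactions."""
    return {frozenset(pair) for pair in pairs if pair[0] != pair[1]}


def pair_overlap(pairs_a, pairs_b):
    """Jaccard index of the two sets of interacting pairs."""
    a, b = normalise(pairs_a), normalise(pairs_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def degrees(pairs):
    """Connectivity (number of distinct partners) of each protein."""
    counts = Counter()
    for pair in normalise(pairs):
        counts.update(pair)
    return counts


def degree_correlation(pairs_a, pairs_b):
    """Pearson correlation of per-protein connectivities across two methods.

    Proteins seen by only one method get connectivity 0 in the other.
    """
    deg_a, deg_b = degrees(pairs_a), degrees(pairs_b)
    proteins = sorted(set(deg_a) | set(deg_b))
    x = [deg_a[p] for p in proteins]
    y = [deg_b[p] for p in proteins]
    n = len(proteins)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant connectivities")
    return cov / (sd_x * sd_y)


# Hypothetical data: both methods find the same hub H, but with different partners.
METHOD_A = [("H", "P1"), ("H", "P2"), ("H", "P3")]
METHOD_B = [("H", "P4"), ("H", "P5"), ("H", "P6")]
```

In this toy data the two methods share no interacting pair, so `pair_overlap(METHOD_A, METHOD_B)` is zero, yet both identify H as the most connected protein, so `degree_correlation(METHOD_A, METHOD_B)` is positive.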
Hoffmann, R., Valencia, A. Protein interaction: Same network, different hubs. Trends Genet. 19(12), 681-683. (2003). PubMed.