The Suiseki Information Extraction System
Suiseki is a system for extracting protein-protein interactions from large collections of scientific text. It combines statistical analysis of protein interactions, analysis of the syntactic structure of sentences, and a frame-based module dedicated to detecting protein and gene names. The core of the system is a set of frames that capture the different ways in which relations between proteins and genes are expressed in standard text.
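To make the idea of interaction frames more concrete, here is a minimal sketch in Python. It is not Suiseki's implementation: the protein names (`ProtA` … `ProtD`), the verb list, the two frame patterns and the helper names `build_frames` and `extract_interactions` are all hypothetical. Protein names come from a fixed list here, whereas Suiseki detects them with its own frame-based module. Counting how often a pair is matched stands in, very crudely, for the statistical analysis described above.

```python
import re
from collections import Counter

# Hypothetical name list; Suiseki detects names with its own frame-based module.
PROTEIN_NAMES = {"ProtA", "ProtB", "ProtC", "ProtD"}

# Hypothetical interaction verbs (non-capturing group, so frames capture only names).
VERBS = r"(?:binds|interacts with|phosphorylates|activates|inhibits)"


def build_frames(names):
    """Compile simple frames; each one captures exactly two protein names."""
    # Longest names first, so a longer name is not cut short by a shorter prefix.
    name = "(" + "|".join(sorted(map(re.escape, names), key=len, reverse=True)) + ")"
    return [
        # Frame 1: "<A> <verb> <B>"
        re.compile(rf"\b{name}\s+{VERBS}\s+{name}\b"),
        # Frame 2: "interaction of/between <A> and/with <B>"
        re.compile(rf"\binteraction\s+(?:of|between)\s+{name}\s+(?:and|with)\s+{name}\b"),
    ]


def extract_interactions(sentences, names):
    """Count how often each unordered protein pair is matched by any frame."""
    frames = build_frames(names)
    counts = Counter()
    for sentence in sentences:
        for frame in frames:
            for a, b in frame.findall(sentence):
                if a != b:  # ignore self-matches such as "ProtA binds ProtA"
                    counts[tuple(sorted((a, b)))] += 1
    return counts


if __name__ == "__main__":
    # Invented example sentences, for illustration only.
    sentences = [
        "ProtA binds ProtB in vitro.",
        "We observed an interaction between ProtB and ProtA.",
        "ProtC inhibits ProtD.",
    ]
    print(extract_interactions(sentences, PROTEIN_NAMES))
```

Storing each pair in sorted order means that "ProtA binds ProtB" and "interaction between ProtB and ProtA" add to the same count, so evidence from differently phrased sentences accumulates for one pair.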
Apart from the text to be analysed, the system requires no further input from the user. It automatically detects the protein names in the text and analyses the text further for possible statements about protein-protein interactions. In addition, information on protein functions is extracted and provided to the user.
The first implementation was presented at the conference "Intelligent Systems in Molecular Biology", held in Heidelberg in 1999 (see the abstract and publications). The system has since been developed further, and a publication describing its current status is in preparation.
To view the results, a Java-based applet was implemented that displays the interactions as a graph. The confidence threshold can be adjusted, so that high-scoring interactions can be reviewed first and the interaction network explored step by step. A description of the interface can be found here.
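Adjusting the confidence level can be pictured as filtering a weighted graph, shown here as a minimal Python sketch. This is not the applet's code: the scores, the protein names and the helpers `filter_edges`, `neighbours` and `explore` are hypothetical, and the scoring scheme is assumed rather than taken from Suiseki. Lowering the threshold in steps reveals progressively weaker interactions, which mirrors the step-by-step exploration described above.

```python
from collections import defaultdict


def filter_edges(scored, threshold):
    """Return (pair, score) edges with score >= threshold, highest score first."""
    kept = [(pair, s) for pair, s in scored.items() if s >= threshold]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


def neighbours(edges):
    """Build an undirected adjacency map from a list of (pair, score) edges."""
    graph = defaultdict(set)
    for (a, b), _ in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def explore(scored, thresholds):
    """Yield (threshold, newly visible edges) while the threshold is lowered."""
    seen = set()
    for t in sorted(thresholds, reverse=True):
        new = [edge for edge in filter_edges(scored, t) if edge[0] not in seen]
        seen.update(pair for pair, _ in new)
        yield t, new


if __name__ == "__main__":
    # Hypothetical interaction scores, for illustration only.
    scores = {("ProtA", "ProtB"): 5.0, ("ProtB", "ProtC"): 2.5, ("ProtC", "ProtD"): 0.8}
    for threshold, new_edges in explore(scores, [3.0, 1.0, 0.5]):
        print(threshold, new_edges)
```

Each step reports only the edges that have just crossed the threshold, so a reviewer sees the strongest interactions first and then the newly added, weaker ones at each level.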
On-line examples:
Some of the analyses carried out during the development of the system are available on-line. You are kindly invited to review them and send us feedback that will help us improve the current implementation.
This page is maintained and updated by Christian Blaschke of the Protein Design Group at the CNB, Madrid.
If you have comments, please send mail to blaschke@cnb.uam.es