About SH3

"SH3 Genomics"



Objectives and expected achievements

Src homology 3 (SH3) domains are among the most widespread modular domains in eukaryotic organisms. They function as molecular adhesives, and they are found in a wide variety of signalling proteins, including protein and lipid kinases, protein phosphatases, phospholipases, Ras-controlling proteins and adaptor proteins [2,3,11]. Through their binding to specific proline-rich sequence motifs, SH3 domains are known to play a crucial role in the formation of multiprotein complexes and networks responsible for signal transduction, cytoskeletal organisation and other cellular processes. The importance of SH3 domains for human health is underscored by the observation that mutations in many of these domains can cause severe malfunctions leading, for instance, to inflammatory diseases and cancer. Therefore, SH3 domains have become targets for pharmacological interventions.

The basic molecular rules for SH3 domain-ligand recognition have been determined. SH3 domains bind proline-rich ligands in a highly conserved aromatic groove running between two variable loops, known as the RT and n-Src loops (for reviews see references 3 and 10). This groove forms the binding platform for ligand association, while the two variable loops are key determinants of ligand specificity and orientation [7,8,14]. However, a deeper understanding of the rules that govern multiple SH3-mediated interactions within cellular processes will require an integrated approach involving high-resolution structural analysis, protein-protein interaction analysis and functional analysis on a genome-wide scale. The Saccharomyces cerevisiae (budding yeast) genome contains potentially 25 SH3 domain-containing proteins. Some of these proteins comprise more than a single SH3 module, bringing the total number of different yeast SH3 domains to 29. Given the relatively low number of different SH3 domains, the availability of the complete genome sequence and the possibility of carrying out sophisticated genetics, yeast is an ideal model for a unified, proteome-wide approach to study the rules that govern SH3-mediated protein interactions and networks in eukaryotic organisms. Knowledge of these rules should help to predict such interactions in the human proteome and to understand the molecular basis of diseases that originate from mutations of these domains.

To achieve this aim the proposal has six major objectives:

To create an integrated knowledge database containing the data on the yeast SH3 domain-containing proteins and their ligands obtained in the proposed network.


Contribution to programme/specific action objectives

A wealth of sequence information has recently become available through world-wide genome sequencing projects. To make these data useful for biotechnology and for rational approaches to curing diseases, they need to be translated into an understanding of the temporal and spatial distribution of the resulting gene products (proteins) in the cell, and of their physiological functions, by investigating their interactions and their networking with other cellular components. Our proposed research addresses this issue for one of the most widely found protein interaction modules in the cell: the SH3 domain. The proposal aims to understand the structure and function underlying SH3-mediated protein interactions in the cell. The analysis will be carried out on a genome-wide scale in the yeast Saccharomyces cerevisiae. Thus, the project addresses priority 2 in generic activity 8, "Functional genomics and proteomics", in particular subpriority iii, "definition of protein families and interacting pathways". In addition, the project also contains elements of priority 4, "Development and application of underpinning biochemistry, biophysical, statistical and computational approaches" and priority 1, "Genome analysis".

The proposed research should produce a unified knowledge database revealing a cellular, functional and structural understanding of the rules that govern SH3-mediated interactions and networking in yeast. The aim of our proposal is to provide a complete cellular topography for this modular domain. We have chosen yeast as a model organism because it provides excellent genetic tools for our proposed studies. Furthermore, its cellular complexity is sufficiently reduced for such a study, which currently would not be possible within a network of the proposed size for the mammalian genome. Moreover, the genome of the distantly related species Schizosaccharomyces pombe (fission yeast) has recently been completed. This allows a comparative analysis between the SH3 domains found in both species.

Our proposal has a strong biotech development component, namely to establish software for predicting SH3-mediated interactions, which will be beneficial for the future functional analysis of other genomes. Since genetic mutations in SH3-containing proteins can cause a number of diseases such as allergy and asthma, inflammatory disease, osteoporosis, AIDS and cancer, our research proposal will also have an impact on activity 7, "Chronic and degenerative diseases, cancer, diabetes, cardiovascular disease and rare diseases". The knowledge gained in this proposal could contribute to the development of therapies for these human pathologies.


State of the art and innovation aspects

SH3 domains are noncatalytic protein modules that mediate protein-protein interactions. These domains are normally part of multi-domain proteins without a fixed topological position. They are found in a wide variety of functionally different proteins, many of which are involved in signal transduction [10,11]. The first insight into the binding of SH3 domain ligands through a proline-rich sequence motif originated from the screening of a lambda-cDNA expression library using the Abl SH3 domain [1,12]. A subsequent approach involved the application of proline-biased combinatorial peptide libraries displayed on beads or by phage display (reviewed in [3]). These studies have revealed that isolated SH3 domains bind contiguous proline-rich ligands containing the "core" PxxP, where x denotes any amino acid. The binding of these core proline-rich peptide ligands is characterised by relatively low binding affinities (Kd = 5-100 µM), together with little selectivity within a family of homologous SH3 domains [4,13,15].
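As an illustration of the "core" PxxP motif and of what a Kd in the 5-100 µM range implies, the following Python sketch scans a sequence for PxxP cores and estimates the fraction of SH3 domain occupied at a given ligand concentration. It is a minimal sketch under stated assumptions, not part of the proposed software: the function names and the example peptide are hypothetical, and the occupancy formula assumes simple 1:1 binding with the ligand in large excess.

```python
import re

# Lookahead pattern so that overlapping PxxP cores are all reported.
PXXP = re.compile(r"(?=(P..P))")


def find_pxxp(sequence):
    """Return (0-based start, 4-residue match) for every PxxP core."""
    return [(m.start(), m.group(1)) for m in PXXP.finditer(sequence.upper())]


def fraction_bound(ligand_um, kd_um):
    """Fraction of domain occupied, assuming 1:1 binding and excess ligand.

    Both arguments are concentrations in micromolar.
    """
    return ligand_um / (ligand_um + kd_um)


if __name__ == "__main__":
    # Hypothetical peptide, made up for illustration only.
    peptide = "AAPPLPPRNRPRL"
    print(find_pxxp(peptide))
    # Occupancy at 10 uM ligand for the two ends of the quoted Kd range.
    for kd in (5.0, 100.0):
        print(kd, round(fraction_bound(10.0, kd), 3))
```

Under these assumptions, a ligand at 10 µM would occupy roughly two thirds of the domain at Kd = 5 µM but less than one tenth at Kd = 100 µM, which is one way to read "relatively low binding affinities".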

A number of SH3 domain structures in the absence or presence of ligands have now been solved by X-ray crystallography and NMR spectroscopy [6,9,15,16]. Initial studies demonstrated that SH3 domains bind core proline-rich peptide ligands in a polyproline II helical conformation in a highly conserved, aromatic-rich patch on the protein surface. This sequence motif is flanked by polar residues that bind to less conserved portions of these SH3 domains. In particular, two highly variable loop regions in the SH3 domain, the RT and n-Src loops, play a key role in specificity and affinity. Furthermore, the residues in the flanking peptide regions and the specific nature of the two variable loops in the SH3 domains determine the orientation of the peptide ligands (+ or -), which is a unique feature of this type of protein-protein interaction. In contrast to peptide ligands, protein ligands of SH3 domains can exploit multiple discontiguous interactions to enhance affinity and selectivity. However, in organisms such as mammals, which contain more than one hundred different SH3 domains, the complexity of temporal, spatial and cellular distribution currently makes it a daunting task to identify and characterise the physiological binding partners. Therefore, it is not surprising that there are only a few cases where the intact protein ligands have been identified and used to study their interaction with the cognate SH3 domain [7,8]. Since many SH3 domains occur in critical intracellular signalling proteins, such as the oncogene product Src and the adaptor proteins Grb2, Nck and Crk, it is essential to understand the rules that underlie SH3-mediated complex formation in the cell.
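The dependence of ligand orientation on the flanking residues can be sketched with two sequence patterns. The text above does not spell out any consensus sequences; the sketch below assumes the two classes commonly described in the SH3 literature, in which a basic residue (R or K) lies either N-terminal to the PxxP core (class I, written here as [RK]xxPxxP) or C-terminal to it (class II, PxxPx[RK]). These two classes are generally associated with the two opposite binding orientations. The patterns, function name and test peptides are illustrative assumptions, not the rules the project intends to derive.

```python
import re

# Assumed consensus patterns (not taken from this document):
# class I  : basic residue N-terminal to the PxxP core
# class II : basic residue C-terminal to the PxxP core
CLASS_I = re.compile(r"[RK]..P..P")
CLASS_II = re.compile(r"P..P.[RK]")


def classify_ligand(peptide):
    """Return the list of assumed consensus classes matched by a peptide."""
    seq = peptide.upper()
    classes = []
    if CLASS_I.search(seq):
        classes.append("class I")
    if CLASS_II.search(seq):
        classes.append("class II")
    return classes


if __name__ == "__main__":
    # Short peptides chosen only to exercise each pattern.
    for pep in ("RPLPPLP", "PPPVPPR", "AAAA"):
        print(pep, classify_ligand(pep))
```

A peptide can match both patterns or neither, so a classifier of this kind only narrows down the likely orientation; the structural work described in this proposal would be needed to confirm it.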

The availability of genome-wide information opens new avenues for exploration. For this proposal we have selected the genome of Saccharomyces cerevisiae because it appears to be optimally suited for the genetic, functional and structural analysis we are aiming for. Furthermore, yeast is the only model organism where a comprehensive genomic and proteomic approach is conceivable. Because many genes are conserved from yeast to humans, the general biological importance of systematic genomics is self-evident. We expect, therefore, that our results will be beneficial for future analogous genomics- and proteomics-based analyses of SH3 domains or other related domains in higher eukaryotes, including mammals. We anticipate that our data will be useful for mapping mutations that lead to malfunctions and diseases, and for providing insight into their possible physiological implications, using the yeast data as a model.

The proposal includes the development and application of a number of innovative biotechnologies supporting the overall aim. One major objective of the participating SME is the development and application of high-throughput strategies for the expression and purification of recombinant proteins. Therefore, participation in the proposed project will benefit the SME in its current and planned contributions to large-scale structural genomics projects. The strategies specifically developed within this network will be made publicly available and will therefore benefit structural genomics projects of the international scientific community as well. The second innovation is the development of an algorithm for predicting SH3-mediated interactions in any proteome. There are several programmes in the literature that allow the modelling of a ligand once the 3D structure of the target is known. However, a purpose-tailored algorithm that focuses on one particular type of domain is not available. The software that will be developed in the project will have the advantage that it focuses on a particular type of domain, thereby reducing the number of parameters and constraints, which is expected to result in a significantly higher rate of success of the predictions. The combination of such algorithms with the localisation of SH3 domain-containing proteins in living cells should allow accurate prediction of SH3 domain targets.


Project workplan

Introduction. The overall aim of the project is to understand the cellular, molecular and structural rules that govern SH3-mediated protein interactions. To achieve this goal eight workpackages have been compiled, the results of which will be deposited in a publicly available database. The aim of this integrated database will be to promote wide research and development activities by academic institutions and industrial enterprises. The database content will be actively disseminated by a scientific meeting (in the third year of the contract), by publications in peer-reviewed journals and by display of its data over the internet. The analysis of SH3-mediated interactions will be carried out on a genome-wide scale in the yeast Saccharomyces cerevisiae. The yeast genome contains 25 genes with products predicted to contain SH3 domains. Some of these proteins comprise more than a single SH3 module, bringing the total number of different yeast SH3 domains to 29. For only one third of the SH3 domain-containing proteins has the (putative) binding partner been identified, indicating that only a fraction of the relevant interactions for this yeast protein family are presently known. Furthermore, we expect that many of these SH3 domains may interact with more than one binding partner depending on their temporal and spatial distribution during the yeast life cycle.

Some of the yeast SH3-containing proteins are highly conserved and their putative human counterparts have been identified in the databases, again emphasising that establishing SH3-mediated cellular networks in yeast will expand our understanding of analogous networks in humans. In order to achieve a complete representation of this protein family, SH3 domains from related organisms will be included in these studies as well. In particular, the fission yeast Schizosaccharomyces pombe has proven an invaluable complementary organism to budding yeast for cell and molecular genetic analysis. The fission yeast genome is nearly completed, and represents an enormous opportunity for studying the evolution of a domain in terms of structure and function between distantly related yeast strains. In the following, the overall workplan is described by clustering the workpackages, an arrangement that leads to synergistic activities among the participating partners.

Workplan structure and methodology. The workplan is clustered around three types of activities coined "Structure" (WP1 and WP2), "Function" (WP3, WP4 and WP5) and "Biocomputing" (WP6, WP7 and WP8). Each workpackage has one partner co-ordinator. The objective of the "Structure" cluster is to describe the diversity of the SH3 domain family through structural analysis. The "Function" cluster will focus on the identification and characterisation of SH3 domain ligands (both peptide and natural ligands) and will describe SH3 domain function by whole-genome expression profiling and protein localisation studies. The third cluster, "Biocomputing", has a pivotal role in the overall workplan. It will receive data from the publicly available genome databases (in particular those of S. cerevisiae and S. pombe), which will be used as input for the "Structure" and "Function" clusters. The second task of the "Biocomputing" cluster is to deposit the output generated by the consortium into a database and to provide the bioinformatics tools to analyse and interpret this information. The objectives of the three clusters of workpackages and the links between the workpackages are detailed below.

"Structure" cluster (WP1 and WP2). The workpackages in the "Structure" cluster are aimed at solving 3D structures of individual SH3 domains and of complexes consisting of SH3 domains with bound ligands (either peptides or complete protein domains). The selection of SH3 targets will take place in WP6, where the systematic classification of SH3 domains will be carried out based on sequence and functional information. The selected targets will be cloned, expressed and purified in WP1. This workpackage will be co-ordinated by the SME (P7) in the consortium. This SME will provide state-of-the-art E. coli expression systems and the facilities to purify and analyse recombinant proteins. This work will be complemented by three academic partners (P3, P4 and P5). The deliverables of this workpackage are purified proteins in sufficient amounts for structural analysis. Being aware that success in structural biology experiments is uncertain, we have decided to include the two major structural biology techniques, X-ray crystallography and NMR spectroscopy, in this network. Because the expression and purification of target material suitable for X-ray and NMR investigations are co-ordinated within WP1, these two technologies will be utilised in a synergistic and complementary way. The network aims to carry out the structural analysis of the SH3 domain targets and SH3 domain-peptide complexes in high-throughput mode. The X-ray crystallography projects will be carried out by partner 3 at a synchrotron site, allowing rapid data acquisition, processing and on-site structure solution. The NMR partner (partner 5) is located at one of the leading centres of excellence in the field of structural biology in Europe. The output of this workpackage, i.e. the structures of SH3 domains and of SH3-ligand complexes, will be fed into the database and will be used as input for modelling studies, the development of software to predict SH3-ligand interactions (WP6 and WP7) and for functional studies.

"Function" cluster (WP3, WP4 and WP5). The workpackages in this cluster aim at identifying the preferred ligands of each of the 29 SH3 domains in S. cerevisiae (and of a selected set in S. pombe) and at assessing the cellular function of SH3 domain-containing proteins. Peptide ligands will be identified by panning peptide repertoires of random sequences displayed on phages using GST-fused SH3 domains (WP3). These experiments will reveal the ligand preferences for each SH3 domain within the family. The identified ligands will be classified into one of the established categories. High-affinity ligands will be used for co-crystallisation experiments with their cognate SH3 domain (WP2). The second objective of workpackage 3 is the identification of natural protein ligands of yeast SH3 domains. These will also be identified by panning techniques. In this case, libraries of yeast genomic fragments will be used, consisting of protein domains fused to capsid proteins of filamentous phages. The partner responsible for this workpackage, P2, has proven experience in this task. To verify whether the "correct" natural ligand for each SH3 domain has been identified, co-localisation studies will be performed with GFP-tagged proteins (WP5). Based on the data from these studies we are planning to determine three full SH3-protein ligand complexes using X-ray crystallography. Workpackage 4 aims to identify the function of each of the yeast SH3 domain-containing proteins. This workpackage uses DNA microarrays to generate whole-genome expression profiles of yeast cells either deleted for, or overexpressing, SH3 domain-containing proteins. In a manner analogous to fingerprinting, these expression profiles serve as characteristic signatures, thereby allowing the identification of the cellular pathway or protein complex in which these SH3 proteins operate. Such analysis will also help to confirm the natural SH3 ligands identified in workpackage 3.
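The fingerprinting analogy for expression profiles can be sketched as a similarity search: the profile of an SH3 deletion or overexpression strain is compared with a set of reference profiles, and the most similar reference suggests the pathway involved. The workplan does not specify a similarity measure; the sketch below assumes Pearson correlation, and the pathway names and numbers are invented purely for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long lists of expression values."""
    if len(x) != len(y) or not x:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no pattern to match
    return cov / (sd_x * sd_y)


def best_match(query, references):
    """Return the name of the reference profile most correlated with query."""
    return max(references, key=lambda name: pearson(query, references[name]))


if __name__ == "__main__":
    # Invented log-ratio profiles over four genes; names are hypothetical.
    references = {
        "pathway_A": [1.0, 2.0, 3.0, 4.0],
        "pathway_B": [4.0, 3.0, 2.0, 1.0],
    }
    query = [2.0, 4.0, 6.0, 8.5]
    print(best_match(query, references))
```

Correlation ignores the absolute magnitude of the response, which suits comparisons between deletion and overexpression strains; other distance measures or clustering methods could equally be used.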

"Biocomputing" cluster (WP6, WP7 and WP8). The biocomputing-oriented cluster is crucial for progression through all stages of the project. The deliverables of this cluster will define the internal infrastructure of the proposed network for data input, and the cluster will serve as a switch to disseminate the data of the network for applications by academic and industrial groups external to the network. These workpackages will be co-ordinated by two partners that have experience with bioinformatics algorithms and databases (P6), as well as with modelling and design (P4). Workpackage 6 will extract all sequence and functional information on SH3 domains from the currently available databases, with emphasis on the S. cerevisiae and S. pombe genomes. The different SH3 domains will be systematically classified, and those yeast SH3 domains that do not have close relatives in the databases will be selected for structural analysis (WP2). WP6 will also produce structural models for SH3 domains, either by homology modelling based on multisequence comparisons or by threading techniques (fold recognition) for those SH3 domains without sequence homology. The deliverables of workpackage 2 (3D structures of SH3 domains), workpackage 3 (preferred peptide ligands of SH3 domains) and workpackage 6 (SH3 models) will be used in WP7 to develop software to predict ligand sequences that bind to SH3 domains. Newly designed ligands will be validated experimentally. The design and management of the database will be carried out in workpackage 8 (by partner 6).
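The selection of SH3 domains without close relatives can be sketched as a filter on pairwise sequence identity. The sketch assumes the domain sequences have already been aligned to equal length (gaps written as "-") and uses an arbitrary 40% identity threshold; neither the threshold nor the function names come from the workplan, and the toy sequences are invented.

```python
def percent_identity(a, b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def without_close_relatives(aligned, threshold=40.0):
    """Names of domains whose best identity to any other domain is below threshold."""
    names = sorted(aligned)
    selected = []
    for name in names:
        best = max(
            (percent_identity(aligned[name], aligned[other])
             for other in names if other != name),
            default=0.0,
        )
        if best < threshold:
            selected.append(name)
    return selected


if __name__ == "__main__":
    # Invented five-residue fragments; real SH3 domains are about 60 residues.
    toy_alignment = {
        "domain_a": "ALYDY",
        "domain_b": "ALYDF",
        "domain_c": "GWWKA",
    }
    print(without_close_relatives(toy_alignment))
```

In practice the comparison would run against all known SH3 sequences in the public databases rather than only the yeast set, and a proper alignment tool would supply the aligned input.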

PERT diagram of the proposed network (shown here). The deliverables of the eight workpackages will be deposited in the SH3 database. These data will be available for research and development projects by academic institutions and industrial enterprises. The primary input into the network originates from publicly available databases (in particular those of S. cerevisiae and S. pombe). The workpackages are organised into three types of activities, indicated as STRUCTURE, FUNCTION and BIOCOMPUTING. All workpackages will directly benefit from the data deposited in the database (not indicated). The workpackages are denoted WPx, and the partner co-ordinators are given in parentheses as Px.


References

  1. Cicchetti P., Mayer BJ., Thiel G., Baltimore D. (1992) Science 257, 803-806.
  2. Cohen GB., Ren R., Baltimore D. (1995) Cell 80, 237-248.
  3. Dalgarno DC., Botfield MC., Rickles RJ. (1997) Biopolymers 43, 383-400.
  4. Feng S., Chen JK., Yu H., Simon JA., Schreiber SL. (1994) Science 266, 1241-1247.
  5. Hughes TR., Marton MJ., Jones AR., Roberts CJ., Stoughton R., Armour CD., Bennett HA., Coffey E., Dai H., He YD., Kidd MJ., King AM., Meyer MR., Slade D., Lum PY., Stepaniants SB., Shoemaker DD., Gachotte D., Chakraburtty K., Simon J., Bard M., Friend SH. (2000) Cell 102, 109-126.
  6. Kohda D., Hatanaka H., Odaka M., Mandiyan V., Ullrich A., Schlessinger J. (1993) Cell 72, 953-960.
  7. Lee C-H., Leung B., Lemmon MA., Zheng J., Cowburn D., Kuriyan J., Saksela K. (1995) EMBO Journal 14, 5006-5015.
  8. Lee C-H., Saksela K., Mirza UA., Chait B., Kuriyan J. (1996) Cell 85, 931-942.
  9. Musacchio A., Noble M., Pauptit R., Wierenga R., Saraste M. (1992) Nature 359, 851-855.
  10. Musacchio A., Wilmanns M., Saraste M. (1994) Progress in Biophysics and Molecular Biology 61, 283-297.
  11. Pawson T. (1995) Nature 373, 573-580.
  12. Ren R., Mayer BJ., Cicchetti P., Baltimore D. (1993) Science 259, 1157-1161.
  13. Rickles RJ., Botfield MC., Weng Z., Taylor JA., Green OM., Brugge JS., Zoller MJ. (1994) EMBO Journal 13, 5598-5604.
  14. Wu X., Knudsen B., Feller SM., Zheng J., Sali A., Cowburn D., Hanafusa H., Kuriyan J. (1995) Structure 3, 215-226.
  15. Yu H., Chen JK., Feng S., Dalgarno DC., Brauer AW., Schreiber SL. (1994) Cell 76, 933-945.
  16. Yu H., Rosen MK., Shin TB., Seidel-Dugan C., Brugge JS., Schreiber SL. (1992) Science 258, 1665-1668.