ENCODE Structures   PDG   CNB   CSIC

PDB Structure Data

Data for All Sequences

- Where PDB Structures Are Found by BLAST

Models for the Dataset

Models have been built for some sequences by two groups (Anna Tramontano's in Rome and Janet Thornton's in Cambridge). See their respective results pages for details.

PDB Structures

13 sequences have their whole sequence covered by a PDB structure. 8 of them are alternative splice variants of the others. In fact, seven of the sequences are 100% identical to one of the others, so the 13 sequences are covered by 5 different templates. The reason for the discrepancy between isoforms and identical sequences is that although GS1-273L24.4-001, GS1-273L24.4-002, GS1-273L24.4-004, GS1-273L24.4-005 and GS1-273L24.4-006 are isoforms, they have 100% identity to two different templates. The sequences of GS1-273L24.4-001, GS1-273L24.4-004 and GS1-273L24.4-005 (template 1hp8, 68 residues, 3-helix bundle) have nothing in common with those of GS1-273L24.4-002 and GS1-273L24.4-006 (template 1qtuA, 117 residues, mainly beta-sheets).

81 more sequences have part of their sequence covered at 100% identity by a PDB structure. 32 of those are alternative splice variants where the same sequence is covered by the same PDB. There are also four isoforms for which two structures exist for two parts of the sequence.

11 sequences have their whole sequence covered by PDB structure with up to 97% sequence identity. 6 of them are alternative splice variants of those that have their whole sequence covered by other PDBs (AC104389 has 4 variants with 4 different PDBs).

On top of the 88 sequences that have 100% identity to at least one structure in the PDB, 32 sequences have part of the sequence covered by a PDB structure with up to 97% sequence identity.

At least 12 isoforms are covered by two non-overlapping structures with over 90% identity, for example AC002543.3-001 with 1r1wA and 1shyB, AC012630.1-001 with 1ugvA and 1f7cA, and RP11-298J23.1-003 with 1avfA and 1htrP.

RP11-298J23.1-003    SeqLen: 140
1HTR_P    43     3.00E-020    Starts 17      59      100%
1AVF_A    329    5.00E-035    Starts 69      140     97%

AC012630.1-001    SeqLen: 814
1UGV_A    72     3.00E-030    Starts 754     814     98%
1F7C_A    231    e-121        Starts 353     583     93%

AC002543.3-001    SeqLen: 1140
1R1W_A    312    9.00E-060    Starts 1033    1140    100%
1SHY_B    551    0            Starts 1       533     99%
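The coverage and the absence of overlap implied by these hits can be checked with a few lines of code. What follows is a minimal sketch in Python, assuming that the two numbers after "Starts" are the first and last query residues of each alignment (an interpretation, not something stated on this page). The function names and the HITS dictionary are illustrative only.

```python
def covered_residues(ranges):
    """Return the set of query positions covered by any hit.

    Each range is a (start, end) pair of 1-based, inclusive positions.
    """
    covered = set()
    for start, end in ranges:
        covered.update(range(start, end + 1))
    return covered


def hits_overlap(a, b):
    """True if two inclusive (start, end) ranges share any position."""
    return a[0] <= b[1] and b[0] <= a[1]


# Query ranges copied from the table above. Reading the columns as
# (start, end) on the query sequence is an ASSUMPTION of this sketch.
HITS = {
    "RP11-298J23.1-003": (140, {"1HTR_P": (17, 59), "1AVF_A": (69, 140)}),
    "AC012630.1-001": (814, {"1UGV_A": (754, 814), "1F7C_A": (353, 583)}),
    "AC002543.3-001": (1140, {"1R1W_A": (1033, 1140), "1SHY_B": (1, 533)}),
}


def summarise(name):
    """Return (residues covered, sequence length, do the two hits overlap)."""
    seq_len, hits = HITS[name]
    ranges = list(hits.values())
    n_covered = len(covered_residues(ranges))
    return n_covered, seq_len, hits_overlap(ranges[0], ranges[1])


for name in HITS:
    n_covered, seq_len, overlap = summarise(name)
    print(f"{name}: {n_covered}/{seq_len} residues covered, overlap={overlap}")
```

Under that reading of the columns, none of the three pairs of hits overlap, and in each case part of the sequence remains uncovered by either structure.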

587 sequences find at least one template structure with BLAST.

PDB Structures and Alternative Splicing

Few sequences have PDB structures at 100% identity, and fewer still have their entire sequence covered by a structure, so much of the comparison has to be done using structures of lower percentage identity. Given that structure tends to be more conserved than sequence or function, this is a reasonable approach.

Examples include the isoform AC104389.21-004 and globin 1fdhG (100% ID), where the structure would lose most of the first two N-terminal helices, and sequence RP11-505P4.2-015 and structure 1ijfA (91% ID), where just the first 144 of 450 residues would be expressed and the N-terminal EF1A domain would be split.

Isoform RP11-247A12.5-001 is missing a portion of sequence, from approximately residue 280 to 370, in comparison with variant sequence RP11-247A12.5-006. A PDB structure exists with 91% identity (1ndfA), and the missing portion would remove several helices and the central strand of a 5-stranded sheet.
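The kind of check described in the last two paragraphs can be sketched in code. The Python example below is only an illustration. The secondary-structure element boundaries are hypothetical placeholders (they are not taken from 1ndfA or any other PDB entry), the function names are assumptions made for this sketch, and the element boundaries are assumed to be already expressed in the numbering of the query sequence. Only the residue range 280 to 370 and the figures 144 and 450 come from the text above.

```python
def overlaps(region, element):
    """True if two inclusive (start, end) residue ranges intersect."""
    return region[0] <= element[1] and element[0] <= region[1]


def elements_lost(missing, elements):
    """Names of structural elements that intersect the missing region."""
    return [name for name, span in elements.items() if overlaps(missing, span)]


def fraction_retained(n_expressed, seq_len):
    """Fraction of the full-length sequence still present in the isoform."""
    return n_expressed / seq_len


# HYPOTHETICAL element boundaries, for illustration only.
# They do not describe any real structure.
EXAMPLE_ELEMENTS = {
    "helix_1": (250, 275),
    "helix_2": (285, 300),
    "strand_3": (310, 318),
    "helix_4": (360, 380),
    "strand_5": (400, 410),
}

# Approximate missing region quoted for RP11-247A12.5-001.
MISSING = (280, 370)

print(elements_lost(MISSING, EXAMPLE_ELEMENTS))
print(f"{fraction_retained(144, 450):.0%} of the sequence retained")
```

With real data, the element boundaries would come from the secondary-structure annotation of the template, mapped onto the query through the BLAST alignment.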

A more detailed breakdown of the PDB structures found by BLAST can be found on this page and elsewhere.

 

Michael Tress
mtress @cnb.uam.es
Protein Design Group
Centro Nacional de Biotecnología (CNB-CSIC)
Calle Darwin, Campus de la Universidad Autónoma de Madrid, Cantoblanco.
28049 MADRID.
Tel: (+34) 91 585 4676   Fax: (+34) 91 585 4506

Apologies to Osvaldo Graña from whom I totally stole the page design.

Thanks to Petr Sobola for the skater(s).