Exercises: Sequence Retrieval System, part 2
Paulino Gómez Puertas, CAB
The aim of these exercises is to practice field-limited text searches in SRS
and to use some advanced options, such as linking databases and performing
subentry queries.
A. Simple Queries
-
Search SWISS-PROT for proteins involved in photosynthesis. First search
in the Description field, then try other fields such as Keywords (try using
the wildcard '*').
-
Search for the SWISS-PROT entry with accession number P00221 (and follow
some of the associated links).
-
Search for all authors with surname "Smith" in SWISS-PROT.
-
Search for all SWISS-PROT proteins involved in photosynthesis that were
published by someone named "Smith". Do this search first by combining
separate searches in the "Query Manager", then again from the query form.
-
How many non-EST sequences exist in EMBL? (Use the 'Division' index.)
-
How many tomato sequences exist in EMBL and in SWISS-PROT? What is the problem
with just using 'tomato' in the query form with the standard settings?
-
Find all sequences in EMBL that were released this year. (Select the 'Date'
field and use the dd-mmm-yy(yy) format.)
-
Find all sequences in both SWISS-PROT and SWISSNEW that were created between
1 January 1996 and 1 July 1996.
-
Find the EMBL sequences that were published in Nature Vol. 408, pages
157 to 158, in 2000. (Use the field information page for further help.)
-
Search for all dihydrofolate reductases in SWISS-PROT.
-
Search for all dihydrofolate reductases in SWISS-PROT with sequence lengths
between 500 and 700.
-
Search for 'kinase' in the 'Description' index of SWISS-PROT. Why are some
of the entries found not kinases? Find a word that, when found together
with 'kinase', strongly indicates that the protein is not a kinase at all.
(Select the 'Description' field to be displayed in the entry list.)
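Several exercises in this section rest on the same few ideas: searching a single field, widening a word with the '*' wildcard, combining two result sets with AND, and restricting a numeric or date field to a range. The minimal Python sketch below models those ideas on a hypothetical toy databank; the entries, field names and the helpers `field_search`, `range_search` and `parse_date` are invented for illustration. It assumes SRS wildcards behave like shell-style patterns, that word matching is case-insensitive and that dates use the dd-mmm-yyyy form. It does not reproduce SRS's own query syntax.

```python
from datetime import datetime
from fnmatch import fnmatchcase


def parse_date(text):
    """Parse a dd-mmm-yyyy string such as '01-Jan-1996' into a date."""
    return datetime.strptime(text, "%d-%b-%Y").date()


# Hypothetical toy "databank": word-list fields, a sequence length and a date.
ENTRIES = [
    {"id": "E1", "description": ["photosystem", "ii", "protein"],
     "keywords": ["photosynthesis"], "authors": ["smith", "jones"],
     "length": 350, "created": parse_date("15-Mar-1996")},
    {"id": "E2", "description": ["dihydrofolate", "reductase"],
     "keywords": ["oxidoreductase"], "authors": ["smith"],
     "length": 620, "created": parse_date("01-Aug-1996")},
    {"id": "E3", "description": ["photosynthetic", "reaction", "center"],
     "keywords": ["photosynthesis"], "authors": ["brown"],
     "length": 510, "created": parse_date("01-Jul-1996")},
]


def field_search(entries, field, pattern):
    """Return the IDs of entries whose `field` holds a word matching `pattern`.

    '*' matches any run of characters and '?' exactly one (shell-style,
    assumed to approximate SRS wildcards). Matching is case-insensitive.
    """
    pattern = pattern.lower()
    return {e["id"] for e in entries
            if any(fnmatchcase(word, pattern) for word in e[field])}


def range_search(entries, field, low, high):
    """Return the IDs of entries whose `field` value lies in [low, high]."""
    return {e["id"] for e in entries if low <= e[field] <= high}


# An exact word misses 'photosynthetic'; a trailing wildcard catches it.
exact = field_search(ENTRIES, "description", "photosynthesis")  # empty set
wild = field_search(ENTRIES, "description", "photosynth*")      # E3

# AND-combination of two field-limited searches is a set intersection.
photo_by_smith = (field_search(ENTRIES, "keywords", "photosynth*")
                  & field_search(ENTRIES, "authors", "smith"))   # E1

# Numeric and date ranges (both ends inclusive).
mid_length = range_search(ENTRIES, "length", 500, 700)          # E2, E3
first_half_1996 = range_search(ENTRIES, "created",
                               parse_date("01-Jan-1996"),
                               parse_date("01-Jul-1996"))        # E1, E3
```

The toy data is chosen so that the exact search for 'photosynthesis' in the description field finds nothing, which mirrors why the first exercise suggests trying wildcards and other fields.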
B. Browsing Indices
-
How many spellings of the name(s) of ribulose bisphosphate carboxylase exist
in the 'Keywords' index of EMBL? (Use plenty of '*' wildcards anywhere in
the search word.)
-
Find out whether the 'Description' and 'Keywords' indices of SWISS-PROT contain
any words with spaces. (Use a search expression with a wildcard ('*')
at both the beginning and the end.)
-
What is the shortest author name in SWISS-PROT? (Use '?' wildcards.)
-
Search for 'homeobox*' in 'AllText' of all sequence databanks. How many indices
are searched altogether? Is the query 'homeobox*' suitable for finding all
proteins containing a homeobox?
-
Use a regular expression search to find all words consisting of 'nif' followed
by one other character. (Don't forget to put the regular expression between '/'s.
Help is available in "SRS users manual 8.1.3".)
-
For which species are there at least 1000 entries in SWISS-PROT? (Use the fact
that organism names contain a space, which higher-level taxa mostly do not
have.)
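The index-browsing exercises above use two pattern mechanisms: SRS wildcards ('*', '?') and regular expressions written between '/'s. The minimal Python sketch below shows, on a hypothetical list of index words, that the wildcard 'nif?' and the regular expression 'nif.' (anchored to the whole word with `re.fullmatch`) select the same terms, how '* *' isolates terms containing a space, and how that space can separate species from higher-level taxa when counting entries per organism. It assumes SRS wildcards behave like shell-style patterns; the word lists, organism values and the lowered threshold are illustrative only.

```python
import re
from collections import Counter
from fnmatch import fnmatchcase

# Hypothetical excerpt of the distinct words listed when browsing an index.
INDEX_WORDS = ["nif", "nifa", "nifb", "nifh1", "nifu", "ribulose", "heat shock"]

# '?' stands for exactly one character, so 'nif?' matches four-letter words.
wildcard_hits = [w for w in INDEX_WORDS if fnmatchcase(w, "nif?")]

# The equivalent regular expression, anchored to the whole word.
regex_hits = [w for w in INDEX_WORDS if re.fullmatch(r"nif.", w)]

# '*' on both sides of a space picks out index terms that contain a space.
multi_word = [w for w in INDEX_WORDS if fnmatchcase(w, "* *")]

# Hypothetical organism values, one per entry: species names contain a space,
# higher-level taxa such as 'Viridiplantae' mostly do not.
ORGANISMS = ["Spinacia oleracea", "Spinacia oleracea", "Viridiplantae",
             "Viridiplantae", "Homo sapiens"]
MIN_ENTRIES = 2  # the exercise uses 1000; lowered to fit the toy data

species_counts = Counter(o for o in ORGANISMS if " " in o)
frequent_species = [name for name, n in species_counts.items()
                    if n >= MIN_ENTRIES]
```

Note that 'Viridiplantae' reaches the threshold but is excluded because it has no space, which is exactly the filtering the last exercise relies on.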
C. Subentry Queries
-
How many SWISS-PROT sequences have transmembrane regions? (Use the
'transmem' feature key.)
-
How many transmembrane regions exist in SWISS-PROT?
-
How many transmembrane regions in SWISS-PROT are shorter than 10 amino
acids or longer than 50?
-
How many pseudogenes annotated as CDS (CoDing Sequence) exist in EMBL?
-
Retrieve the set of all spinach transmembrane segments and save them to
your directory using the view "FastaSeqs".
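Subentry queries count feature-table lines rather than whole entries, so "how many sequences have transmembrane regions" and "how many transmembrane regions exist" give different numbers. The Python sketch below illustrates the distinction on hypothetical feature tables (the IDs, positions and the `tm_regions` helper are invented); it assumes, as in the SWISS-PROT feature table, that start and end positions are 1-based and inclusive, so a region's length is end - start + 1.

```python
# Hypothetical feature tables: entry ID -> list of (key, start, end),
# with 1-based inclusive positions.
FEATURES = {
    "P1": [("TRANSMEM", 10, 30), ("TRANSMEM", 45, 65), ("DOMAIN", 70, 120)],
    "P2": [("TRANSMEM", 5, 12)],
    "P3": [("SIGNAL", 1, 20)],
    "P4": [("TRANSMEM", 100, 160), ("TRANSMEM", 200, 221)],
}


def tm_regions(features):
    """Yield (entry_id, length) for every TRANSMEM subentry."""
    for entry_id, feats in features.items():
        for key, start, end in feats:
            if key == "TRANSMEM":
                yield entry_id, end - start + 1  # inclusive coordinates


regions = list(tm_regions(FEATURES))

# Entry level: each entry counts once, however many regions it has.
entries_with_tm = {entry_id for entry_id, _ in regions}

# Subentry level: each region is counted and filtered on its own.
odd_length = [(e, n) for e, n in regions if n < 10 or n > 50]
```

On this toy data, three entries contain transmembrane regions, there are five regions in total, and two of them are shorter than 10 or longer than 50 residues.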
D. Using Views
-
Create a view for EMBL in the query form that displays the ID and Description
lines and the sequence in PIR format, then perform a new search for all spinach
sequences.
-
Create another view for EMBL in the query form that displays the ID, AccNumber,
Description and DBOrigin in a table and again search for all spinach sequences.
-
Search in the Description field of SWISS-PROT for all "photosystem II"
sequences from spinach and compare the hydrophilicity plots. (Use the 'proteinChart'
view).
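A hydrophilicity (or hydropathy) plot is a sliding-window average of a per-residue scale along the sequence. The scale and window size used by the 'proteinChart' view are not stated here, so the sketch below swaps in the Kyte-Doolittle hydropathy scale, on which hydrophilic residues score negative and hydrophobic ones positive, with an arbitrary window of 5. The function name `hydropathy_profile` and the test fragment are hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return the mean scale value of each full window along `sequence`.

    The i-th value covers residues i .. i + window - 1 (0-based).
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Hypothetical fragment: hydrophilic, hydrophobic, then hydrophilic again.
fragment = "DEKRDEKR" + "LIVLIVLIV" + "DEKRDEKR"
profile = hydropathy_profile(fragment, window=5)
peak = max(range(len(profile)), key=profile.__getitem__)  # most hydrophobic window
```

The peak falls in the central hydrophobic stretch; with a window of around 19 residues, long positive peaks of this kind are the classic sign of candidate transmembrane segments, which ties this view back to section C.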
E. Performing Links
-
For how many 'spinach' SWISS-PROT entries is the tertiary structure known?
-
How many 'Arabidopsis thaliana' transmembrane proteins are included in
the ENZYME databank?
-
For how many unique enzyme-catalysed reactions is the tertiary structure of
the enzyme available in PDB? (Enzyme reactions are described in the ENZYME databank.)
-
How many PDB entries have calcium binding sites?
-
Search for the dihydrofolate reductase family in PROSITE and link it to SWISS-PROT.
Compare the result with the set from exercise A.10: are the two sets the same?
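Linking databanks amounts to following cross-references from one set of entries to another. The Python sketch below models that with hypothetical cross-reference tables (all IDs, including the PROSITE label `PS_DHFR`, and the helpers `link` and `with_link` are invented) to make two points from this section: the number of linked target entries (PDB structures) need not equal the number of source entries that have a link, and two routes to the same protein family (a text search as in A.10 versus a PROSITE link) can be compared with set differences. It does not reproduce SRS's own link operators.

```python
# Hypothetical cross-references from SWISS-PROT entries to PDB and PROSITE.
SPROT_TO_PDB = {"SP1": ["PDB_A"], "SP2": [], "SP3": ["PDB_B", "PDB_C"]}
SPROT_TO_PROSITE = {"SP1": ["PS_DHFR"], "SP4": ["PS_DHFR"], "SP6": []}


def link(source_ids, xrefs):
    """Return every target ID cross-referenced from an entry in `source_ids`."""
    return {target for sid in source_ids for target in xrefs.get(sid, [])}


def with_link(source_ids, xrefs):
    """Return the source entries that have at least one cross-reference."""
    return {sid for sid in source_ids if xrefs.get(sid)}


spinach = {"SP1", "SP2", "SP3"}                           # hypothetical spinach entries
pdb_structures = link(spinach, SPROT_TO_PDB)              # three PDB entries...
spinach_with_structure = with_link(spinach, SPROT_TO_PDB)  # ...from two SWISS-PROT entries

# Entries reached from the PROSITE family (reverse lookup of the cross-references).
by_family = {sid for sid, families in SPROT_TO_PROSITE.items()
             if "PS_DHFR" in families}
by_description = {"SP1", "SP6"}  # hypothetical result of a text search as in A.10

only_in_text_search = by_description - by_family   # SP6
only_in_family_link = by_family - by_description   # SP4
```

Entries found only by the text search may be mis-annotated or use the name loosely, while entries found only through the family link may be described under another name; both differences are worth inspecting.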
February 2004