Practical lesson 3
The NCBI WWW server and Entrez, part 2
By Paulino Gómez Puertas, CAB, INTA-CSIC; updated by Manuel J. Gómez, PDG, CNB, CSIC.
The aim of these exercises is to learn how to run field-delimited text
searches and how to use the Entrez Limits and History forms.
Lists of the available Limits, Search Fields, Search Field Descriptors and
Display Formats can be found in the Entrez help documentation.
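The same field-delimited searches can also be sent to NCBI programmatically through the E-utilities. The Python sketch below only builds query strings and esearch URLs and does not contact NCBI. The helper names `tag` and `count_url` are hypothetical, and using `gbdiv_est[PROP]` to exclude ESTs is an assumption to check against the list of search fields.

```python
from urllib.parse import urlencode

# E-utilities esearch endpoint: the programmatic counterpart of the Entrez search box.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tag(term: str, field: str) -> str:
    """Restrict `term` to one search field, e.g. tag("spinach", "ORGN") -> spinach[ORGN]."""
    # Quote multi-word terms so they are searched as a phrase.
    if " " in term:
        term = f'"{term}"'
    return f"{term}[{field}]"


def count_url(db: str, query: str) -> str:
    """esearch URL that asks only for the number of matching records."""
    return f"{ESEARCH}?{urlencode({'db': db, 'term': query, 'rettype': 'count'})}"


# Non-EST nucleotide records by an author named Jones.
# ASSUMPTION: gbdiv_est[PROP] selects the GenBank EST division.
query = f"{tag('Jones', 'AU')} NOT {tag('gbdiv_est', 'PROP')}"
print(query)
print(count_url("nucleotide", query))
```

Opening the printed URL in a browser should return a small XML document whose `Count` element holds the number of hits.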
A. Simple Searching
- Look for all "photosystem"-related sequences in the Nucleotide database
(use the wildcard "*", e.g. photosystem*).
- How many spinach sequences are there in the Nucleotide database?
- And how many are there in the Protein, Structure and Genome databases?
- Display the FASTA view of the Protein entry AAD02267.
- Display the Graphics view of the Nucleotide entry that corresponds to the
Protein entry AAD02267.
- How many non-EST Nucleotide sequences were published by someone named "Jones"?
- How many potato polypeptides are included in the Structure database?
- Search for all plant proteins with a molecular weight between 50,000 and
50,050 daltons (use the field range format "050000:050050[MOLWT]").
- Search for all plant proteins with a sequence length of 300 to 310 amino acids.
- How many spinach proteins are shorter than 50 amino acids?
- How big is the BIGGEST protein in the Protein database?
- How many articles have been written by someone called Ras?
- What kind of protein or peptide is known as Ramos?
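Range searches follow the pattern low:high[FIELD]. This sketch, which also makes no network calls, builds the zero-padded molecular-weight range used in section A, a sequence-length range, and an efetch URL for the FASTA view of AAD02267. The helper names are hypothetical, and using `Viridiplantae[ORGN]` to stand for "plants" is an assumption.

```python
from urllib.parse import urlencode

# E-utilities efetch endpoint, used here for the FASTA display format.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def range_term(low: int, high: int, field: str, width: int = 0) -> str:
    """Build "low:high[FIELD]", zero-padding both bounds to `width` digits."""
    if low > high:
        raise ValueError("lower bound exceeds upper bound")
    return f"{str(low).zfill(width)}:{str(high).zfill(width)}[{field}]"


def fasta_url(db: str, accession: str) -> str:
    """efetch URL that returns one record as plain-text FASTA."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


# ASSUMPTION: Viridiplantae[ORGN] is used as the "plants" restriction.
plants = "Viridiplantae[ORGN]"
mw_query = f"{plants} AND {range_term(50000, 50050, 'MOLWT', width=6)}"
length_query = f"{plants} AND {range_term(300, 310, 'SLEN')}"
# "Shorter than 50 amino acids" means lengths 1 to 49 inclusive.
short_query = f"spinach[ORGN] AND {range_term(1, 49, 'SLEN')}"

for q in (mw_query, length_query, short_query):
    print(q)
print(fasta_url("protein", "AAD02267"))
```

Both bounds of an Entrez range are inclusive, which is why "shorter than 50" becomes 1:49 rather than 1:50.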
B. Refining your search
- How many non-EST Nucleotide sequences were published by someone named "Smith"
and have a sequence length of 3000 to 4000 nucleotides? How many of them
were not published in 1999? Perform three independent searches and use
"History" to combine them: [((#1 AND #2) NOT #3)].
- What are the differences between [((#1 AND #2) NOT #3)] / [(#1 AND
(#2 NOT #3))] / [((#1 AND #2) OR #3)] / [((#1 NOT #2) AND #3)]?
- Search for all plant rRNA sequences in the Nucleotide database (use "Limits"
to restrict the search to rRNA in the "Molecule" pull-down menu).
- Search for all Arabidopsis mitochondrial genes (use "Limits").
- Search for all Arabidopsis chloroplast genes in the Nucleotide database
that were updated in the last year.
- Retrieve all genomic plant genes in the Nucleotide database with a publication
date between 1990 and 1995.
- Using "Index", obtain the number of tomato sequences in the Nucleotide
database. How many non-tomato entries contain the word "tomato"?
- Search for all chloroplast protein sequences from spinach, tomato
and potato.
- Search for all genomic sequences of protein kinases from Arabidopsis.
- Using "History", retrieve all glucanase sequences from spinach, tomato and
potato, excluding ESTs and patents.
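The History combinations in section B behave like set operations on the record lists of earlier searches. This sketch uses made-up record IDs (hypothetical, not real search results) to compare the four expressions, then builds an esearch URL that combines query keys from one History session. The WebEnv value is a placeholder, and the helper name `combine_url` is hypothetical.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical record IDs standing in for three History searches:
# #1 author Smith, #2 length 3000-4000 nt, #3 published in 1999.
s1 = {1, 2, 3, 4, 5, 6}
s2 = {4, 5, 6, 7, 8}
s3 = {2, 5, 8, 9}

# Entrez AND, OR and NOT behave like set intersection, union and difference.
results = {
    "((#1 AND #2) NOT #3)": (s1 & s2) - s3,
    "(#1 AND (#2 NOT #3))": s1 & (s2 - s3),
    "((#1 AND #2) OR #3)": (s1 & s2) | s3,
    "((#1 NOT #2) AND #3)": (s1 - s2) & s3,
}
for expression, ids in results.items():
    print(f"{expression:22} -> {sorted(ids)}")


def combine_url(webenv: str, expression: str, db: str = "nucleotide") -> str:
    """esearch URL combining earlier query keys (#1, #2, ...) of one History session."""
    params = {"db": db, "term": expression, "WebEnv": webenv, "usehistory": "y"}
    return f"{ESEARCH}?{urlencode(params)}"


# "WEBENV_FROM_EARLIER_SEARCH" is a placeholder, not a real session token.
print(combine_url("WEBENV_FROM_EARLIER_SEARCH", "((#1 AND #2) NOT #3)"))
```

The first two expressions always give the same records, because (A AND B) NOT C and A AND (B NOT C) both keep what is in A and B but not in C. The last two do not: OR adds every record of #3, while (#1 NOT #2) AND #3 keeps only #3 records by Smith that fall outside the length range.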
C. Link Out
- Obtain the domain distribution of the protein sequence of "phosphoinositide
specific phospholipase C" from Arabidopsis thaliana.
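Outside the web interface, a record's links are followed with ELink after an ESearch. In this sketch, no request is sent. Linking the protein to the `cdd` (Conserved Domain Database) Entrez database is an assumption about where the domain information lives. The UID is a hypothetical placeholder, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str) -> str:
    """Step 1: find the UID(s) of the protein record."""
    return f"{EUTILS}/esearch.fcgi?{urlencode({'db': db, 'term': term})}"


def elink_url(uid: str, target_db: str = "", cmd: str = "") -> str:
    """Step 2: follow links from a protein UID to another Entrez database or to LinkOut."""
    params = {"dbfrom": "protein", "id": uid}
    if target_db:
        params["db"] = target_db  # e.g. "cdd" for Conserved Domain Database hits (assumption)
    if cmd:
        params["cmd"] = cmd  # "llinks" asks for the LinkOut providers of the record
    return f"{EUTILS}/elink.fcgi?{urlencode(params)}"


term = '"phosphoinositide specific phospholipase C" AND "Arabidopsis thaliana"[ORGN]'
print(esearch_url("protein", term))
# "123456" is a hypothetical UID; use one returned by the esearch step instead.
print(elink_url("123456", target_db="cdd"))
print(elink_url("123456", cmd="llinks"))
```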
February 2004