Practical lesson 1
PAIRWISE AND MULTIPLE SEQUENCE ALIGNMENTS.
SIMILARITY SEARCHES
By F. Abascal, PDG, CNB, CSIC.
Links to tools and databases:
-
Aligning two sequences
-
BLAST
-
ClustalW
-
Databases
-
Multiple Sequence Alignment viewers
Exercise number 1.
Aligning two sequences
>RPE_YEAST
MVKPIIAPSI LASDFANLGC ECHKVINAGA DWLHIDVMDG HFVPNITLGQ PIVTSLRRSV
PRPGDASNTE KKPTAFFDCH MMVENPEKWV DDFAKCGADQ FTFHYEATQD PLHLVKLIKS
KGIKAACAIK PGTSVDVLFE LAPHLDMALV MTVEPGFGGQ KFMEDMMPKV ETLRAKFPHL
NIQVDGGLGK ETIPKAAKAG ANVIVAGTSV FTAADPHDVI SFMKEEVSKE LRSRDLLD
>RPE_MYCPN
MLNLVVNREI AFSLLPLLHQ FDRKLLEQFF ADGLRLIHYD VMDHFVDNTV FQGEHLDELQ
QIGFQVNVHL MVQALEQILP VYLHHQAVKR ISFHVEPFDI PTIKHFIAQI KQAGKQVGLA
FKFTTPLVNY ERLVQQLDFV TLMSVPPGKG GQAFNSAVFN NLKQAHKYHC SIEIDGGIKL
DNIHQIQDDV NFIVMGSGFI KLERWQRQQL LKTNQ
-
Please make a global alignment (choosing the option "needle") and also
a local alignment (choosing the option "water"). What are the differences
between the two outputs? Do you think that these two sequences are related?
-
Now try to align the two sequences using different substitution matrices
and changing the gap opening and gap extension penalties. Use, for example,
BLOSUM62 and BLOSUM40. Do you see any differences? (You can check the pre-computed
results here.)
-
How could we decide which of the alignments obtained is better?
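To make the difference between the two outputs concrete, here is a minimal Python sketch of the dynamic-programming algorithms behind the two programs: Needleman-Wunsch (global, as in needle) and Smith-Waterman (local, as in water). It assumes a hypothetical toy scoring scheme (+2 for identical residues, -1 otherwise) and a single linear gap penalty, whereas needle and water use a substitution matrix (EBLOSUM62 by default for proteins) with separate gap opening and gap extension penalties. The helper `shuffle_zscore` illustrates one common way to judge an alignment score: compare it with the scores obtained after shuffling one sequence, which preserves its length and composition. All function names are illustrative only.

```python
import math
import random
import statistics


def sub_score(a: str, b: str) -> int:
    # Hypothetical toy scores: +2 for identical residues, -1 otherwise.
    # needle and water would look these up in a matrix such as BLOSUM62.
    return 2 if a == b else -1


def align(s: str, t: str, gap: int = -2, local: bool = False):
    """Needleman-Wunsch (local=False) or Smith-Waterman (local=True) with a
    linear gap penalty. Returns (score, aligned_s, aligned_t)."""
    n, m = len(s), len(t)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:  # global: end gaps are penalised like any other gap
        for i in range(1, n + 1):
            H[i][0] = i * gap
        for j in range(1, m + 1):
            H[0][j] = j * gap
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            h = max(H[i - 1][j - 1] + sub_score(s[i - 1], t[j - 1]),
                    H[i - 1][j] + gap,
                    H[i][j - 1] + gap)
            if local:
                h = max(h, 0)  # local: restart instead of carrying a negative score
                if h > best:
                    best, bi, bj = h, i, j
            H[i][j] = h
    if not local:
        best, bi, bj = H[n][m], n, m  # global: the path must span both sequences
    # Trace back from the end cell (global) or the best-scoring cell (local).
    a, b = [], []
    i, j = bi, bj
    while i > 0 or j > 0:
        if local and H[i][j] == 0:
            break  # the local alignment starts here
        if i > 0 and j > 0 and H[i][j] == H[i - 1][j - 1] + sub_score(s[i - 1], t[j - 1]):
            a.append(s[i - 1])
            b.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and H[i][j] == H[i - 1][j] + gap:
            a.append(s[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(t[j - 1])
            j -= 1
    return best, "".join(reversed(a)), "".join(reversed(b))


def shuffle_zscore(s: str, t: str, n_shuffles: int = 100,
                   local: bool = True, seed: int = 0) -> float:
    """How many standard deviations the real score lies above the scores of
    s against shuffled copies of t (same length and composition)."""
    rng = random.Random(seed)
    real = align(s, t, local=local)[0]
    scores = []
    for _ in range(n_shuffles):
        letters = list(t)
        rng.shuffle(letters)
        scores.append(align(s, "".join(letters), local=local)[0])
    mean, sd = statistics.mean(scores), statistics.pstdev(scores)
    if sd == 0:  # degenerate case: every shuffle got the same score
        return 0.0 if real == mean else math.copysign(math.inf, real - mean)
    return (real - mean) / sd
```

For example, `align("XXACGTYY", "ACGT")` scores 0, because the four end gaps (-2 each) cancel the four identities (+2 each), while `align("XXACGTYY", "ACGT", local=True)` returns `(8, "ACGT", "ACGT")`: the local alignment simply ignores the unrelated flanks. A large positive Z-score suggests the real score is unlikely to arise from sequence composition alone.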
Exercise number 2.
Similarity search with BLAST.
-
Use the sequence RPE_YEAST to run a BLAST search against a protein database.
-
For the moment, please use the EMBL
or EBI BLAST servers, since they make it
easier to retrieve the sequences identified by the BLAST search;
we will use these sequences later, in another exercise.
-
If you are using the EMBL BLAST server, use the following parameters:
-
database=Swiss-Prot (nrdb95 provides more
coverage, but we would obtain too many related sequences, making
the analysis more complicated).
-
filter=none
-
descriptions=250
-
alignments=250
-
Once the results of the BLAST search have been returned, you can retrieve
the sequences identified as similar by clicking on "Get
selected sequences".
-
By default, the sequences with the best p-values are checked, but
you can select more or fewer sequences.
-
Those selected by default have been saved in this file.
-
Now try the NCBI BLAST server and compare the results (the EMBL BLAST
server uses WU-BLAST, which differs from the original BLAST developed at the NCBI).
-
Now look for RPE_MYCPN in the output of the two BLAST searches using
RPE_YEAST as a query. Check the associated e-value (or p-value). Is the
similarity between the two sequences significant?
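The e-value and p-value reported by BLAST are linked through Karlin-Altschul statistics. The Python sketch below encodes the standard relations: the bit score S' = (λS − ln K)/ln 2, the expected number of chance hits E = m·n·2^(−S'), where m is the query length and n the total length of the database, and P = 1 − e^(−E). λ and K depend on the scoring matrix and gap penalties, so they are passed in as parameters; the function names are hypothetical. The BLAST programs themselves use effective (edge-corrected) lengths, so plugging in raw lengths only approximates the reported values.

```python
import math


def bit_score(raw_score: float, lam: float, K: float) -> float:
    """Normalised score S' = (lambda * S - ln K) / ln 2, in bits.
    lam and K depend on the matrix and gap penalties used."""
    return (lam * raw_score - math.log(K)) / math.log(2)


def expect_value(bits: float, m: int, n: int) -> float:
    """Expected number of chance hits scoring at least `bits`, for a query of
    length m against a database of total length n: E = m * n * 2**(-S')."""
    return m * n * 2.0 ** (-bits)


def p_value(e_value: float) -> float:
    """Probability of at least one chance hit that good: P = 1 - exp(-E).
    expm1 keeps precision when E is tiny."""
    return -math.expm1(-e_value)
```

Because P = 1 − e^(−E), the two values are almost identical when E is small (well below 1), which is why either can be used to judge whether a hit such as RPE_MYCPN is significant; for large E they diverge, and P approaches 1.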
Exercise number 3.
Multiple sequence alignment of the sequences identified in a BLAST search.
In this exercise you will make a multiple sequence alignment
of the sequences that have been identified in a similarity search with
BLAST.
-
CREATING A MULTIPLE SEQUENCE ALIGNMENT WITH ClustalW.
-
You can use a ClustalW web server, or run the program locally.
-
Using a web server:
-
Leave the parameters at their default values, but change the output format:
-
output format=GCG (or GCG-msf).
-
Running the program locally:
-
You should obtain a file like this.
-
VISUALIZATION OF MULTIPLE SEQUENCE ALIGNMENTS (MSAs)
-
Multiple sequence alignments can be more easily interpreted if the columns
in the alignment are coloured following some criteria (for example, the
degree of conservation).
-
If you are running ClustalW on the EBI web server,
you will have the option of getting a coloured alignment. You will also have
the option of visualizing the alignment with JalView or constructing
a tree.
-
If you are running ClustalW from the command line, you will obtain a text
file with the alignment in MSF format. A popular visualization tool that
can be used to enhance the look of your MSA is Belvu (download
Belvu for Linux). There is no Belvu version for Windows, though.
-
The previously obtained MSA would look like this with Belvu.
-
To visualize the alignment with Belvu, simply run the following command
from a shell console:
belvu file.msf
-
With Belvu, it is possible to manipulate the alignment and perform several kinds
of analyses on it, for example sorting the sequences according
to several criteria, removing redundant sequences, or generating a neighbour-joining
(NJ) tree from the MSA.
-
Other MSA visualization tools that you could install are Pfaat
and JalView, which are available for Windows. In addition, you can use BoxShade,
which is available as a web server.
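As a sketch of the kind of computation behind these viewers, here is a minimal Python example. It assumes the alignment has already been read into a dictionary of equal-length aligned strings (parsing MSF is not shown, and all names are hypothetical). `column_conservation` gives, for each column, the fraction of sequences carrying the most common residue, one simple criterion for colouring columns; `remove_redundant` greedily drops any sequence at or above a chosen percent identity to one already kept. This is similar in spirit to Belvu's option for removing redundant sequences, but not necessarily Belvu's exact rule.

```python
from collections import Counter

GAP_CHARS = "-."  # '-' in most formats; '.' is common in MSF files


def column_conservation(msa: dict[str, str]) -> list[float]:
    """For each column, the fraction of sequences that carry the most common
    residue (gaps never count as the conserved residue)."""
    seqs = list(msa.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for col in range(length):
        counts = Counter(s[col] for s in seqs if s[col] not in GAP_CHARS)
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / len(seqs))
    return scores


def percent_identity(a: str, b: str) -> float:
    """Identity over the columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b)
             if x not in GAP_CHARS and y not in GAP_CHARS]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def remove_redundant(msa: dict[str, str], max_identity: float = 90.0) -> dict[str, str]:
    """Greedy filter: keep a sequence only if it is below max_identity
    to every sequence kept so far (input order decides which one survives)."""
    kept: dict[str, str] = {}
    for name, seq in msa.items():
        if all(percent_identity(seq, other) < max_identity for other in kept.values()):
            kept[name] = seq
    return kept
```

Because the filter is greedy, its result depends on input order: within each group of near-identical sequences, the first one encountered is the one kept.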
Exercise number 4.
Discussion: What kind of information can be obtained from a Multiple Sequence
Alignment?
Exercise number 5.
Identification of coding regions using BLAST
(From an exercise designed by R. Alonso Allende and MJ Gómez)
>human
AGCTTTCTTCTTTTCCCTGTTGCTCAAATAAATAGTGTTCTTTGCTCAAA
CCCCCTTTCCCTCCTCCTTCTGCAATCTCAGCGCCTAGCGAAATCTGTTT
TCTTCATTGTAACCTCAGCTTCACCGCAATTAATTTTTTTTCCCTCTGGT
CACAAGATAATTCCTGACGCCAGTGAGTCTGGAGGTCAGACGAACAGCAA
ATTGGGGAACAAGGCGGCACTAATTCCTTACAAGTTCCTTGAAAAATCTT
TCGCTTAAAAAAAACGGGGGGTGGGGGGAGCTTCTTTGCTGTTCAGGGAT
TTATGCCTCGCGGAGCTGTGGCTCGAACCAGTGTTGGCTAAGGCGGACTG
GCAGGGGCAGGGAAGCTCAAAGATCTGGGGTGCTGCCAGGAAAAAGCAAA
TTCTGGAAGTTAATGGTTTTGAGTGATTTTTAAATCCTTGCTGGCGGAGA
GGCCCGCCTCTCCCCGGTATCAGCGCTTCCTCATTCTTTGAATCCGCGGC
TCCGCGGTCTTCGGCGTCAGACCAGCCGGAGGAAGCCTGTTTGCAATTTA
AGCGGGCTGTGAACGCCCAGGGCCGGCGGGGGCAGGGCCGAGGCGGGCCA
TTTTGAATAAAGAGGCGTGCCTTCCAGGCAGGCTCTATAAGTGACCGCCG
CGGCGAGCGTGCGCGCGTTGCAGGTCACTGTAGCGGACTTCTTTTGGTTT
TCTTTCTCTTTGGGGCACCTCTGGACTCACTCCCCAGCATGAAGGCGCTG
AGCCCGGTGCGCGGCTGCTACGAGGCGGTGTGCTGCCTGTCGGAACGCAG
TCTGGCCATCGCCCGGGGCCGAGGGAAGGGCCCGGCAGCTGAGGAGCCGC
TGAGCTTGCTGGACGACATGAACCACTGCTACTCCCGCCTGCGGGAACTG
GTACCCGGAGTCCCGAGAGGCACTCAGCTTAGCCAGGTGGAAATCCTACA
GCGCGTCATCGACTACATTCTCGACCTGCAGGTAGTCCTGGCCGAGCCAG
CCCCTGGACCCCCTGATGGCCCCCACCTTCCCATCCAGGTAAGCCTCGAA
GTCGGGACAGGGCTGAACACCCAGGCAAGGATGCTGCGGGACCCTCGGAG
CTCCCGATTGCCTCGCGTAACTCTTCCCTCTTTTCCTCTAATCAGACAGC
CGAGCTCGCTCCGGAACTTGTCATCTCCAACGACAAAAGGAGCTTTTGCC
ACTGACTCGGCCGTGTCCTGACACCTCCAGGTGAGTATCTCCTCTCTTGG
AGAGGGAGGTTTAAACGGCAAGTCCTGGAGTTGGCAGACGTTTTGAAAAA
TTGCCACTCACTCGGTTTAGGGAAACTGAGGCCAGAGAGGGACAAGTGAC
TTGCCCATGGTTGCATCAAATGAATGGCAGAGTCAGTTTCCATGTGATGT
GCATTTAAGCCTTAATGCGCCTGGCCCTGCCTCCGCAGTGGCCGAGGTCT
GGCAAGTAGACATGGTCCGACTAAATACAAGTCTTTCTGTTCCATGTTGT
ATAGGAGCTGTCTTCGGCAGCCCCCTCCCAGCTAGTGTCAATTCCAAGTA
GGAGGGGTAGCGCAACGTCCGCCTGTGGTCTTTGGCGCCAACTGGGTGGG
GGCAGCGTGGGGGGCGGAGTTATCAGGCTGGAGGTACAGACCAAGTTTCC
TCCCTGGCGCCGGCCAGTCTGCGGACGGCCCCCGCCTCGGCACGCTCGGC
GGAAACTGACTGCTCCTTGGTCTTCTTTCCTCCCCCGCCCAGAACGCAGG
TGCTGGCGCCCGTTCTGCCTGGGACCCCGGGAACCTCTCCTGCCGGAAGC
CGGACGGCAGGGATGGGCCCCAACTTCGCCCTGCCCACTTGACTTCACCA
AATCCCTTCCTGGAGACTAAACCTGGTGCTCAGGAGCGAAGGACTGTGAA
CTTGTGGCCTGAAGAGCCAGAGCTAGCTCTGGCCACCAGCTGGGCGACGT
CACCCTGCTCCCACCCCACCCCCAAGTTCTAAGGTCTTTTCAGAGCGTGG
AGGTGTGGAAGGAGTGGCTGCTCTCCAAACTATGCCAAGGCGGCGGCAGA
GCTGGTCTTCTGGTCTCCTTGGAGAAAGGTTCTGTTGCCCTGATTTATGA
ACTCTATAATAGAGTATATAGGTTTTGTACCTTTTTTACAGGAAGGTGAC
TTTCTGTAACAATGCGATGTATATTAAACTTTTTATAAAAGTTAACATTT
TGCATAATAAACGATTTTTAAACACTTGTGTATATGATGACACCCGTCTC
CATTAAGTACTAATGATGCTTTCTCGCACATGGCCGAATTTTGGGAGCTT
TGGGAAAGTGAACTTGCTTATTCTACGAGAGGGAAATGAAAAACTGCCTG
GTTGAGAGGGGATGGGGTGGAGAGAGAAGGGTTCATGATGGGAGTCTCAT
GTCCATTGAGGGATGGGTGCAGAGAAAAGTTCTGGCTCTGCCTCATTATT
TCAGAGATGAAACCAGAGACTGGTGCAAGCT
-
The nucleotide sequence above is a fragment of the sequence of a human
chromosome.
-
We would like to check whether there is experimental evidence indicating
the presence of transcribed or coding regions.
-
To do so, we will use BLAST to search an EST database, using the above
sequence as the query.
-
Access the NCBI BLAST
server web page and select "Nucleotide-nucleotide BLAST [blastn]".
-
Copy the human sequence above and paste it into the "Search" box.
-
Choose "est_human" in the "Choose database" menu, and start the search
by clicking on "BLAST!".
-
You should obtain a result similar to this.
-
Questions:
-
Do you have a clear idea of what ESTs are?
-
Do you think that the sequence above contains a transcribed or coding
sequence?
-
If there were a transcribed sequence, could you infer the structure
of the gene?
-
Can you think of another strategy to test whether the human sequence
contains a coding sequence, using BLAST but NOT a search against the ESTs
database?
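Translated searches such as blastx compare a nucleotide query, translated in all six reading frames, against a protein database. The Python sketch below shows only that translation step, plus a naive search for the longest open reading frame. It assumes the standard genetic code, ignores introns, and uses hypothetical function names and a simple frame-labelling convention (+1 to +3 on the given strand, -1 to -3 on the reverse complement); it is not how any BLAST program is implemented.

```python
BASES = "TCAG"
# Standard genetic code; codons in the order TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str, offset: int) -> str:
    """Translate codons starting at `offset`; unknown codons become X."""
    return "".join(CODON_TABLE.get(seq[p:p + 3], "X")
                   for p in range(offset, len(seq) - 2, 3))


def six_frames(seq: str):
    """Yield (frame, protein): +1..+3 on the given strand, -1..-3 on the
    reverse complement (offsets 0, 1, 2 in both cases)."""
    seq = "".join(seq.split()).upper()
    rc = reverse_complement(seq)
    for offset in range(3):
        yield f"+{offset + 1}", translate(seq, offset)
    for offset in range(3):
        yield f"-{offset + 1}", translate(rc, offset)


def longest_orf(seq: str):
    """Longest stretch that starts at M and has no stop codon, over all frames."""
    best_frame, best = None, ""
    for frame, protein in six_frames(seq):
        for segment in protein.split("*"):
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best):
                best_frame, best = frame, segment[start:]
    return best_frame, best
```

You can paste the human fragment above into `longest_orf` as a single string. Keep in mind that in genomic DNA, exons are interrupted by introns, so a real coding sequence may appear as several short ORFs rather than one long one, and an ORF on its own is not experimental evidence of transcription.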
February 2004