Pairwise and Multiple Sequence Alignments, and Similarity Searches
Links to Tools and Databases:
- Aligning two sequences
- BLAST
- ClustalW
- T-Coffee
- Databases
- Multiple Sequence Alignment viewers
Aligning Two Sequences
>RPE_YEAST
MVKPIIAPSI LASDFANLGC ECHKVINAGA DWLHIDVMDG HFVPNITLGQ PIVTSLRRSV
PRPGDASNTE KKPTAFFDCH MMVENPEKWV DDFAKCGADQ FTFHYEATQD PLHLVKLIKS
KGIKAACAIK PGTSVDVLFE LAPHLDMALV MTVEPGFGGQ KFMEDMMPKV ETLRAKFPHL
NIQVDGGLGK ETIPKAAKAG ANVIVAGTSV FTAADPHDVI SFMKEEVSKE LRSRDLLD
>RPE_MYCPN
MLNLVVNREI AFSLLPLLHQ FDRKLLEQFF ADGLRLIHYD VMDHFVDNTV FQGEHLDELQ
QIGFQVNVHL MVQALEQILP VYLHHQAVKR ISFHVEPFDI PTIKHFIAQI KQAGKQVGLA
FKFTTPLVNY ERLVQQLDFV TLMSVPPGKG GQAFNSAVFN NLKQAHKYHC SIEIDGGIKL
DNIHQIQDDV NFIVMGSGFI KLERWQRQQL LKTNQ
- Please make a global alignment (choosing the option "needle") and also a local alignment (choosing the option "water"). What are the differences between the two outputs? Do you think that these two sequences are related?
- Now try to align the two sequences using different substitution matrices and different gap opening and gap extension penalties. Use, for example, BLOSUM62 and BLOSUM40. Do you see any difference? (You can check the pre-computed results here.)
- How could we decide which of the alignments obtained is better?
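The difference between the "needle" and "water" outputs comes from two dynamic-programming recurrences: a global (Needleman-Wunsch) alignment must span both sequences end to end, while a local (Smith-Waterman) alignment may start and stop anywhere and never lets a running score fall below zero. The minimal sketch below computes only the optimal score of each kind. Its scoring scheme (match +2, mismatch -1, linear gap -2) is an illustrative assumption, not the substitution matrix and affine gap penalties that needle and water actually use.

```python
def align_score(a: str, b: str, local: bool, match: int = 2,
                mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global (local=False) or local (local=True) alignment score.

    Toy scoring (an assumption): fixed match/mismatch scores and a linear
    gap penalty, instead of a substitution matrix with affine gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # dp[i][j] = best score of an alignment ending at a[:i] and b[:j]
    dp = [[0] * cols for _ in range(rows)]
    if not local:
        # Global (Needleman-Wunsch): leading gaps are penalised.
        for i in range(1, rows):
            dp[i][0] = i * gap
        for j in range(1, cols):
            dp[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score = max(dp[i - 1][j - 1] + s,  # pair a[i-1] with b[j-1]
                        dp[i - 1][j] + gap,    # a[i-1] against a gap
                        dp[i][j - 1] + gap)    # b[j-1] against a gap
            if local:
                # Local (Smith-Waterman): restart instead of going negative.
                score = max(score, 0)
                best = max(best, score)
            dp[i][j] = score
    # Global: the alignment must end at the last residues of both sequences.
    # Local: the best-scoring cell anywhere in the matrix.
    return best if local else dp[-1][-1]
```

For example, `align_score("GGGHFVGGG", "HFV", local=False)` is -6 because six gap positions are forced, whereas the local score is 6, taken from the HFV/HFV match alone. Long unrelated regions penalise a global alignment in the same way.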
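Changing the substitution matrix or the gap penalties changes how the same pair of aligned rows is scored, and therefore which alignment comes out as optimal. The sketch below scores an already-aligned pair of rows with an affine gap penalty, assuming a gap of length L costs gap_open + (L - 1) * gap_extend. `TOY_MATRIX` and `pair_score` are hypothetical stand-ins, not real BLOSUM62 or BLOSUM40 values.

```python
# Hypothetical substitution scores for a few similar amino-acid pairs.
# These are NOT real BLOSUM values; they only illustrate the idea.
TOY_MATRIX = {("I", "V"): 3, ("I", "L"): 2, ("L", "V"): 1, ("D", "E"): 2}


def pair_score(x: str, y: str, identity: int = 5, default: int = -1) -> int:
    """Score of aligning residue x with residue y under the toy matrix."""
    if x == y:
        return identity
    return TOY_MATRIX.get((x, y), TOY_MATRIX.get((y, x), default))


def score_alignment(row1: str, row2: str,
                    gap_open: float, gap_extend: float) -> float:
    """Score two aligned rows ('-' = gap) with an affine gap penalty.

    Assumed convention: a gap of length L costs
    gap_open + (L - 1) * gap_extend.
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    total = 0.0
    gap_in_row1 = gap_in_row2 = False  # are we inside an open gap?
    for x, y in zip(row1, row2):
        if x == "-" and y == "-":
            raise ValueError("a column cannot contain two gaps")
        if x == "-":
            total -= gap_extend if gap_in_row1 else gap_open
            gap_in_row1, gap_in_row2 = True, False
        elif y == "-":
            total -= gap_extend if gap_in_row2 else gap_open
            gap_in_row1, gap_in_row2 = False, True
        else:
            total += pair_score(x, y)
            gap_in_row1 = gap_in_row2 = False
    return total
```

With gap_open=10 and gap_extend=0.5 the rows "HFVPNIT" / "HFV--IV" score 8.5; with gap_open=5 and gap_extend=2 the same rows score 12.0. Raw scores obtained with different matrices or gap penalties are therefore not directly comparable.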
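One common way to judge an alignment score, independently of the scale of a particular matrix, is to ask how unusual it is compared with scores obtained after shuffling one sequence, which keeps its amino-acid composition but destroys its order. The sketch below computes such a z-score. It is one possible answer to the question above, not necessarily the one intended here; the toy scoring scheme, the number of shuffles and the random seed are all assumptions.

```python
import random
import statistics


def local_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                gap: int = -2) -> int:
    """Smith-Waterman score with a toy linear scoring scheme (assumption)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def shuffle_z_score(a: str, b: str, n_shuffles: int = 100,
                    seed: int = 0) -> float:
    """Z-score of the real local score against scores with b shuffled."""
    rng = random.Random(seed)  # fixed seed for reproducible results
    real = local_score(a, b)
    letters = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # same composition, random order
        shuffled_scores.append(local_score(a, "".join(letters)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance")
    return (real - mean) / sd
```

A large z-score (the real score lies many standard deviations above the shuffled scores) suggests the alignment reflects more than composition alone.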
Similarity Search with BLAST
- Use the sequence RPE_YEAST to run a BLAST search against a protein database.
- For the moment, please use the EMBL or EBI BLAST servers, since they make it easier to retrieve the sequences identified by the BLAST search; we will use these sequences later, in another exercise.
- If you are using the EMBL BLAST server, use the following parameters:
  - database=Swiss-Prot (nrdb95 provides more coverage, but we would obtain too many related sequences, making the analysis more complicated)
  - filter=none
  - descriptions=250
  - alignments=250
- Once the results of the BLAST search have been returned, you can retrieve the sequences that have been identified as similar by clicking on "Get selected sequences".
- By default, the sequences with the best p-values appear checked, but you can select more or fewer sequences.
- Those selected by default have been saved in this file.
- Now try the NCBI BLAST server and compare the results (the EMBL BLAST server uses WU-BLAST, which is different from the original BLAST developed at the NCBI).
- Now look for RPE_MYCPN in the output of the two BLAST searches that used RPE_YEAST as the query. Check the associated e-value (or p-value). Is the similarity between the two sequences significant?
Note: to retrieve NCBI sequences you need to choose the sequences you want by hand (!) and then click on "get sequences". Then, on the next page, choose the two options "FASTA" and display as "text" in the drop-down menus.
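In the simplest single-hit case, the e-value and the p-value mentioned above are two views of the same Karlin-Altschul statistics: E = K·m·n·exp(-λS) is the number of hits with score at least S expected by chance, and P = 1 - exp(-E) is the probability of at least one such chance hit. The sketch below uses those textbook formulas. The values of K and λ (which depend on the scoring system) and the lengths m and n are placeholders, and the edge-effect corrections real BLAST programs apply to m and n are omitted.

```python
import math


def expect_value(raw_score: float, m: int, n: int,
                 K: float, lam: float) -> float:
    """Karlin-Altschul E-value: expected chance hits scoring >= raw_score.

    m, n: query and database lengths (no edge-effect correction here).
    K, lam: statistical parameters of the scoring system (placeholders).
    """
    return K * m * n * math.exp(-lam * raw_score)


def p_from_e(e: float) -> float:
    """Probability of at least one chance hit: P = 1 - exp(-E)."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for tiny E.
    return -math.expm1(-e)
```

For small E the two values are almost equal (E = 0.001 gives P of about 0.001), so either can be used to judge whether a hit such as RPE_MYCPN is significant; they diverge only as E approaches or exceeds 1 (E = 3 gives P of about 0.95).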
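Once the sequences have been retrieved in FASTA format, a short script can collect them for the multiple alignment exercise. The parser below is a minimal sketch; the function name `read_fasta` is hypothetical and not part of any BLAST server. It assumes each record starts with a '>' header line and removes whitespace inside sequence lines, as in the RPE_YEAST and RPE_MYCPN sequences at the start of this practical.

```python
def read_fasta(text: str) -> list[tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA-formatted text."""
    records: list[tuple[str, str]] = []
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data before the first '>' header")
            # Drop internal spaces such as "MVKPIIAPSI LASDFANLGC".
            chunks.append("".join(line.split()))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Reading the saved file with `open(path).read()` and passing the text to `read_fasta` gives one (header, sequence) pair per retrieved protein.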
Multiple Sequence Alignment of Sequences Identified After a BLAST Search
In this exercise you will make a multiple sequence alignment of the
sequences that have been identified in a similarity search with BLAST.
- CREATING A MULTIPLE SEQUENCE ALIGNMENT WITH ClustalW.
- Use a ClustalW web server.
- Leave the parameters at their default values, but change the output format:
  - output format=GCG (or GCG-msf).
- Visualisation of Multiple Sequence Alignments (MSAs)
- Multiple sequence alignments are easier to interpret if the columns in the alignment are coloured according to some criterion (for example, the degree of conservation).
- If you are running ClustalW on the EBI web server, you will have the option of getting a coloured alignment. You will also have the options of visualising the alignment with Jalview or constructing a tree.
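Colouring columns by degree of conservation needs a per-column score. A simple choice, assumed here rather than taken from ClustalW or Jalview, is the fraction of sequences that share the most common residue in each column. The sketch also builds a line with '*' under fully identical columns; ClustalW's own consensus line additionally marks conserved amino-acid groups with ':' and '.', which this sketch does not reproduce.

```python
from collections import Counter


def column_conservation(rows: list[str]) -> list[float]:
    """Fraction of sequences sharing the most common residue per column.

    Gaps ('-') count in the denominator but never as the top residue.
    """
    if not rows or len({len(r) for r in rows}) != 1:
        raise ValueError("rows must be non-empty and of equal length")
    n = len(rows)
    scores = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != "-")
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / n)
    return scores


def conservation_line(rows: list[str]) -> str:
    """'*' under columns where every sequence has the same residue."""
    return "".join("*" if s == 1.0 else " " for s in column_conservation(rows))
```

A viewer could map these fractions to colour intensities, for example shading columns above a chosen threshold.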
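Constructing a tree from an alignment starts from pairwise distances between the aligned sequences. The sketch below uses the simplest measure, the p-distance: the fraction of differing residues over the columns where neither sequence has a gap. This is an illustrative assumption. ClustalW can optionally correct distances for multiple substitutions, and the tree itself (for example by neighbour joining) would be built from this matrix in a further step that is not shown.

```python
def p_distance(row1: str, row2: str) -> float:
    """Fraction of differing residues over gap-free columns."""
    compared = differing = 0
    for x, y in zip(row1, row2):
        if x == "-" or y == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differing / compared


def distance_matrix(rows: list[str]) -> list[list[float]]:
    """Symmetric matrix of p-distances between all aligned rows."""
    n = len(rows)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = p_distance(rows[i], rows[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```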