ExPASy Home page Site Map Search ExPASy Contact us
Hosted by NCSC US. Mirror sites: Canada, China, Korea, Switzerland, Taiwan

Help for the ExPASy BLAST Interface

Query sequence

Enter a query protein sequence in raw format or a SWISS-PROT, TrEMBL or TrEMBL-new accession number.

Output format

HTML - BLAST native output format with hyperlinks and some formatting.
NiceBlast - View with full descriptions and organism sources.
Plain Text - Text format with no links.

BLAST program and databases


Programs available on ExPASy

blastp compares a protein query sequence against a protein sequence database.
tblastn compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
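
As an illustration of what "translated in all reading frames" means for tblastn, here is a minimal Python sketch of a six-frame translation. It assumes the standard genetic code and an unambiguous DNA alphabet; the function names are hypothetical and this is not the code BLAST itself uses.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return translations for frames +1, +2, +3, then -1, -2, -3."""
    dna = dna.upper()
    frames = []
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


if __name__ == "__main__":
    for frame in six_frame_translations("ATGGCC"):
        print(frame)
```

A protein query is then compared against each of the six translated frames of every database sequence, which is why tblastn searches take longer than blastp searches.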

Programs available elsewhere

blastn compares a nucleotide query sequence against a nucleotide sequence database.
Available at EMBnet Switzerland

blastx compares a nucleotide query sequence translated in all reading frames against a protein sequence database.
Available at EMBnet Switzerland
tblastx compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
Available at EMBnet Switzerland

PSI-BLAST Position Specific Iterative BLAST detects weak homologs by building a profile from a multiple alignment of the highest scoring hits in an initial BLAST search.
Available at NCBI
PHI-BLAST Pattern-Hit Initiated BLAST combines matching of regular expressions with local alignments surrounding the match.
Available at NCBI

xblast a totally unrelated game. Available at xblast center

Databases

Protein Databases

SWISS-PROT Manually annotated protein sequence database (over 100000 entries). Includes weekly updates and splice variants.
SWISS-PROT, TrEMBL and TrEMBL-new TrEMBL is a computer-annotated supplement to SWISS-PROT with some redundancy (over 600000 entries). TrEMBL-new contains the translations of the newest submissions to the EMBL database. Contains all consolidated proteins and ORFs, with weekly updates and annotated splice variants.
complete microbial proteomes Non-redundant sets of all the proteins from complete genome sequencing projects, compiled from SWISS-PROT and TrEMBL.
Translated EST Protein sequences derived from EST sequencing data (human, mouse, rat, zebrafish, drosophila, bovine, arabidopsis). This database contains many potential errors because of the low quality of the data.

DNA Databases (for tblastn)

All databases are subdivided into taxonomic sections, selectable from the Taxonomic groups drop-down list.

All EMBL + GSS All entries from the EMBL database (equivalent to GenBank and DDBJ).
HTG Unverified data from high-throughput genomic sequencing. Usually in the form of cosmids.
dbEST Expressed sequence tag database from the NCBI.
EST contigs Database of contigs based on EST clusters from Unigene (human, mouse, rat, bovine, zebrafish) and SwissClusters (Drosophila melanogaster, Arabidopsis thaliana).
Unigene EST Database of EST clusters (lists of ESTs known to match the same cDNA) from the NCBI (updated occasionally). This database also contains useful information such as STS matches, tissue distribution, or transcript map.
Complete genomes Genomes released in the form of a complete, assembled sequence.

Taxonomic groups

A taxonomic subselection can be made through a free-text input field for blastp (except on translated EST) and as a drop-down list with database subsections for tblastn.

For blastp, you may enter either a numeric NCBI TaxID (e.g. 10090), or a taxon (e.g. Bacteria), or a species name either in Latin or in English. For the list of known species names and synonyms, see SWISS-PROT species list. As the hits will be filtered in a post-processing stage, this may result in a significant delay.
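
The distinction between a numeric TaxID and a name can be sketched in Python as follows. This is only an illustration: the function name and the returned labels are hypothetical, and the sketch does not try to tell a taxon from a species name, which would require the SWISS-PROT species list.

```python
def parse_taxon_query(text):
    """Classify a free-text taxonomic query as a numeric TaxID or a name.

    Returns ("taxid", int) for all-digit input and ("name", str) otherwise.
    Whether a name is a taxon, a Latin name or an English name is not
    resolved here (assumption: that lookup happens elsewhere).
    """
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("empty taxonomic query")
    if cleaned.isascii() and cleaned.isdigit():
        return ("taxid", int(cleaned))
    return ("name", cleaned)


if __name__ == "__main__":
    print(parse_taxon_query("10090"))
    print(parse_taxon_query("Bacteria"))
```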

A display of the BLAST hits as a taxonomic tree is also available from the result page, by clicking on the "Taxonomic view of BLAST hits" button.

E-mail address

Enter your e-mail address to receive the results by e-mail. Otherwise, they will arrive interactively in your browser. The e-mail option is recommended for tblastn searches on big databases such as EMBL. If your interactive search is too long, you will receive an error message requiring you to resubmit via e-mail.

Options

Comparison matrix

The comparison matrix assigns a score to each position in an alignment. The BLOSUM matrices base this score on the frequency with which that substitution is known to occur among consensus blocks within related proteins. BLOSUM62 is among the best of the available matrices for detecting weak protein similarities. The PAM set of matrices is also available. If the "Auto-select" option is selected (default), the matrix is chosen according to the query sequence length, based on the following (empirically constructed) table:
Query length    Substitution matrix
<35             PAM-30
35-50           PAM-70
50-85           BLOSUM-80
>85             BLOSUM-62
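
The auto-selection table can be expressed as a small Python function. This is a sketch, not the server's code: the function name is hypothetical, and because the table lists 50 in two rows, the sketch assumes that a query of exactly 50 residues gets BLOSUM-80.

```python
def auto_select_matrix(query_length):
    """Pick a substitution matrix from the query length, following the table.

    Assumption: the shared boundary at 50 is assigned to BLOSUM-80.
    A length of 85 gets BLOSUM-80 because BLOSUM-62 is listed for >85.
    """
    if query_length < 1:
        raise ValueError("query length must be positive")
    if query_length < 35:
        return "PAM-30"
    if query_length < 50:
        return "PAM-70"
    if query_length <= 85:
        return "BLOSUM-80"
    return "BLOSUM-62"


if __name__ == "__main__":
    for length in (20, 40, 60, 300):
        print(length, auto_select_matrix(length))
```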

Setting the E threshold

The expectation value (E) threshold is a statistical measure of the number of expected matches in a random database. The lower the E-value, the more likely the match is to be significant. E-values between 0.1 and 10 are generally dubious, and those over 10 are unlikely to have biological significance. In all cases, those matches need to be verified manually. You may need to increase the E threshold in the following cases:
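
The rule of thumb for reading E-values given above can be written as a small Python helper. This is an illustrative sketch only: the function name and the labels are hypothetical, the handling of the exact boundary values 0.1 and 10 is an assumption, and no label replaces manual verification of a match.

```python
def classify_evalue(evalue):
    """Apply the rule of thumb: >10 unlikely, 0.1 to 10 dubious, lower is better.

    Assumption: the boundary values 0.1 and 10 are both treated as dubious.
    """
    if evalue < 0:
        raise ValueError("E-values cannot be negative")
    if evalue > 10:
        return "unlikely to be biologically significant"
    if evalue >= 0.1:
        return "dubious"
    return "more likely significant (verify manually)"


if __name__ == "__main__":
    for e in (1e-30, 0.5, 25):
        print(e, classify_evalue(e))
```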

Filter the sequence for low-complexity regions

Low-complexity regions (e.g. stretches of cysteine in CSP_DROME (Q03751), hydrophobic regions in membrane proteins) tend to produce spurious, insignificant matches with sequences in the database which have the same kind of low-complexity regions, but are unrelated biologically. If this option is checked, the query sequence will be run through the program SEG, and all amino acids in low-complexity regions will be replaced by X's which will appear in the alignment. The masked regions will also be visible as slashed regions in the PaintBlast image.
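
To give a feel for what masking does, here is a simplified Python sketch that replaces low-complexity windows with X's. It is a stand-in and not the SEG algorithm: it uses a plain sliding-window Shannon entropy, and the window size and threshold are arbitrary values chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq, window=12, threshold=2.2):
    """Replace every residue in a low-entropy window with 'X'.

    Assumptions: window and threshold are illustrative, not SEG's parameters.
    Sequences shorter than the window are returned unchanged.
    """
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) <= threshold:
            for i in range(start, start + window):
                masked[i] = "X"
    return "".join(masked)


if __name__ == "__main__":
    print(mask_low_complexity("C" * 20))
    print(mask_low_complexity("ACDEFGHIKLMNPQRSTVWY"))
```

A run of a single residue has zero entropy and is fully masked, while a window in which every residue differs is left untouched.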

Gapped alignment

This will allow gaps to be introduced in the sequences when the comparison is done, and is usually left checked.

Output page

The output page is divided into three sections. The first is a summary of the hits, including the score and E-value of the best HSP for each hit. The second is a PaintBlast image summarizing the matching portions for each hit. The third contains the alignments between the query and the hits. From the summary of the hits, several operations may be performed on selected sequences. This is only available for blastp against the protein databases:

Other references

BLAST tutorial at NCBI

BLAST Frequently Asked Questions at NCBI (includes error messages)

The Statistics of Sequence Similarity Scores by Altschul

Last modified 28/Mar/2001 by AGA