Patterns and Profiles
Protein Motifs, Domains and Families

Tools

Scanning sequences to find matches to patterns or profiles (input: sequence; output: motifs or domains)
  • ScanProsite; Scans a sequence for matches to PROSITE, or scans SWISS-PROT and TrEMBL with a user-provided pattern.
  • PPSEARCH; Scans a sequence to find matches to PROSITE (allows a graphical output); at EBI
  • ProfileScan; Scans a sequence to find matches to protein patterns and profiles in PROSITE and Pfam.
  • Frame-ProfileScan; Scans a short DNA sequence against protein profile databases (including PROSITE).
  • InterProScan; Scans a sequence against the InterPro database of patterns and profiles (which integrates information from several other databases).
  • Motif; Scans a sequence against several databases of patterns and profiles.
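
To make concrete what "finding matches to a pattern" means for the tools above, here is a minimal sketch that translates a PROSITE-style pattern into a regular expression and reports where it occurs in a sequence. The function names (prosite_to_regex, find_matches) and the toy sequence are illustrative assumptions, not part of ScanProsite or any other listed tool, and only the basic syntax is handled: x, [..], {..}, repeat counts such as (3) or (2,4), and the < and > anchors at the ends of the pattern.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Hypothetical helper for illustration; it covers only the basic syntax.
    """
    pattern = pattern.strip().rstrip(".")  # patterns conventionally end with '.'
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchor at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchor at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional repeat count.
        m = re.fullmatch(r"([^()]+)(?:\((\d+(?:,\d+)?)\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, repeat = m.group(1), m.group(2)
        if core == "x":  # any residue
            core_regex = "."
        elif core.startswith("[") and core.endswith("]"):  # allowed residues
            core_regex = core
        elif core.startswith("{") and core.endswith("}"):  # forbidden residues
            core_regex = "[^" + core[1:-1] + "]"
        else:  # a single literal residue
            core_regex = re.escape(core)
        count = "{" + repeat + "}" if repeat else ""
        parts.append(prefix + core_regex + count + suffix)
    return "".join(parts)


def find_matches(pattern: str, sequence: str):
    """Return (0-based start, matched text) pairs, including overlapping hits."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # N-{P}-[ST]-{P} is the PROSITE N-glycosylation site pattern;
    # the sequence is a made-up toy example.
    print(prosite_to_regex("N-{P}-[ST]-{P}."))
    print(find_matches("N-{P}-[ST]-{P}.", "MKNGSAANPTQ"))
```

The lookahead in find_matches is a design choice so that overlapping occurrences are reported, which a plain left-to-right regular expression search would skip.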
Querying databases to find sequences that match a given pattern or profile (input: pattern or profile; output: sequences)
  • ScanProsite; Scans a sequence for matches to PROSITE, or scans SWISS-PROT and TrEMBL with a user-provided pattern.
  • PatScan; Scans databases with user-provided patterns.
  • Pmotif; Scans a nucleotide sequence or nucleotide database for matches with a given motif.
  • PSI-BLAST; Position-Specific Iterated BLAST. The BLAST algorithm generalised to use an arbitrary position-specific score matrix in place of a query sequence and associated substitution matrix.
  • Motif; Also scans databases with user-provided patterns.
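
The PSI-BLAST entry above mentions a position-specific score matrix used in place of a query sequence. As a rough illustration of that idea only (this is not the PSI-BLAST algorithm, which also handles gaps, statistics and iteration), the sketch below slides a small matrix along a sequence and reports ungapped windows that score at or above a threshold. The matrix values, the default score for unlisted residues, the threshold and the function names are all made-up assumptions.

```python
def score_window(pssm, window, default=-4):
    """Sum the position-specific scores of one ungapped window.

    pssm is a list of dicts (one per position) mapping residue -> score;
    residues missing from a column get the assumed default penalty.
    """
    return sum(column.get(residue, default) for column, residue in zip(pssm, window))


def scan_with_pssm(pssm, sequence, threshold):
    """Return (0-based start, window, score) for every window scoring >= threshold."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = score_window(pssm, window)
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    # Toy three-column matrix with invented scores, for illustration only.
    toy_pssm = [
        {"G": 5, "A": 2},
        {"K": 4, "R": 3},
        {"S": 5, "T": 4},
    ]
    for hit in scan_with_pssm(toy_pssm, "MAGKSLGRT", threshold=10):
        print(hit)
```

Unlike a single substitution matrix, each column here carries its own scores, so the same residue can be rewarded at one position and penalised at another.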
Tools to generate Patterns or Profiles

Databases

Databases of protein motifs, domains and families
  • PROSITE: Dictionary of protein sites and patterns. PROSITE is a method of determining the function of uncharacterized proteins translated from genomic or cDNA sequences. It consists of a database of biologically significant sites, patterns and profiles that help to reliably identify which known protein family (if any) a new sequence belongs to. PROSITE references and references on profiles from PROSITE
  • Pfam: Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains. Version 5.0 of Pfam (January 2000) contains alignments and models for 2008 protein families, based on the Swissprot 38 and SP-TrEMBL 11 protein sequence databases.
  • SMART: Database of protein domains (defined as HMM profiles) and protein families.
  • InterPro: Database of protein families defined by the presence of common motifs and domains, as defined in several databases such as Pfam, SMART, PROSITE and others.
  • BLOCKS:  Blocks are multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins. Block Searcher, Get Blocks and Block Maker are aids to detection and verification of protein sequence homology. They compare a protein or DNA sequence to a database of protein blocks, retrieve blocks, and create new blocks, respectively.
  • PRINTS: PRINTS is a compendium of protein fingerprints. A fingerprint is a group of conserved motifs used to characterise a protein family; its diagnostic power is refined by iterative scanning of a SWISS-PROT/TrEMBL composite. Usually the motifs do not overlap, but are separated along a sequence, though they may be contiguous in 3D-space. Fingerprints can encode protein folds and functionalities more flexibly and powerfully than can single motifs, full diagnostic potency deriving from the mutual context provided by motif neighbours. References
  • ProDom: The ProDom protein domain database consists of an automatic compilation of homologous domains detected in the SWISS-PROT database by the DOMAINER algorithm (Sonnhammer, E.L.L. & Kahn, D., 1994, Protein Sci. 3:482-492). It has been devised to assist with the analysis of the domain arrangement of proteins. The last release of ProDom families was generated automatically using PSI-BLAST with a profile built from the seed alignments of Pfam-A 3.4 families.
  • GeneFIND: GeneFIND (Gene Family Identification Network Design) is an integrated database search system that combines several search/alignment tools and the ProClass database to provide rapid and accurate gene family classification with enriched family information. The objectives are to improve speed and sensitivity, differentiate global and motif similarities, and provide collective information in an integrated platform that alleviates human annotation effort. It was used to identify several thousand new PROSITE members, which have been incorporated into our ProClass_Motif sub-database.
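
The BLOCKS entry above describes blocks as multiply aligned ungapped segments. One common way to turn such a block into something searchable is a position-specific log-odds matrix; the sketch below shows the idea under loudly simplified assumptions: a uniform background frequency of 1/20 per amino acid and a flat pseudocount. Neither is claimed to be what BLOCKS, PROSITE or Pfam actually use, and the function name and toy block are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def block_to_log_odds(block, pseudocount=1.0):
    """Build a log-odds matrix (list of dicts, one per column) from an ungapped block.

    Assumptions for illustration: uniform background (1/20 per residue)
    and the same pseudocount added to every residue in every column.
    """
    width = len(block[0])
    if any(len(segment) != width for segment in block):
        raise ValueError("all segments in an ungapped block must have equal length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for position in range(width):
        counts = Counter(segment[position] for segment in block)
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        matrix.append(column)
    return matrix


if __name__ == "__main__":
    # Made-up block of four aligned three-residue segments.
    toy_block = ["GKS", "GKT", "GRS", "AKS"]
    matrix = block_to_log_odds(toy_block)
    for position, column in enumerate(matrix):
        best = max(column, key=column.get)
        print(position, best, round(column[best], 2))
```

Residues seen often in a column get positive scores and unseen residues get negative ones, which is what lets a profile tolerate variation that a single fixed pattern would reject.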

 
REFERENCES
Every web page has its own reference list.
In addition, you can check:


Revised on February 2004