WWW services for sequence analysis
Contact: Reinhard Schneider (schneider@EMBL-Heidelberg.de) and Burkhard Rost (rost@EMBL-Heidelberg.de)
'Pedestrian guide to analysing sequence databases' (full text of book chapter introducing WWW facilities)
Table of Contents
General
Databases
Services
Software
Intro to WWW
WWW searches
General sites
Priv sites
Literature
Miscellaneous
Pathways
Genome proj
WWW searches
Search engines are programs that search through the entire WWW for certain keywords. Some engines allow the search to be restricted to certain subjects.
- http://cuiwww.unige.ch/meta-index.html
Subject oriented searching (e.g. Information servers, Software, People)
- http://www.ebi.ac.uk/htbin/bwurld.pl
Searching for keywords at sites related to particular subjects (e.g. Databases, BioUtilities, Journals, Software).
- http://www.yahoo.com/Science/Biology/
Yahoo page for Science-biology, permitting subject driven searching (e.g. Anatomy, Biochemistry, Biomedical Engineering, Biotechnology)
- http://metacrawler.cs.washington.edu:8080/
MetaCrawler, general search engine
- http://webcrawler.com/
Web Crawler, general search engine
- http://altavista.digital.com/
Alta Vista, general search engine
- http://home.netscape.com/home/internet-search.html
NetSearch, general search engine
- http://minbar.cs.washington.edu:6060/
Ahoy! the Homepage Finder: search for email addresses and www homepages
General sites
- CBS: http://www.cbs.dtu.dk/
Center for Biological Sequence Analysis, Copenhagen, Denmark.
- CNB: http://gredos.cnb.uam.es/
Protein Design Group at the CNB, Madrid, Spain.
- CSC: http://www.csc.fi:80/molbio/
The Finnish EMBnet node.
- Dublin: http://biotech.bio.tcd.ie//
Trinity College, Dublin, Ireland, Bioinformatics.
- ETHZ: http://cbrg.inf.ethz.ch/
The Computational Biochemistry Server at ETHZ (Univ. Zuerich, Switzerland)
- MIPS: http://speedy.mips.biochem.mpg.de/
Martinsried Institute for Sequence Analysis, Munich, Germany.
- Univ. Cambridge: http://www.bio.cam.ac.uk/
Univ. of Cambridge, U.K., School of Biological Sciences.
Priv sites
The following two links are maintained privately and are probably the best starting point for an overview of what you might want to search for in molecular biology.
- Dictionary of cell biology: http://www.mblab.gla.ac.uk/~julian/Dict.html
The Dictionary of Cell Biology was first published in 1989, and has since been translated into several languages. It is intended to provide quick access to easily-understood and cross-referenced definitions of terms frequently encountered in reading the modern biology literature. This server contains the text of the Second edition, published in April 1995, together with enhancements, hypertext links and new entries which are destined for the third edition.
- Molecular Biology Protocols: http://research.nwfsc.noaa.gov/protocols.html
Molecular Biology Protocols (Microbial Pathogenesis/Utilization Research Division, Northwest Fisheries Science Center (NMFS/NOAA), USA). Lists various protocols and collects information on techniques (e.g., DNA purification techniques, DNA transformation/library preparation, southern/northern blotting, DNA sequencing, PCR and related methods, Protein electrophoresis).
- The antibody resource page: http://www-chem.ucsd.edu/Faculty/goodman/antibody.html/abpage.html
Access to antibody resources (UCSD, San Diego, USA).
- Biobase: http://biobase.dk/cgi-bin/celis
The Danish Centre for Human Genome Research's 2-D PAGE Databases at the University of Aarhus contain data on proteins identified on various reference maps. Available are:
- Taxa: http://ucmp1.berkeley.edu/taxaform.html
Web 'lift' through any taxon, including an introduction to phylogeny and the origins of life (Berkeley, USA).
- List of species in SWISS-PROT: http://expasy.hcuge.ch/cgi-bin/speclist
- Phylogeny: http://www.no.embnet.org/phylogeny.html
Evolution and Phylogeny Laboratory, Norway.
- Courses: http://www.biochem.ucl.ac.uk/bsm/dbbrowser/courses.html
Collection of bioinformatics courses (Univ College, London, England)
- BioMOO: http://bioinformatics.weizmann.ac.il/BioMOO/
BioMOO is a virtual meeting place for biologists, connected to the Globewide Network Academy. The main physical part of the BioMOO is located at the BioInformatics Unit of the Weizmann Institute of Science, Israel.
- OPAL: http://www.elsevier.nl:80/section/life/opal/doc/demos.htm
Open programs for associative learning (Elsevier, NL), an interactive system useful for teaching and learning across various disciplines of cell biology and biomedical sciences.
- BDGP:
Berkeley Drosophila Genome Project, Berkeley, U.S.A.
General
Miscellaneous
Nucleotide seq
Protein seq information
Protein structure
(note: see also 'Metabolic and other pathways')
- Malaria db: http://www.wehi.edu.au/biology/malaria/who.html
Dep. of Microbiol., Monash Univ. and The Walter and Eliza Hall Inst. of Medical Research, Australia.
- Malaria (Parasitology): http://www.wehi.edu.au/biology/malaria/sites.html
Other Malaria and Parasitology Sites, Monash Univ., Australia.
- Parasite genome db: http://www.ebi.ac.uk/parasites/parasite-genome.html
Parasite genome databases and genome research resource, EBI, U.K.
- PDD Protein Disease Db: http://www-pdd.ncifcrf.gov/PDD/GEN-docs/indexGEN.html
- GIF_DB: http://www-biol.univ-mrs.fr/~lgpd/GIF_DB/GIF_entries/GIF_DB_listing.html
Genes Interactions in the Fly DataBase, Marseille, France. A specialized database for Interactions involved in Pattern formation in Drosophila.
- CySPID: http://paella.med.yale.edu/cyspid/
The Cytoskeletal Protein Interactions Database, Yale, U.S.A.
- TBASE The Transgenic/Targeted Mutation db: http://www.gdb.org/Dan/tbase/tbase.html
- EMBL Nucleotide Sequence Database (EBI): http://www.ebi.ac.uk/ebi_docs/embl_db/ebi/topembl.html
- GenBank Nucleotide Sequence db (NCBI): http://www.ncbi.nlm.nih.gov/Web/Search/index.html
- NDB Nucleic Acid db (Rutgers): http://ndbserver.rutgers.edu
- The Genome Sequence DataBase (NCGR): http://www.ncgr.org/gsdb/gsdb.html
- The TIGR Human cDNA Database: http://www.tigr.org/tdb/hcd/hcd.html
- Vector sequence db (Queen's Univ, Canada): http://biology.queensu.ca/~miseners/vector.html
- The Ribosomal Database Project (Univ of Illinois, Urbana, USA): http://rdpwww.life.uiuc.edu/
- Large ribosomal subunit db (Univ Antwerpen, Belgium): http://rrna.uia.ac.be/rrna/lsuform.html
- Small ribosomal subunit db (Univ Antwerpen, Belgium): http://rrna.uia.ac.be/rrna/ssuform.html
- uRNA db (Univ of Texas, Tyler, USA): http://pegasus.uthct.edu/uRNADB/uRNADB.html
- RNA modification db (University of Utah, Salt Lake City, USA): http://medstat.med.utah.edu/RNAmods/
- The molecular probe db (IST, Genova, Italy): http://www.biotech.ist.unige.it/interlab/mpdb.html
- PCR primers db (Univ Nijmegen, Netherlands): http://www.ebi.ac.uk/primers_home.html
- Codon usage db (Kazusa DNA Research Institute, Japan): http://www.dna.affrc.go.jp/~nakamura/
- DOGS (Database Of Genome Sizes, CBS, Denmark): http://www.cbs.dtu.dk/DOGS/index.html
For other database-related issues, e.g., 'Carbohydrates resource', 'Species specific databases', 'Gene(s)/protein(s) specific databases/resources', and 'Dictionaries, primers, courses, nomenclature, etc.', see the WWW links of Amos Bairoch: http://expasy.hcuge.ch/www/amos_www_links.html
- SWISS-PROT database of protein sequences: http://expasy.hcuge.ch/sprot/sprot-top.html
- PIR international protein sequence database: http://www.gdb.org/Dan/proteins/pir.html
- PROSITE: http://expasy.hcuge.ch/sprot/prosite.html
Dictionary of protein sites and patterns. PROSITE is a method for determining the function of uncharacterized proteins translated from genomic or cDNA sequences. It consists of a database of biologically significant sites, patterns and profiles that help to reliably identify which known family of proteins (if any) a new sequence belongs to.
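To illustrate how a PROSITE-style pattern can be matched against a sequence, here is a minimal Python sketch. It assumes the usual PROSITE pattern notation: elements separated by '-', 'x' for any residue, '[...]' for allowed residues, '{...}' for excluded residues, '(n)' or '(n,m)' for repeats, and '<' or '>' for anchoring at the N- or C-terminus. The function names and the test sequence are hypothetical, profiles are not handled, and this is not the code used by the PROSITE server.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Anchors written inside brackets (a rare PROSITE form) are not supported.
    """
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as x(2) or x(2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        # "[...]" sets and single residue letters are already valid regex.
        if repeat:
            core += "{" + repeat + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def find_sites(pattern: str, sequence: str):
    """Return (1-based start, matched segment) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # N-{P}-[ST]-{P} is the PROSITE N-glycosylation site pattern;
    # the sequence is an invented example.
    print(find_sites("N-{P}-[ST]-{P}.", "MNGSAANPSA"))
```

The lookahead in `find_sites` is a design choice that lets overlapping sites be reported, which a plain `finditer` would skip.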
- BLOCKS: http://www.blocks.fhcrc.org
Blocks are multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins. Block Searcher, Get Blocks and Block Maker are aids to detection and verification of protein sequence homology. They compare a protein or DNA sequence to a database of protein blocks, retrieve blocks, and create new blocks, respectively.
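The idea of comparing a sequence to an ungapped block can be sketched as follows. This is only an illustration under stated assumptions: the block is turned into a simple log-odds profile with a flat pseudocount and a uniform background of 1/20 per residue, which is not necessarily the weighting used by Block Searcher. The function names, the example block and the query sequence are all hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(block, pseudocount=1.0):
    """Build one log-odds column per block position.

    `block` is a list of equal-length, ungapped segments. Residues outside
    the 20 standard amino acids are not handled in this sketch.
    """
    width = len(block[0])
    if any(len(segment) != width for segment in block):
        raise ValueError("all segments in a block must have the same length")
    background = 1.0 / len(AMINO_ACIDS)   # assumed uniform background
    profile = []
    for column in range(width):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for segment in block:
            counts[segment[column]] += 1
        total = sum(counts.values())
        profile.append(
            {aa: math.log2((counts[aa] / total) / background) for aa in AMINO_ACIDS}
        )
    return profile


def best_hit(profile, sequence):
    """Slide the profile along the sequence without gaps.

    Returns (0-based start, score) of the best window, or None if the
    sequence is shorter than the block.
    """
    width = len(profile)
    best = None
    for start in range(len(sequence) - width + 1):
        score = sum(profile[i][sequence[start + i]] for i in range(width))
        if best is None or score > best[1]:
            best = (start, score)
    return best


if __name__ == "__main__":
    example_block = ["GKST", "GKSS", "GKTT"]   # invented block
    print(best_hit(build_profile(example_block), "AAGKSTAA"))
```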
- PRINTS: http://www.biochem.ucl.ac.uk/bsm/dbbrowser/PRINTS/PRINTS.html
PRINTS is a compendium of protein fingerprints. A fingerprint is a group of conserved motifs used to characterise a protein family; its diagnostic power is refined by iterative scanning of OWL. Usually the motifs do not overlap, but are separated along a sequence, though they may be contiguous in 3D-space. Fingerprints can encode protein folds and functionalities more flexibly and powerfully than can single motifs: the database thus provides a useful adjunct to PROSITE.
- MOTIFS: http://www.genome.ad.jp/SIT/MOTIF.html
A set of motif libraries and search programs (Kyoto Univ., Japan) for retrieval and analysis of protein sequence and structural motifs. The program currently available is
- ProDom: http://protein.toulouse.inra.fr/
The ProDom protein domain database consists of an automatic compilation of homologous domains detected in the SWISS-PROT database by the DOMAINER algorithm (Sonnhammer, E.L.L. & Kahn, D., 1994, Protein Sci. 3:482-492). It has been devised to assist with the analysis of the domain arrangement of proteins.
- PUU: ftp://ftp.embl-heidelberg.de/pub/databases/protein_extras/puu/domains.puu
Putative protein structural domains.
- Yeast db: http://quest7.proteome.com/YPDhome.html
YPD contains physical, functional, and genetic information for the proteins of budding yeast, Saccharomyces cerevisiae.
- Kabat db: http://immuno.bme.nwu.edu/
The Kabat database of sequences of proteins of immunological interest.
- REBASE - The Restriction Enzyme db: http://www.gdb.org/Dan/rebase/rebase.html
- EC-Enzyme classification db: http://www.gdb.org/Dan/proteins/ec-enzyme.html
- ENZYME (nomenclature db): http://expasy.hcuge.ch/sprot/enzyme.html --- ftp://expasy.hcuge.ch/databases/enzyme
- Enzyme Structures db: http://www.biochem.ucl.ac.uk/bsm/enzymes/index.html
- TBASE The Transgenic/Targeted Mutation db: http://www.gdb.org/Dan/tbase/tbase.html
- PDD Protein Disease Db: http://www-pdd.ncifcrf.gov/PDD/GEN-docs/indexGEN.html
- O-GlycBase: http://www.cbs.dtu.dk/OGLYCBASE/cbsoglycbase.html
O-GLYCBASE is a revised database of O-glycosylated proteins (CBS, Denmark).
- PDB (Brookhaven): http://pdb.pdb.bnl.gov/
The Protein Data Bank (PDB) is an archive of experimentally determined three-dimensional structures of biological macromolecules, serving a global community of researchers, educators, and students.
- BMCD: http://ibm4.carb.nist.gov:4400/bmcd/bmcd.html
The biological macromolecule crystallization db and the NASA archive for protein crystal growth data.
- BioMagResBank: http://www.bmrb.wisc.edu
Protein, peptide and nucleic acid NMR spectroscopy db.
- Klotho: http://ibc.wustl.edu/klotho
Biochemical compounds declarative db.
- DSSP: file://ftp.embl-heidelberg.de/pub/databases/dssp/
DSSP database of secondary structure assignments for proteins of known structure. Contains information about secondary structure, solvent accessibility and some contacts for all PDB proteins.
- HSSP: file://ftp.embl-heidelberg.de/pub/databases/hssp/
HSSP database of homology-derived secondary structure of proteins. It contains the alignments of all known structures against the SWISS-PROT sequence database.
- FSSP: file://ftp.embl-heidelberg.de/pub/databases/fssp/
FSSP database of fold classification based on structure-structure alignment of proteins. Contains the structural all-against all alignments for PDB.
- PDBFINDER: http://www.sander.embl-heidelberg.de/pdbfinder/
The PDBFINDER database is constructed from the PDB, DSSP and HSSP databases. Many of the fields contained in the PDBFINDER database are difficult to access from the original databases. Some information is retrieved from the original literature.
- Enzyme Structures db: http://www.biochem.ucl.ac.uk/bsm/enzymes/index.html
- SCOP: http://scop.mrc-lmb.cam.ac.uk/scop/
Classification of protein structures into structural families and display of 3D structures.
- CATH: http://www.biochem.ucl.ac.uk/bsm/cath/CATHintro.htm
Classification of protein structures into structural families. CATH is based on both structural and sequence relationships between proteins at several levels of similarity.
- ProDom: http://protein.toulouse.inra.fr/
The ProDom protein domain database consists of an automatic compilation of homologous domains detected in the SWISS-PROT database by the DOMAINER algorithm (Sonnhammer, E.L.L. & Kahn, D., 1994, Protein Sci. 3:482-492). It has been devised to assist with the analysis of the domain arrangement of proteins.
- PUU: ftp://ftp.embl-heidelberg.de/pub/databases/protein_extras/puu/domains.puu
Putative protein structural domains.
- SWISS-3DIMAGE: http://expasy.hcuge.ch/sw3d/sw3d-top.html
High quality pictures of biological macromolecules.
- Protein Motions: http://hyper.stanford.edu/~mbg/ProtMotDB
A db of domain, loop and subunit motions
General
Database searches
Analysing nucleotide seq
Protein structure prediction
- ECACC: http://www.gdb.org/annex/ecacc/HTML/ecacc.html
European collection of animal cell cultures. The European Collection of Animal Cell Cultures is a self financed part of the Centre for Applied Microbiology and Research. The collection is supported from a combination of sources, the UK Research Councils (MRC, AFRC, SERC, NERC), the Commission of the European Communities, the World Health Organisation and revenue from sales and the provision of technical services. The Collection accepts deposits from a wide range of institutions including industry and aims to provide as comprehensive a service as possible to its users. Further information is provided on technical matters and the increased scope of back-up services from ECACC.
- HyperCLDB: http://www.biotech.ist.unige.it/cldb/indexes.html
HyperCLDB, the hypertext on cell culture availability extracted from the Cell Line Data Base of the Interlab Project.
- QUEST: http://siva.cshl.org/
The QUEST Protein Database Center is a facility for the construction and analysis of Protein Databases. The data is generated by two-dimensional (2D) electrophoresis of proteins on polyacrylamide gels. We are located at the Cold Spring Harbor Laboratory (CSHL) on Long Island, New York, and we have a computer facility where gels are analyzed and 2D gel protein databases are built. Our goal is the construction of protein databases for scientific investigations.
- Compute pI/Mw: http://expasy.hcuge.ch/ch2d/pi_tool.html
Compute pI/Mw is a tool that computes the theoretical pI (isoelectric point) and Mw (molecular weight) for a list of SWISS-PROT entries or for a user-entered sequence.
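The molecular weight part of such a calculation can be sketched in a few lines of Python: sum the average masses of the residues and add one water molecule for the free termini. The mass table below holds commonly tabulated average residue masses and should be checked against an authoritative source before serious use. The function name is hypothetical, the pI calculation is not covered, and this is not the ExPASy implementation.

```python
# Average residue masses in Da (amino acid minus one water); commonly
# tabulated values, given here as an assumption to be checked.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01524  # one water is added back for the free N- and C-termini


def molecular_weight(sequence: str) -> float:
    """Return the theoretical average molecular weight of a protein sequence."""
    sequence = sequence.upper()
    unknown = set(sequence) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError("unsupported residue codes: " + ", ".join(sorted(unknown)))
    if not sequence:
        return 0.0
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


if __name__ == "__main__":
    print(round(molecular_weight("GA"), 2))  # dipeptide Gly-Ala
```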
- K2d server: http://kal-el.ugr.es/k2d/k2d.html
Estimation of the percentages of protein secondary structure from UV circular dichroism spectra using a neural network.
- Biotech validation: http://biotech.embl-heidelberg.de:8400/
Biotech validation suite for protein structures (quality checks of protein structures). The server gives you a comprehensive check report of your protein.
- Dali server: http://www.embl-heidelberg.de/dali/dali.html
The Dali server is a network service for comparing protein structures in 3D. You submit the coordinates of a query protein structure and Dali compares them against those in the Protein Data Bank. A multiple alignment of structural neighbours is mailed back to you. In favourable cases, comparing 3D structures may reveal biologically interesting similarities that are not detectable by comparing sequences.
- EBI (England): http://www.ebi.ac.uk/searches/searches.html
Sequence similarity searches (FASTA, BLITZ, PROSITE, BLAST, MAXHOM-PredictProtein).
- BCM (USA): http://dot.imgen.bcm.tmc.edu:9331/seq-search/protein-search.html
General protein sequence/pattern searches. Programs include fast methods (BLAST, FASTA, PROSITE) and full dynamic programming methods (FASTA, BLAST, BLITZ, MPSEARCH).
- BioSCAN: http://genome.cs.unc.edu/online.html
The BioSCAN Server allows searching, retrieving and comparing of protein and DNA sequences.
- NCSA Biology Workbench: http://biology.ncsa.uiuc.edu/BW/BW.cgi
The NCSA Biology Workbench provides a point and click interface for rapid access to biological databases and analysis tools.
- BLAST: http://www.ncbi.nlm.nih.gov/BLAST/
BLAST performs fast database searching combined with rigorous statistics for judging the significance of matches. Five BLAST programs search all combinations of query and database sequences.
- ToPLign: http://cartan.gmd.de/ToPLign.html
ToPLign implements standard pairwise and multiple alignment methods with flexible parameter handling. The analysis of alignments is supported by offering different visualisations of alignments. Furthermore, the stability of the resulting alignments can be explored.
- BOX: http://ulrec3.unil.ch/software/BOX_form.html
Pretty Printing and Shading of Multiple-Alignment files.
- Sequence Alerting System: http://swan.embl-heidelberg.de:8080/Alerting/
The sequence alerting system in its present form will search each day in several databases for news on (homologues of) "your" sequence and will inform you by email if it has detected a new relative.
- PSORT: http://psort.nibb.ac.jp/
Prediction of protein sorting signals and localisation sites in amino acid sequences.
- GRAIL: http://avalon.epm.ornl.gov/
GRAIL (Gene Recognition and Assembly Internet Link) is a DNA sequence analysis tool. The GenQuest sequence comparison server is designed for rapid and sensitive comparison of DNA and protein sequences to existing DNA and protein sequence databases. Full database entries of any sequence found in the course of a search are retrieved.
- GenQuest: http://www.gdb.org/Dan/gq/gq.form.htm
Running BLAST, FASTA or a full dynamic programming alignment of nucleotide sequences against Nucleotide and protein databases.
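As an illustration of what a full dynamic programming alignment involves, the following Python sketch computes a Smith-Waterman local alignment score for two nucleotide sequences. Smith-Waterman is one common dynamic programming method; whether GenQuest uses exactly this variant is not stated here. The scoring values (match +2, mismatch -1, linear gap -2) and the function name are assumptions chosen for the example.

```python
def local_alignment_score(a: str, b: str, match=2, mismatch=-1, gap=-2) -> int:
    """Return the best local alignment score of a and b (score only).

    Uses a linear gap penalty and keeps only two rows of the matrix, so
    memory grows with len(b) rather than len(a) * len(b).
    """
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: a local alignment may start anywhere.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


if __name__ == "__main__":
    print(local_alignment_score("TTACGTTT", "ACG"))
```

Unlike the fast heuristic programs, this fills in every cell of the comparison matrix, which is why such searches are slower but do not miss the optimal local alignment under the chosen scores.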
- Splice site predictions: http://www.cbs.dtu.dk/bsnn.html
The Center for Biological Sequence Analysis (CBS, Copenhagen, Denmark) offers a service for predicting intron splice sites in human and Arabidopsis thaliana DNA.
- Documented collection of prediction services: http://www.mrc-cpe.cam.ac.uk/jong/predict/pred-tools.html
Overview and links to services for predicting secondary structure, solvent accessibility, homology modelling and threading (MRC, Cambridge, England).
- PredictProtein: http://www.embl-heidelberg.de/predictprotein/
Multiple sequence alignment (MAXHOM); prediction of secondary structure (PHDsec), solvent accessibility (PHDacc), transmembrane helices (PHDhtm), transmembrane topology (PHDtopology); and threading (PHDthreader).
- PHDacc: http://www.embl-heidelberg.de/predictprotein/
Multiple alignment-based neural network system.
Accuracy: > 75% (+/-10%, one standard deviation), higher for more reliably predicted residues. Evaluated by cross-validation on 720 unique proteins; comparisons to other methods based on identical sets.
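A figure such as "75% (+/-10%, one standard deviation)" can be read as the mean and spread of per-protein accuracies. The Python sketch below shows that arithmetic under stated assumptions: predictions and observations are strings of per-residue states in a two-state buried ('b') or exposed ('e') encoding, which the entry above does not specify, and the example data are invented rather than taken from any evaluation.

```python
from statistics import mean, stdev


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state equals the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("prediction and observation must be non-empty and equally long")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def summarise(pairs):
    """Return (mean, sample standard deviation) of per-protein accuracies.

    `pairs` is a sequence of (predicted, observed) strings, one per protein;
    at least two proteins are needed for the standard deviation.
    """
    scores = [per_residue_accuracy(p, o) for p, o in pairs]
    return mean(scores), stdev(scores)


if __name__ == "__main__":
    invented = [("bbee", "bbee"), ("bbee", "bbbb"), ("ebeb", "ebee")]
    print(summarise(invented))
```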
- PHDhtm: http://www.embl-heidelberg.de/predictprotein/
Multiple alignment-based neural network system predicting the locations of transmembrane helices.
Accuracy: > 95% (+/-10%, one standard deviation), higher for more reliably predicted residues. Evaluated by cross-validation on 132 proteins; comparisons to other methods based on identical sets.
- TMAP: http://www.embl-heidelberg.de/tmap/tmap_sin.html
Single sequence-based statistical prediction of the locations of transmembrane helices.
Accuracy: > 95%. Evaluated on 28 proteins WITHOUT cross-validation.
- PHDtopology: http://www.embl-heidelberg.de/predictprotein/
Refinement of PHDhtm by dynamic programming and prediction of topology (orientation of N-term with respect to membrane).
Accuracy: for > 85% of all proteins all helices and topology are predicted correctly. Evaluated by cross-validation on 132 proteins; comparisons to other methods based on identical sets.
- TMpred: http://ulrec3.unil.ch/software/TMPRED_form.html
Single sequence-based prediction of location and topology for helical transmembrane proteins using statistics and similarity matrices.
- SignalP: http://www.cbs.dtu.dk/services/SignalP/
Neural network prediction of presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive and Gram-negative prokaryotes, and eukaryotes.
- PHDthreader: http://www.embl-heidelberg.de/predictprotein/
Prediction-based threading detecting the fold type and aligning a protein of unknown structure and a protein of known structure for low levels of sequence identity ( < 25%).
Accuracy: < 30%, i.e., less than 30% of the predicted first hits are true remote homologues. Evaluated by cross-validation on 89 unique protein structures.
- T3P2: http://www.mbi.ucla.edu/people/frsvr/frsvr.html
Prediction-based threading detecting the fold type and aligning a protein of unknown structure and a protein of known structure for low levels of sequence identity ( < 25%).