General NLP, Information Retrieval (IR), Information Extraction (IE), Text Mining and Machine Learning (ML) tools:
-
FreeLing:
Open-source language analyzer that includes morphological analysis, a shallow parser and a part-of-speech (POS) tagger.
-
VIEW:
Variation in English Words and Phrases: a tool for comparing semantically related words and phrases in the British National Corpus.
-
OAK System:
English analyzer consisting of a sentence splitter, a tokenizer, a POS tagger, a stemmer, a chunker, a Named Entity (NE) tagger, a dependency analyzer, a parser, a function tagger and a regularizer.
-
TreeTagger:
Language-independent part-of-speech tagger.
-
SVM_light:
Implementation of Support Vector Machines (SVMs) in C.
-
Stanford Lexicalized Parser:
Probabilistic natural language parser.
-
TIGERSearch:
Tools for linguistic text exploration.
-
NLTK:
Natural Language Toolkit, a Python library for natural language processing.
-
GATE:
General Architecture for Text Engineering, a Natural Language Processing system.
-
Anaphora resolution tool:
Prolog tool for anaphora resolution.
-
GuiTAR:
General tool for anaphora resolution.
-
JavaRAP:
Java implementation of the classic Resolution of Anaphora Procedure (RAP).
-
Lemur Toolkit:
Toolkit for language modeling and information retrieval.
-
Zettair:
Search engine and tool for building inverted file indexes.
-
SATZ:
Adaptive sentence boundary detector written in C, based on neural networks.
-
Ngrams:
N-gram analysis tool written in Perl.
-
Rubryx:
Text classification program (pattern classification of web sites) for Windows.
-
SEFT:
Search Engine For Text; returns relevant text windows for a given set of query terms.
-
Bow:
Toolkit written in C for Statistical Language Modeling, Text Retrieval, Classification and Clustering.
-
Approximate String Matching:
Source code of approximate string matching programs.
-
Strmat:
Set of C programs implementing string matching and pattern discovery algorithms.
-
FCLUSTER:
Program for fuzzy cluster analysis.
-
LNKnet:
Program for pattern classification using a variety of techniques, such as neural network, statistical and machine learning algorithms.
-
TextSTAT:
Program for basic text analysis, implemented in Python.
-
Suffix sort:
Program for suffix sorting written in C.
-
Alembic:
Workbench for corpus analysis and domain-specific tagging.
-
Quirk:
Toolkit for terminology extraction and management.
-
Nice stemmer:
Stemmer that integrates several stemming algorithms, such as a simple stemmer and the Porter, Krovetz and Combo stemmers.
-
TnT:
Statistical Part-of-Speech Tagger.
-
C. Manning list:
Useful list of NLP resources compiled by Christopher Manning.
-
SenseClusters:
Perl package for clustering similar contexts using unsupervised, knowledge-lean methods.
-
CCG tools:
Tools developed by the Cognitive Computation Group at the University of Illinois, including a verb tense changer, a sentence segmenter, a word splitter, a shallow parser and an HTML tag stripper.
-
LingPipe:
Suite of Java tools for the linguistic analysis of natural language data (e.g. a heuristic within-document coreference resolution engine, general chunking, text classification and clustering).
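Several entries above (Zettair, Lemur, Bow) are built around an inverted index for text retrieval. As a rough illustration of that data structure only, and not the code or API of any listed tool, the following Python sketch builds a positional inverted index over a toy in-memory collection and answers conjunctive (AND) queries. The regex tokenizer and the names `tokenize`, `build_index` and `and_query` are hypothetical simplifications; real systems add stemming, compression and on-disk storage.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (hypothetical, simplified)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build a positional inverted index: term -> {doc_id: [token positions]}.

    `docs` is assumed to be a dict mapping doc_id -> raw text.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index


def and_query(index, query):
    """Return the sorted ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # .get() does not trigger the defaultdict factory, so unknown terms give an empty posting set
    postings = [set(index.get(term, {})) for term in terms]
    return sorted(set.intersection(*postings))


if __name__ == "__main__":
    docs = {
        1: "Anaphora resolution in biomedical text",
        2: "Biomedical text mining tools",
        3: "Part-of-speech tagging of text",
    }
    index = build_index(docs)
    print(and_query(index, "biomedical text"))  # [1, 2]
```

Storing token positions (not just document ids) is what would let a sketch like this be extended to phrase or proximity queries, such as the text windows SEFT returns.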
If you know of additional NLP, IR, text processing and classification tools, packages, scripts or programs of interest for biomedical text mining, please contact me at martink@cnb.uam.es.
This will help improve the completeness of this list of text mining resources for biology and biomedicine.