Medicago Genome Initiative (MGI)

https://xgi.ncgr.org/mgi

Waugh, M.¹, Anderson, W.¹, Bell, C.², Inman, J.¹, Schilkey, F.¹, Sullivan, J.¹, May, G.³

¹The National Center for Genome Resources, 2935 Rodeo Park Drive East, Santa Fe, NM 87505, USA
²EmerGen Inc., 390 Wakara Way, Salt Lake City, UT 84108, USA
³Plant Biology Division, The Samuel Roberts Noble Foundation, 2510 Sam Noble Parkway, Ardmore, OK 73402, USA

Contact: gdmay@noble.org

Database Description

The Medicago Genome Initiative (MGI) is an EST sequence database and analysis system that supports EST sequencing at the Noble Foundation Center for Medicago Genome Research (http://www.noble.org/medicago). Medicago truncatula (also known as "barrel medic" because of the shape of its seed pods) is a forage and model legume that is a close relative of alfalfa and soybean. The pea family (Leguminosae) contains more than 18,000 types of legumes, and these plants are second only to grasses in economic importance. MGI was first reported in the Nucleic Acids Research 2001 Database Issue (1), which described a prototype database, interface and analysis pipeline. We have since developed an entirely new system that retains the advantages of the prototype, with improvements that make it more portable, modular, flexible, interactive and reusable (2).
The data model is designed around the concept of an analysis operation (which may run a third-party sequence analysis tool) whose input and output each consist of a set of sequences (zero, one or many sequences). This permits analysis methods that act on individual sequences (e.g. similarity search) and methods that act on multiple sequences (e.g. EST clustering) to interact with the same generalized relational database structure. It also allows sequence analysis methods to be added flexibly, and genomic DNA sequences to be stored and analyzed in the same schema.
The analysis pipeline runs automatically upon receipt of new sequences and can be configured to perform any series of available operations. The current suite of operations includes: Import; Vector Screen; Quality Control; BLASTN search to identify non-mRNA contamination; clustering, multiple sequence alignment and extraction of a consensus; BLASTX versus a protein database; and Blocks+ (protein motif) search. Annotation is automated by linking high-scoring BLAST and Blocks+ hits to their cognate entries in the Gene Ontology database (http://geneontology.org).
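
The set-in, set-out data model described above can be illustrated with a minimal Python sketch. This is not the MGI implementation: the names `Sequence`, `Operation`, `quality_control`, `toy_cluster` and `run_pipeline` are hypothetical, the 8-residue cutoff is arbitrary, and the "clustering" step is a deliberately crude stand-in for real EST clustering and assembly. It shows only that a per-sequence step and a multi-sequence step can share one interface and be chained in a configurable order.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Sequence:
    name: str
    residues: str


SequenceSet = FrozenSet[Sequence]


@dataclass(frozen=True)
class Operation:
    """A named analysis step: a set of sequences in, a set of sequences out."""
    name: str
    run: Callable[[SequenceSet], SequenceSet]


def quality_control(seqs: SequenceSet) -> SequenceSet:
    # Per-sequence operation: keep reads of at least 8 residues (arbitrary toy cutoff).
    return frozenset(s for s in seqs if len(s.residues) >= 8)


def toy_cluster(seqs: SequenceSet) -> SequenceSet:
    # Multi-sequence operation: group reads that share their first 8 residues and
    # emit the longest member of each group as a stand-in "consensus".
    groups = {}
    for s in seqs:
        groups.setdefault(s.residues[:8], []).append(s)
    out = []
    for i, key in enumerate(sorted(groups)):
        longest = max(groups[key], key=lambda s: (len(s.residues), s.name))
        out.append(Sequence(f"consensus_{i + 1}", longest.residues))
    return frozenset(out)


def run_pipeline(
    ops: List[Operation], seqs: SequenceSet
) -> Tuple[SequenceSet, List[Tuple[str, int, int]]]:
    """Apply operations in order, recording (name, n_in, n_out) for each step."""
    log = []
    for op in ops:
        result = op.run(seqs)
        log.append((op.name, len(seqs), len(result)))
        seqs = result
    return seqs, log


# The pipeline is just an ordered, reconfigurable list of operations.
PIPELINE = [
    Operation("Quality Control", quality_control),
    Operation("Clustering", toy_cluster),
]

reads = frozenset({
    Sequence("est1", "ATGGCTAGCTAGGA"),
    Sequence("est2", "ATGGCTAGCTAG"),
    Sequence("est3", "TTGACCGTAAGC"),
    Sequence("est4", "ATG"),
})
consensus, log = run_pipeline(PIPELINE, reads)
```

Because every step has the same signature, adding a new analysis method in this sketch means appending another `Operation` to the list, which mirrors the flexibility the text attributes to the generalized schema.
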
Users view, query and manipulate their data with a web browser through a completely redesigned interface running on a secure server. All analysis operations are performed on the consensus sequences (gene sequences) that result from the clustering and assembly operation, rather than on individual ESTs. MGI now combines all publicly available M. truncatula data from GenBank with public Noble data in clustering and analysis runs. The data are typically refreshed four times per year, each time with a complete reanalysis that includes all newly available data. As of September 2001, MGI contained over 95,000 sequences, of which the 65,000 GenBank ESTs grouped into 8,843 clusters and 11,279 singletons, giving 20,122 analyzed consensus sequences in total. Cluster membership ranged from two ESTs (3,585 clusters) to 256 ESTs (one cluster). A publicly viewable version of MGI has been deployed at https://xgi.ncgr.org/mgi and can be accessed by following the login instructions on the main page.
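
The September 2001 totals above follow from counting each multi-member cluster and each singleton as one consensus sequence. A small sketch of that bookkeeping, assuming a hypothetical helper `summarize` and toy group sizes that are not MGI data:

```python
def summarize(group_sizes):
    """Count multi-member clusters, singletons and total consensus sequences.

    group_sizes holds the number of ESTs in each group; a size of 1 is a singleton.
    """
    clusters = sum(1 for n in group_sizes if n >= 2)
    singletons = sum(1 for n in group_sizes if n == 1)
    return {
        "clusters": clusters,
        "singletons": singletons,
        "consensus": clusters + singletons,
    }


# Toy input (not MGI data): three clusters and two singletons.
toy = summarize([2, 2, 5, 1, 1])

# Totals reported in the text for September 2001.
reported_consensus = 8843 + 11279  # clusters + singletons = 20122
```
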

Recent Developments

In addition to incorporating new data from the Noble Foundation and all publicly available M. truncatula data from GenBank, we have redesigned the entire database and analysis system to present a gene-centric view of ESTs. Interface improvements include keyword searches, query restriction by library and sequence type, a multiple sequence alignment viewer, and a features and annotation viewer. These additions, coupled with automated assignment of GO annotations, have resulted in a vastly improved information resource for model legume research.

Acknowledgements

The Samuel Roberts Noble Foundation supported this work.

REFERENCES

1. Bell,C.J., Dixon,R.A., Farmer,A.D., Flores,R., Inman,J., Gonzales,R.A., Harrison,M.J., Paiva,N.L., Scott,A.D., Weller,J.W. and May,G.D. (2001) The Medicago Genome Initiative: a model legume database. Nucleic Acids Res., 29, 114-117.
2. Inman,J.T., Flores,H.R., May,G.D., Weller,J.W. and Bell,C.J. (2001) A high-throughput distributed DNA sequence analysis and database system. IBM Systems J., 40, 464-486.

Category: Genomic Databases