Waugh, M.E.1, Tyler, B.M.2, Mitchell, T.3, Houfek, T.D.3, Dean, R.A.3, Anderson, W.T.1, Inman, J.T.1, Schilkey, F.D.1, Sullivan, J.P.1, Bell, C.J.4
1The National Center for Genome Resources, 2935 Rodeo Park Drive East, Santa Fe, NM 87505, USA
2Department of Plant Pathology, University of California, Davis, USA
3Fungal Genomics Laboratory, North Carolina State University, Raleigh, NC 27695-7251, USA
4EmerGen Inc., 390 Wakara Way, Salt Lake City, UT 84108, USA
The Phytophthora Genome Consortium (PGC) is a distributed sequencing initiative whose goal is to better understand the mechanisms of infection and resistance in host-pathogen pairs involving the genus Phytophthora. There are approximately 60 species of Phytophthora, all of which are serious plant pathogens. The PGC database contains sequences from P. sojae, a soybean pathogen, and P. infestans, a pathogen of potato and tomato. The database is hosted by the National Center for Genome Resources (NCGR), which supports the PGC by providing automated analysis services and database access utilities. The PGC is an outgrowth of the Phytophthora Genome Initiative (PGI, 1), a pilot project that began at the NCGR in 1997 and was subsumed by the PGC database and retired in 2001. The PGC database contains all of the original data from PGI (2,3) plus, as of September 2001, approximately 9000 sequences from 4 new libraries.
The underlying analysis and database system, XGI, has been completely redeveloped as a portable, flexible system for automated analysis and annotation of both genomic and expressed sequence data. Sequence quality control has been improved and now includes vector and artifact screening and trimming of low-quality reads. Clustering, assembly and consensus prediction are now a central component of the system, with all downstream analysis, including similarity and motif searching, performed on gene sequences rather than on individual ESTs. Clustering is performed using Phrap, which allows ESTs from closely related gene family members to be distinguished while still correctly clustering sequences that contain sequencing errors. For example, elicitin gene family members differing by as few as one amino acid substitution were correctly distinguished. The system uses a novel post-analysis method of assigning Gene Ontology (www.geneontology.org) annotations to predicted features to assist in putative identification.
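As an illustration of the low-quality read trimming step, the following is a minimal sketch of one common approach, sliding-window trimming on Phred-style quality scores. The abstract does not state how XGI trims reads, so the function name, the window size and the quality threshold are all assumptions made for this example.

```python
from typing import List, Tuple


def trim_low_quality(quals: List[int], min_q: int = 20, window: int = 10) -> Tuple[int, int]:
    """Return (start, end) of the retained region of a read, end exclusive.

    Hypothetical helper: the retained region runs from the first window
    whose mean quality is at least min_q to the last such window.
    A read with no passing window is rejected and reported as (0, 0).
    """
    n = len(quals)
    if n < window:
        # Too short to evaluate a single window: reject the read.
        return (0, 0)

    def window_ok(i: int) -> bool:
        # Mean quality of the window starting at position i.
        return sum(quals[i:i + window]) / window >= min_q

    passing = [i for i in range(n - window + 1) if window_ok(i)]
    if not passing:
        return (0, 0)
    # Keep everything from the first passing window to the end of the last one.
    return (passing[0], passing[-1] + window)


if __name__ == "__main__":
    # Made-up quality scores: poor ends around a good core.
    read_quals = [5] * 10 + [30] * 30 + [5] * 10
    print(trim_low_quality(read_quals))
```

Because the decision is made on window means, a few low-scoring bases can remain at each end of the retained region; a stricter variant could additionally clip single bases below the threshold.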
The data are accessed over the Web with a standard browser connecting to a secure server. The GUI has been completely redesigned and is built with AxKit, which converts the results of database queries into XML that is interpreted and displayed by the Perl-embedded stylesheets specified in the header tags. This allows the look, feel and functionality of the GUI to be changed rapidly with minimal effort. The PGC homepage is at https://xgi.ncgr.org/pgc and is freely available to the public by following the login instructions.
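The separation between query results and presentation described above can be sketched as follows. This is not AxKit (which is a Perl/Apache framework) and not the PGC code; it is a small Python analogue using only the standard library, and the element names, field names and sample record are hypothetical.

```python
import html
import xml.etree.ElementTree as ET
from typing import Dict, List


def rows_to_xml(rows: List[Dict[str, str]]) -> str:
    """Serialize database query rows as XML (hypothetical schema)."""
    root = ET.Element("results")
    for row in rows:
        rec = ET.SubElement(root, "sequence", id=row["id"])
        ET.SubElement(rec, "library").text = row["library"]
        ET.SubElement(rec, "description").text = row["description"]
    return ET.tostring(root, encoding="unicode")


def render_html(xml_text: str) -> str:
    """Stand-in for a stylesheet: turn the XML into an HTML list."""
    root = ET.fromstring(xml_text)
    items = []
    for rec in root.findall("sequence"):
        items.append("<li>{}: {} ({})</li>".format(
            html.escape(rec.get("id", "")),
            html.escape(rec.findtext("description", default="")),
            html.escape(rec.findtext("library", default="")),
        ))
    return "<ul>" + "".join(items) + "</ul>"


if __name__ == "__main__":
    # Made-up record for illustration only.
    rows = [{"id": "seq1", "library": "libA", "description": "elicitin-like"}]
    print(render_html(rows_to_xml(rows)))
```

Only render_html would need to change to alter the page layout; the query and the XML it produces stay the same, which is the property the abstract attributes to the stylesheet-driven design.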
The PGI database has been subsumed by the PGC database, to which considerable new data have been added. All data for each organism have been run through an entirely new analysis system, resulting in a gene-centric view of ESTs. The PGC database has a new interface providing keyword searches, query restriction by library and sequence type, a multiple sequence alignment viewer and graphical views of features and annotation. The analysis process has been augmented with the automated assignment of GO annotations. The result is an excellent resource for Phytophthora data.
This work was supported by USDA IFAFS grant number 00-52100-9684.
Category: Genomic Databases