Nucl. Acids Res. OUP
Compilation Paper
Categories List
Alphabetical List
Search Summary Papers

ALFRED

http://alfred.med.yale.edu/alfred/index.asp

Osier, M.V. (1), Cheung, K.-H. (2), Druskin, L. (2), Pakstis, A.J. (1), Miller, P.L. (2), Kidd, J.R. (1), Kidd, K.K. (1)

(1) Department of Genetics, Yale University School of Medicine, 333 Cedar Street, PO Box 208005, New Haven, CT 06520-8005, USA
(2) Center for Medical Informatics, Yale University School of Medicine, 333 Cedar Street, PO Box 208009, New Haven, CT 06520-8009, USA

Contact   kidd@biomed.med.yale.edu


Database Description

ALFRED (the ALlele FREquency Database) is designed to store and disseminate frequencies of alleles at human polymorphic sites for multiple populations, primarily for the population genetics and molecular anthropology communities. The focus is on samples from anthropologically defined populations, but the database also includes some samples from broader population groupings such as "Mixed Europeans", "African Americans", and "Han Chinese". Currently ALFRED has information on over 220 polymorphic sites for more than 70 populations. Since our previous publication we have focused on increasing the quantity and quality of data, and on adding and improving tools to make the data more accessible and comprehensible to the end user. ALFRED is accessible from the Kidd Lab home page (http://info.med.yale.edu/genetics/kkidd) or directly (http://alfred.med.yale.edu/alfred/index.asp).

Recent Developments

Expansion of Data. Since our previous description of ALFRED, new funding from NSF has enabled us to begin a systematic expansion of data. We have started defining the approach that will be used to incorporate data from the literature into ALFRED, and are entering some test data from real literature sources. Although many new records will be added in the coming months, most of our recent effort has gone toward refining definitions of the data already within ALFRED. For example, we are beginning to systematically link ALFRED to the databases dbSNP (http://www.ncbi.nlm.nih.gov/SNP/) and HGBASE (http://hgbase.cgr.ki.se) to connect the frequency data (within ALFRED) to the molecular definitions of polymorphic sites (within dbSNP and HGBASE). In addition, the planned migration of data to a more powerful database engine (Oracle) is underway.
Improved Dynamic Graphic Tool. The previous tool for graphing allele frequencies at a single site plotted the first ten alleles returned by the database query and summed all other allele frequencies into a "residual" bar for each population. For multiallelic loci (such as STRPs and haplotype systems) this frequently resulted in large residual bars, since alleles common in the first populations retrieved in a search were often not the common alleles in subsequently retrieved populations. The new graphing method involves two steps. In the first step, the average allele frequencies are determined for the set of populations being considered. In the second step, the ten alleles with the highest average frequencies are graphed. This approach minimizes the size of the residual bars across all populations and makes the graph generally more informative than the previous approach.
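
The two-step selection described above can be sketched roughly as follows. This is only an illustration, not ALFRED's actual code: the function name, the nested-dictionary input, and the choice to count an allele that is absent from a population as frequency 0 when averaging are all assumptions made for the example.

```python
from collections import defaultdict


def top_alleles_with_residual(freqs, max_alleles=10):
    """Pick the alleles to graph and compute each population's residual.

    freqs is a hypothetical input layout: {population: {allele: frequency}}.
    """
    # Step 1: average each allele's frequency over all populations considered.
    # Assumption: an allele missing from a population contributes 0.
    totals = defaultdict(float)
    for allele_freqs in freqs.values():
        for allele, f in allele_freqs.items():
            totals[allele] += f
    n_pops = len(freqs)
    averages = {allele: total / n_pops for allele, total in totals.items()}

    # Step 2: keep the alleles with the highest average frequency
    # (ties broken by allele name so the result is deterministic).
    top = sorted(averages, key=lambda a: (-averages[a], a))[:max_alleles]
    top_set = set(top)

    # Everything not selected is summed into a per-population residual bar.
    table = {}
    for pop, allele_freqs in freqs.items():
        row = {allele: allele_freqs.get(allele, 0.0) for allele in top}
        row["residual"] = sum(
            f for allele, f in allele_freqs.items() if allele not in top_set
        )
        table[pop] = row
    return top, table
```

Because the selection is made once for the whole set of populations, every bar in the graph shows the same alleles, which is what keeps the residuals comparable across populations.
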
Estimated Heterozygosity and a Graphing Tool. One measure of the statistical informativeness of a polymorphic site is the estimated population heterozygosity for that site. This measure is useful to researchers when choosing which polymorphic site(s) to use for a study. Instead of computing this estimate from allele frequency data on the fly, we precompute the values and store them in the database. In addition, we have written tools to graph the estimated heterozygosities. For a given site (polymorphism) one can choose a graph of the heterozygosities in all populations for which data exist. Alternatively, for a single selected population sample one can choose a graph of the heterozygosities of all polymorphisms typed on that sample. These graphs are available from the detailed information pages for both sites and populations. In the future we plan to add other summaries of estimated site heterozygosities.
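
The text does not state which estimator ALFRED uses. As a minimal sketch, assume the standard expected heterozygosity, H = 1 - sum(p_i^2), with an optional small-sample correction of n/(n-1) for n sampled chromosomes. The function names and the dictionary keyed by (site, population) are hypothetical and stand in for the precomputed database values.

```python
def expected_heterozygosity(allele_freqs, n_chromosomes=None):
    """Expected heterozygosity from the allele frequencies at one site.

    Assumed estimator: H = 1 - sum(p_i ** 2). If n_chromosomes is given,
    the n / (n - 1) small-sample correction is applied.
    """
    h = 1.0 - sum(p * p for p in allele_freqs)
    if n_chromosomes is not None:
        if n_chromosomes < 2:
            raise ValueError("need at least two chromosomes for the correction")
        h *= n_chromosomes / (n_chromosomes - 1)
    return h


def precompute_heterozygosities(freq_table):
    """Compute H once per (site, population) so it can be stored and reused.

    freq_table is a hypothetical layout: {(site, population): [frequencies]}.
    """
    return {
        key: expected_heterozygosity(freqs) for key, freqs in freq_table.items()
    }
```

Storing the result per (site, population) pair supports both graphs described above: fixing the site gives one value per population, and fixing the population gives one value per site.
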
New Frequency Tables for Sites. To provide the user with greater flexibility in data visualization we are contemplating new formats for presenting allele frequency data. One of these new formats has population samples as rows and alleles as columns; each cell within the table is the corresponding allele frequency. This format corresponds more closely to the way data are typically presented in journal articles. It also makes it easier to compare the frequencies of individual alleles across populations than the standard frequency output format, which lists one allele frequency per line. For the moment, these tables are accessible from the detailed information page for individual polymorphic sites (such as http://alfred.med.yale.edu/alfred/allelepopcross.asp?siteuid=SI000058N). In the future, we hope to offer this and other table formats, including a user-defined format, for specific user-defined searches for sets of populations and loci.
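
To illustrate the difference between the two layouts, the sketch below converts the one-frequency-per-line form into a population-by-allele table. It is an assumption-laden example rather than ALFRED's implementation: the function name and the (population, allele, frequency) tuple layout are hypothetical, and an allele with no record for a population is shown as 0.0.

```python
def cross_tabulate(records):
    """Turn (population, allele, frequency) records into a row/column table.

    Returns a header row and one data row per population, with alleles as
    columns. Missing population/allele pairs are filled with 0.0 (assumption).
    """
    records = list(records)

    # Columns: every allele seen in any population, in sorted order.
    alleles = sorted({allele for _, allele, _ in records})

    # Group frequencies by population, keeping first-seen population order.
    by_population = {}
    for population, allele, freq in records:
        by_population.setdefault(population, {})[allele] = freq

    header = ["Population"] + alleles
    rows = [
        [population] + [cells.get(allele, 0.0) for allele in alleles]
        for population, cells in by_population.items()
    ]
    return header, rows
```

Reading down a single column of the result gives the frequency of one allele in every population, which is the comparison that the one-line-per-frequency listing makes awkward.
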

Acknowledgements

Initial funding for ALFRED was provided by NSF grant SBR-9632509 and USPHS grants P01GM57672, R01AA09379, and T15LM07056. Ongoing funding of ALFRED is provided by NSF grant BCS0096588.

Category   Mutation Databases

Go to the abstract in the NAR 2001 Database Issue.

 
