Abstract
The GWAS Central resource provides a toolkit for integrative access and visualization of a uniquely extensive collection of genome-wide association study data, while ensuring safe open access to prevent research participant identification. GWAS Central is the world's most comprehensive openly accessible repository of summary-level GWAS association information, providing over 70 million P-values for over 3800 studies investigating over 1400 unique phenotypes. The database content comprises direct submissions received from GWAS authors and consortia, in addition to actively gathered data sets from various public sources. GWAS data are discoverable from the perspective of genetic markers, genes, genome regions or phenotypes, via graphical visualizations and detailed downloadable data reports. Tested genetic markers and relevant genomic features can be visually interrogated across up to sixteen association data sets in a single view using the integrated genome browser. The semantic standardization of phenotype descriptions with Medical Subject Headings and the Human Phenotype Ontology allows the precise identification of genetic variants associated with diseases, phenotypes and traits of interest. Harmonization of the phenotype descriptions used across several GWAS-related resources has extended the phenotype search capabilities to enable cross-database study discovery using a range of ontologies. GWAS Central is updated regularly and available at https://www.gwascentral.org.
The Human Genome Variation Society (HGVS) variant nomenclature is widely used to describe sequence variants in scientific publications, clinical reports, and databases. However, the HGVS recommendations are complex, and this often results in inaccurate variant descriptions being reported. The open‐source hgvs Python package (https://github.com/biocommons/hgvs) provides a programmatic interface for parsing, manipulating, formatting, and validating variants according to the HGVS recommendations, but does not provide a user‐friendly Web interface. We have developed a Web‐based variant validation tool, VariantValidator (https://variantvalidator.org/), which utilizes the hgvs Python package and provides additional functionality to assist users who wish to accurately describe and report sequence‐level variations that are compliant with the HGVS recommendations. VariantValidator was designed to guide users through the intricacies of the HGVS nomenclature: if the user makes a mistake, VariantValidator automatically corrects it if it can, or provides helpful guidance if it cannot. In addition, VariantValidator can interconvert genomic variant descriptions in HGVS and Variant Call Format with a degree of accuracy that surpasses most competing solutions.
VariantValidator is a user‐friendly software tool designed to validate the syntax and parameters of DNA variant descriptions according to the HGVS Sequence Variant Nomenclature. VariantValidator ensures that users are guided through the intricacies of the HGVS nomenclature, e.g. if the user makes a mistake, VariantValidator automatically corrects the mistake if it can, or provides helpful guidance if it cannot. In addition, VariantValidator accurately interconverts between transcript variant descriptions and genomic variant descriptions in HGVS and Variant Call Format (VCF).
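To make the HGVS and VCF interconversion concrete, the following is a minimal sketch that handles only the simplest case, a single-nucleotide substitution on a human RefSeq chromosome accession. It is not VariantValidator's implementation and does not use the hgvs package; the function names (`accession_to_chrom`, `hgvs_g_to_vcf`, `vcf_to_hgvs_g`) and the `VcfRecord` type are hypothetical, and no reference-sequence checking or normalization is attempted.

```python
import re
from typing import NamedTuple


class VcfRecord(NamedTuple):
    """Minimal VCF-style fields for one variant (hypothetical helper type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


# Only genomic substitutions such as NC_000017.11:g.43045712C>T are accepted.
_SUBSTITUTION = re.compile(
    r"(?P<acc>NC_\d+\.\d+):g\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])"
)


def accession_to_chrom(accession: str) -> str:
    """Map a human RefSeq chromosome accession (NC_000001..NC_000024) to 1..22, X, Y."""
    match = re.fullmatch(r"NC_0000(\d{2})\.\d+", accession)
    if match is None:
        raise ValueError(f"unsupported accession: {accession}")
    number = int(match.group(1))
    if 1 <= number <= 22:
        return str(number)
    if number == 23:
        return "X"
    if number == 24:
        return "Y"
    raise ValueError(f"unsupported accession: {accession}")


def hgvs_g_to_vcf(description: str) -> VcfRecord:
    """Convert a simple genomic substitution to VCF-style fields.

    For a substitution the HGVS position and the VCF POS coincide;
    insertions and deletions need anchoring bases and are rejected here.
    """
    match = _SUBSTITUTION.fullmatch(description)
    if match is None:
        raise ValueError(f"only simple g. substitutions are supported: {description}")
    return VcfRecord(
        chrom=accession_to_chrom(match.group("acc")),
        pos=int(match.group("pos")),
        ref=match.group("ref"),
        alt=match.group("alt"),
    )


def vcf_to_hgvs_g(record: VcfRecord, accession: str) -> str:
    """Convert VCF-style substitution fields back to a genomic HGVS description."""
    if accession_to_chrom(accession) != record.chrom:
        raise ValueError("accession does not match the record's chromosome")
    if len(record.ref) != 1 or len(record.alt) != 1:
        raise ValueError("only single-nucleotide substitutions are supported")
    return f"{accession}:g.{record.pos}{record.ref}>{record.alt}"
```

A real converter also has to validate the reference base against the sequence, normalize insertions and deletions, and map between transcript and genomic coordinates, which is where most of the difficulty described above lies.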
The Human Phenotype Ontology (HPO) is widely used in the rare disease community for differential diagnostics, phenotype-driven analysis of next-generation sequence-variation data, and translational research, but a comparable resource has not been available for common disease. Here, we have developed a concept-recognition procedure that analyzes the frequencies of HPO disease annotations as identified in over five million PubMed abstracts by employing an iterative procedure to optimize precision and recall of the identified terms. We derived disease models for 3,145 common human diseases comprising a total of 132,006 HPO annotations. The HPO now comprises over 250,000 phenotypic annotations for over 10,000 rare and common diseases and can be used for examining the phenotypic overlap among common diseases that share risk alleles, as well as between Mendelian diseases and common diseases linked by genomic location. The annotations, as well as the HPO itself, are freely available.
Provision of a molecularly confirmed diagnosis in a timely manner for children and adults with rare genetic diseases shortens their “diagnostic odyssey,” improves disease management, and fosters genetic counseling with respect to recurrence risks while assuring reproductive choices. In a general clinical genetics setting, the current diagnostic rate is approximately 50%, but for those who do not receive a molecular diagnosis after the initial genetics evaluation, that rate is much lower. Diagnostic success for these more challenging affected individuals depends to a large extent on progress in the discovery of genes associated with, and mechanisms underlying, rare diseases. Thus, continued research is required for moving toward a more complete catalog of disease-related genes and variants. The International Rare Diseases Research Consortium (IRDiRC) was established in 2011 to bring together researchers and organizations invested in rare disease research to develop a means of achieving molecular diagnosis for all rare diseases. Here, we review the current and future bottlenecks to gene discovery and suggest strategies for enabling progress in this regard. Each successful discovery will define potential diagnostic, preventive, and therapeutic opportunities for the corresponding rare disease, enabling precision medicine for this patient population.
ABSTRACT
There are few better examples of the need for data sharing than in the rare disease community, where patients, physicians, and researchers must search for “the needle in a haystack” to uncover rare, novel causes of disease within the genome. Impeding the pace of discovery has been the existence of many small siloed datasets within individual research or clinical laboratory databases and/or disease‐specific organizations, hoping for serendipitous occasions when two distant investigators happen to learn they have a rare phenotype in common and can “match” these cases to build evidence for causality. However, serendipity has never proven to be a reliable or scalable approach in science. As such, the Matchmaker Exchange (MME) was launched to provide a robust and systematic approach to rare disease gene discovery through the creation of a federated network connecting databases of genotypes and rare phenotypes using a common application programming interface (API). The core building blocks of the MME have been defined and assembled. Three MME services have now been connected through the API and are available for community use. Additional databases that support internal matching are anticipated to join the MME network as it continues to grow.
The Matchmaker Exchange (MME) includes representatives from the founding organizations and databases supporting or intending to support matchmaking services. Collaborative work has focused on both the technical aspects of data sharing and policy considerations. This work has resulted in version 1.0 of an MME API, a set of requirements for qualifying as an MME service, and a user agreement for querying the MME.
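The idea of matching cases across databases through a common API can be illustrated with a toy matcher. This is a sketch under stated assumptions, not the MME specification: the payload layout (`patient`, `features`, `genomicFeatures`, `results`, `score`) is only loosely modelled on a patient-matching request, the scoring rule (equal weight for phenotype overlap and a shared candidate gene) is invented for illustration, and all identifiers, gene names and contact addresses below are placeholders.

```python
from typing import Any, Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def _feature_ids(patient: Dict[str, Any]) -> Set[str]:
    # Phenotype terms are assumed to be given as {"id": "<ontology id>"}.
    return {feature["id"] for feature in patient.get("features", [])}


def _gene_ids(patient: Dict[str, Any]) -> Set[str]:
    # Candidate genes are assumed to be given as {"gene": {"id": "<gene id>"}}.
    return {g["gene"]["id"] for g in patient.get("genomicFeatures", [])}


def match_patients(
    request: Dict[str, Any], local_patients: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Rank local patients against the query patient in the request.

    Hypothetical scoring: half the score comes from phenotype overlap
    (Jaccard), half from sharing at least one candidate gene.
    Patients with a score of zero are not returned.
    """
    query = request["patient"]
    q_features, q_genes = _feature_ids(query), _gene_ids(query)

    results = []
    for patient in local_patients:
        phenotype_score = jaccard(q_features, _feature_ids(patient))
        gene_score = 1.0 if q_genes & _gene_ids(patient) else 0.0
        score = 0.5 * phenotype_score + 0.5 * gene_score
        if score > 0:
            results.append(
                {
                    "score": {"patient": score},
                    # Return only what is needed to contact the submitter.
                    "patient": {"id": patient["id"], "contact": patient["contact"]},
                }
            )

    results.sort(key=lambda r: r["score"]["patient"], reverse=True)
    return {"results": results}
```

In a federated setting each connected database would run its own matcher of this kind and return contact details rather than full records, so that the two investigators can follow up directly.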
The GWAS Central resource gathers and curates extensive summary-level genome-wide association study (GWAS) data and puts a range of user-friendly but powerful website tools for the comparison and visualisation of GWAS data at the fingertips of researchers. Through our continued efforts to harmonise and import data received from GWAS authors and consortia, and data sets actively collected from public sources, the database now contains over 72.5 million P-values for over 5000 studies testing over 7.4 million unique genetic markers investigating over 1700 unique phenotypes. Here, we describe an update to integrate this extensive data collection with mouse disease model data to support insights into the functional impact of human genetic variation. GWAS Central has expanded to include mouse gene-phenotype associations observed during mouse gene knockout screens. To allow similar cross-species phenotypes to be compared, terms from mammalian and human phenotype ontologies have been mapped. New interactive interfaces to find, correlate and view human and mouse genotype-phenotype associations are included in the website toolkit. Additionally, the integrated browser for interrogating multiple association data sets has been updated and a GA4GH Beacon API endpoint has been added for discovering variants tested in GWAS. The GWAS Central resource is accessible at https://www.gwascentral.org/.
Genotype-phenotype databases provide information about genetic variation, its consequences and its mechanisms of action for research and health care purposes. Existing databases vary greatly in type, areas of focus and modes of operation. Despite ever larger and more intricate datasets (made possible by advances in DNA sequencing, omics methods and phenotyping technologies), steady progress is being made towards integrating these databases rather than using them as separate entities. The consequential shift in focus from single-gene variants towards large gene panels, exomes, whole genomes and myriad observable characteristics creates new challenges and opportunities in database design, interpretation of variant pathogenicity and modes of data representation and use.
A systematic way of recording data use conditions that are based on consent permissions as found in the datasets of the main public genome archives (NCBI dbGaP and EMBL-EBI/CRG EGA).
Introduction
In the age of data-driven biomedical research and clinical practice, the sharing of genomic and clinical data for health research and personalized medicine has become an important contributor to improved diagnosis and treatment.
Repository for a Beacon network reference implementation: https://github.com/elixir-europe/beacon-network-backend
“Can you provide data about males, diagnosed with Type 2 diabetes, whose age of onset is below 30 years, and who carry mutations in the APOE gene?” Depending on the data controller’s preferences over response granularity, the response options range from “Yes, our data includes one or more” (boolean response) and “Yes, we have 125” (count response) to “Yes, and here are some details about the 125 individuals that match your request” (detailed “record level” response).
Start humble
Beacon is flexible (Tip #2), allowing the Beacon instance to incorporate and make discoverable all or only some of the Beacon entity Model types (i.e., individuals, biosamples, genomic variations, runs [wet lab], analyses [dry lab], datasets, and cohorts).
Handovers allow the Beacon server to attach relevant information (usually in the form of URLs) to a response.
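The three response granularities described above can be sketched with a small in-memory example. This is an illustration under stated assumptions rather than the Beacon reference implementation: the record fields, the `beacon_response` function and its filter parameters are hypothetical, the response keys (`exists`, `numTotalResults`, `results`) are only loosely modelled on a Beacon-style summary, and the records are made-up toy data.

```python
from typing import Any, Dict, List

# Made-up toy records; a real Beacon would query a governed data store.
TOY_INDIVIDUALS: List[Dict[str, Any]] = [
    {"id": "I1", "sex": "male", "disease": "Type 2 diabetes", "age_of_onset": 25, "genes": ["APOE"]},
    {"id": "I2", "sex": "female", "disease": "Type 2 diabetes", "age_of_onset": 28, "genes": ["APOE"]},
    {"id": "I3", "sex": "male", "disease": "Type 2 diabetes", "age_of_onset": 45, "genes": ["APOE"]},
    {"id": "I4", "sex": "male", "disease": "Type 2 diabetes", "age_of_onset": 29, "genes": ["APOE", "GENE_X"]},
]

GRANULARITIES = ("boolean", "count", "record")


def _matches(record: Dict[str, Any], sex: str, disease: str, onset_below: int, gene: str) -> bool:
    """True if the record satisfies every filter in the example query."""
    return (
        record["sex"] == sex
        and record["disease"] == disease
        and record["age_of_onset"] < onset_below
        and gene in record["genes"]
    )


def beacon_response(
    records: List[Dict[str, Any]],
    granularity: str,
    *,
    sex: str,
    disease: str,
    onset_below: int,
    gene: str,
) -> Dict[str, Any]:
    """Answer the query at the granularity the data controller allows.

    boolean -> only whether any match exists
    count   -> additionally the number of matches
    record  -> additionally the matching records themselves
    """
    if granularity not in GRANULARITIES:
        raise ValueError(f"unknown granularity: {granularity}")

    hits = [r for r in records if _matches(r, sex, disease, onset_below, gene)]
    response: Dict[str, Any] = {"exists": bool(hits)}
    if granularity in ("count", "record"):
        response["numTotalResults"] = len(hits)
    if granularity == "record":
        response["results"] = hits
    return response
```

The same query yields progressively more detail as the granularity is raised, which lets a data controller start with boolean answers only and open up counts or record-level responses later.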