The developmental and physiological complexity of the auditory system is likely reflected in the underlying set of genes involved in auditory function. In humans, over 150 non-syndromic loci have ...been identified, and there are more than 400 human genetic syndromes with a hearing loss component. Over 100 non-syndromic hearing loss genes have been identified in mouse and human, but we remain ignorant of the full extent of the genetic landscape involved in auditory dysfunction. As part of the International Mouse Phenotyping Consortium, we undertook a hearing loss screen in a cohort of 3006 mouse knockout strains. In total, we identify 67 candidate hearing loss genes. We detect known hearing loss genes, but the vast majority, 52, of the candidate genes were novel. Our analysis reveals a large and unexplored genetic landscape involved with auditory function.The full extent of the genetic basis for hearing impairment is unknown. Here, as part of the International Mouse Phenotyping Consortium, the authors perform a hearing loss screen in 3006 mouse knockout strains and identify 52 new candidate genes for genetic hearing loss.
Motivation: The generation of large amounts of microarray data and the need to share these data bring challenges for both data management and annotation and highlights the need for standards. MIAME ...specifies the minimum information needed to describe a microarray experiment and the Microarray Gene Expression Object Model (MAGE-OM) and resulting MAGE-ML provide a mechanism to standardize data representation for data exchange, however a common terminology for data annotation is needed to support these standards. Results: Here we describe the MGED Ontology (MO) developed by the Ontology Working Group of the Microarray Gene Expression Data (MGED) Society. The MO provides terms for annotating all aspects of a microarray experiment from the design of the experiment and array layout, through to the preparation of the biological sample and the protocols used to hybridize the RNA and analyze the data. The MO was developed to provide terms for annotating experiments in line with the MIAME guidelines, i.e. to provide the semantics to describe a microarray experiment according to the concepts specified in MIAME. The MO does not attempt to incorporate terms from existing ontologies, e.g. those that deal with anatomical parts or developmental stages terms, but provides a framework to reference terms in other ontologies and therefore facilitates the use of ontologies in microarray data annotation. Availability: The MGED Ontology version.1.2.0 is available as a file in both DAML and OWL formats at . Release notes and annotation examples are provided. The MO is also provided via the NCICB's Enterprise Vocabulary System (). Contact:Stoeckrt@pcbi.upenn.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Experimental descriptions are typically stored as free text without using standardized terminology, creating challenges in comparison, reproduction and analysis. These difficulties impose limitations ...on data exchange and information retrieval.
The Ontology for Biomedical Investigations (OBI), developed as a global, cross-community effort, provides a resource that represents biomedical investigations in an explicit and integrative framework. Here we detail three real-world applications of OBI, provide detailed modeling information and explain how to use OBI.
We demonstrate how OBI can be applied to different biomedical investigations to both facilitate interpretation of the experimental process and increase the computational processing and integration within the Semantic Web. The logical definitions of the entities involved allow computers to unambiguously understand and integrate different biological experimental processes and their relevant components.
OBI is available at http://purl.obolibrary.org/obo/obi/2009-11-02/obi.owl.
Genome-wide association studies (GWASs) have enabled robust mapping of complex traits in humans. The open sharing of GWAS summary statistics (SumStats) is essential in facilitating the larger ...meta-analyses needed for increased power in resolving the genetic basis of disease. However, most GWAS SumStats are not readily accessible because of limited sharing and a lack of defined standards. With the aim of increasing the availability, quality, and utility of GWAS SumStats, the National Human Genome Research Institute-European Bioinformatics Institute (NHGRI-EBI) GWAS Catalog organized a community workshop to address the standards, infrastructure, and incentives required to promote and enable sharing. We evaluated the barriers to SumStats sharing, both technological and sociological, and developed an action plan to address those challenges and ensure that SumStats and study metadata are findable, accessible, interoperable, and reusable (FAIR). We encourage early deposition of datasets in the GWAS Catalog as the recognized central repository. We recommend standard requirements for reporting elements and formats for SumStats and accompanying metadata as guidelines for community standards and a basis for submission to the GWAS Catalog. Finally, we provide recommendations to enable, promote, and incentivize broader data sharing, standards and FAIRness in order to advance genomic medicine.
MacArthur et al. report on a GWAS Catalog community workshop on the sharing of summary statistics to ensure that these are findable, accessible, interoperable, and reusable (FAIR). They provide recommendations and guidance to incentivize broader data sharing, standards, and FAIRness in order to advance genomic medicine.
There is a huge demand on bioinformaticians to provide their biologists with user friendly and scalable software infrastructures to capture, exchange, and exploit the unprecedented amounts of new ...*omics data. We here present MOLGENIS, a generic, open source, software toolkit to quickly produce the bespoke MOLecular GENetics Information Systems needed.
The MOLGENIS toolkit provides bioinformaticians with a simple language to model biological data structures and user interfaces. At the push of a button, MOLGENIS' generator suite automatically translates these models into a feature-rich, ready-to-use web application including database, user interfaces, exchange formats, and scriptable interfaces. Each generator is a template of SQL, JAVA, R, or HTML code that would require much effort to write by hand. This 'model-driven' method ensures reuse of best practices and improves quality because the modeling language and generators are shared between all MOLGENIS applications, so that errors are found quickly and improvements are shared easily by a re-generation. A plug-in mechanism ensures that both the generator suite and generated product can be customized just as much as hand-written software.
In recent years we have successfully evaluated the MOLGENIS toolkit for the rapid prototyping of many types of biomedical applications, including next-generation sequencing, GWAS, QTL, proteomics and biobanking. Writing 500 lines of model XML typically replaces 15,000 lines of hand-written programming code, which allows for quick adaptation if the information system is not yet to the biologist's satisfaction. Each application generated with MOLGENIS comes with an optimized database back-end, user interfaces for biologists to manage and exploit their data, programming interfaces for bioinformaticians to script analysis tools in R, Java, SOAP, REST/JSON and RDF, a tab-delimited file format to ease upload and exchange of data, and detailed technical documentation. Existing databases can be quickly enhanced with MOLGENIS generated interfaces using the 'ExtractModel' procedure.
The MOLGENIS toolkit provides bioinformaticians with a simple model to quickly generate flexible web platforms for all possible genomic, molecular and phenotypic experiments with a richness of interfaces not provided by other tools. All the software and manuals are available free as LGPLv3 open source at http://www.molgenis.org.
Metabolic diseases are a worldwide problem but the underlying genetic factors and their relevance to metabolic disease remain incompletely understood. Genome-wide research is needed to characterize ...so-far unannotated mammalian metabolic genes. Here, we generate and analyze metabolic phenotypic data of 2016 knockout mouse strains under the aegis of the International Mouse Phenotyping Consortium (IMPC) and find 974 gene knockouts with strong metabolic phenotypes. 429 of those had no previous link to metabolism and 51 genes remain functionally completely unannotated. We compared human orthologues of these uncharacterized genes in five GWAS consortia and indeed 23 candidate genes are associated with metabolic disease. We further identify common regulatory elements in promoters of candidate genes. As each regulatory element is composed of several transcription factor binding sites, our data reveal an extensive metabolic phenotype-associated network of co-regulated genes. Our systematic mouse phenotype analysis thus paves the way for full functional annotation of the genome.
Genome sequencing has recently become a viable genotyping technology for use in genome-wide association studies (GWASs), offering the potential to analyze a broader range of genome-wide variation, ...including rare variants. To survey current standards, we assessed the content and quality of reporting of statistical methods, analyses, results, and datasets in 167 exome- or genome-wide-sequencing-based GWAS publications published from 2014 to 2020; 81% of publications included tests of aggregate association across multiple variants, with multiple test models frequently used. We observed a lack of standardized terms and incomplete reporting of datasets, particularly for variants analyzed in aggregate tests. We also find a lower frequency of sharing of summary statistics compared with array-based GWASs. Reporting standards and increased data sharing are required to ensure sequencing-based association study data are findable, interoperable, accessible, and reusable (FAIR). To support that, we recommend adopting the standard terminology of sequencing-based GWAS (seqGWAS). Further, we recommend that single-variant analyses be reported following the same standards and conventions as standard array-based GWASs and be shared in the GWAS Catalog. We also provide initial recommended standards for aggregate analyses metadata and summary statistics.
Display omitted
Recommendations for increasing FAIRness of sequencing-based GWASsTo be findable, we recommend standard terminology of sequencing-based GWAS (seqGWAS)To improve access and standards, the GWAS Catalog will support deposition of seqGWASTo improve utility, we recommend reporting standards for single and aggregate analyses
McMahon et al. report an analysis of the sequencing-based GWAS literature, finding a lack of standardized language and incomplete reporting, along with less-frequent sharing of summary statistics compared with that of array-based GWASs. We provide recommendations for the reporting and sharing of sequencing-based GWASs to increase FAIRness of these valuable datasets.
To facilitate sharing of Omics data, many groups of scientists have been working to establish the relevant data standards. The main components of data sharing standards are experiment description ...standards, data exchange standards, terminology standards, and experiment execution standards. Here we provide a survey of existing and emerging standards that are intended to assist the free and open exchange of large-format data.
Unambiguous cell line authentication is essential to avoid loss of association between data and cells. The risk for loss of references increases with the rapidity that new human pluripotent stem cell ...(hPSC) lines are generated, exchanged, and implemented. Ideally, a single name should be used as a generally applied reference for each cell line to access and unify cell-related information across publications, cell banks, cell registries, and databases and to ensure scientific reproducibility. We discuss the needs and requirements for such a unique identifier and implement a standard nomenclature for hPSCs, which can be automatically generated and registered by the human pluripotent stem cell registry (hPSCreg). To avoid ambiguities in PSC-line referencing, we strongly urge publishers to demand registration and use of the standard name when publishing research based on hPSC lines.
In this article Kurtz, Seltmann and colleagues propose a standard nomenclature and registry for human pluripotent stem cells. The nomenclature is based on fixed name elements and provides a unique identifier for PSC lines suitable for referencing over diverse data sources, registries, publications, and cell banks. Rigorous application is needed to reduce risks from misidentification and false referencing of lines and data.