Creation and use of ontologies has become a mainstream activity in many disciplines, in particular, the biomedical domain. Ontology developers often disseminate information about these ontologies in ...peer-reviewed ontology description reports. There appears to be, however, a high degree of variability in the content of these reports. Often, important details are omitted such that it is difficult to gain a sufficient understanding of the ontology, its content and method of creation.
We propose the Minimum Information for Reporting an Ontology (MIRO) guidelines as a means to facilitate a higher degree of completeness and consistency between ontology documentation, including published papers, and ultimately a higher standard of report quality. A draft of the MIRO guidelines was circulated for public comment in the form of a questionnaire, and we subsequently collected 110 responses from ontology authors, developers, users and reviewers. We report on the feedback of this consultation, including comments on each guideline, and present our analysis on the relative importance of each MIRO information item. These results were used to update the MIRO guidelines, mainly by providing more detailed operational definitions of the individual items and assigning degrees of importance. Based on our revised version of MIRO, we conducted a review of 15 recently published ontology description reports from three important journals in the Semantic Web and Biomedical domain and analysed them for compliance with the MIRO guidelines. We found that only 41.38% of the information items were covered by the majority of the papers (and deemed important by the survey respondents) and a large number of important items are not covered at all, like those related to testing and versioning policies.
We believe that the community-reviewed MIRO guidelines can contribute to improving significantly the quality of ontology description reports and other documentation, in particular by increasing consistent reporting of important ontology features that are otherwise often neglected.
As a model organism,
is uniquely placed to contribute to our understanding of how brains control complex behavior. Not only does it have complex adaptive behaviors, but also a uniquely powerful ...genetic toolkit, increasingly complete dense connectomic maps of the central nervous system and a rapidly growing set of transcriptomic profiles of cell types. But this also poses a challenge: Given the massive amounts of available data, how are researchers to Find, Access, Integrate and Reuse (FAIR) relevant data in order to develop an integrated anatomical and molecular picture of circuits, inform hypothesis generation, and find reagents for experiments to test these hypotheses? The Virtual Fly Brain (virtualflybrain.org) web application & API provide a solution to this problem, using FAIR principles to integrate 3D images of neurons and brain regions, connectomics, transcriptomics and reagent expression data covering the whole CNS in both larva and adult. Users can search for neurons, neuroanatomy and reagents by name, location, or connectivity,
text search, clicking on 3D images, search-by-image, and queries by type (e.g., dopaminergic neuron) or properties (e.g., synaptic input in the antennal lobe). Returned results include cross-registered 3D images that can be explored in linked 2D and 3D browsers or downloaded under open licenses, and extensive descriptions of cell types and regions curated from the literature. These solutions are potentially extensible to cover similar atlasing and data integration challenges in vertebrates.
Integrated, up-to-date data about SARS-CoV-2 and COVID-19 is crucial for the ongoing response to the COVID-19 pandemic by the biomedical research community. While rich biological knowledge exists for ...SARS-CoV-2 and related viruses (SARS-CoV, MERS-CoV), integrating this knowledge is difficult and time-consuming, since much of it is in siloed databases or in textual format. Furthermore, the data required by the research community vary drastically for different tasks; the optimal data for a machine learning task, for example, is much different from the data used to populate a browsable user interface for clinicians. To address these challenges, we created KG-COVID-19, a flexible framework that ingests and integrates heterogeneous biomedical data to produce knowledge graphs (KGs), and applied it to create a KG for COVID-19 response. This KG framework also can be applied to other problems in which siloed biomedical data must be quickly integrated for different research applications, including future pandemics.
•KG-COVID-19 is a framework for producing customized COVID-19 knowledge graphs•Our knowledge graph and framework is free, open-source, and FAIR•KG-COVID-19 integrates a wide range of COVID-19-related data in an ontology-aware way•Our KG has been applied to use cases including ML tasks, hypothesis-based querying
An effective response to the COVID-19 pandemic relies on integration of many different types of data available about SARS-CoV-2 and related viruses. KG-COVID-19 is a framework for producing knowledge graphs that can be customized for downstream applications including machine learning tasks, hypothesis-based querying, and browsable user interface to enable researchers to explore COVID-19 data and discover relationships.
An effective response to the COVID-19 pandemic relies on integration of many different types of data available about SARS-CoV-2 and related viruses. KG-COVID-19 is a framework for producing knowledge graphs that can be customized for downstream applications including machine learning tasks, hypothesis-based querying, and browsable user interface to enable researchers to explore COVID-19 data and discover relationships.
Ontologies of precisely defined, controlled vocabularies are essential to curate the results of biological experiments such that the data are machine searchable, can be computationally analyzed, and ...are interoperable across the biomedical research continuum. There is also an increasing need for methods to interrelate phenotypic data easily and accurately from experiments in animal models with human development and disease.
Here we present the Xenopus phenotype ontology (XPO) to annotate phenotypic data from experiments in Xenopus, one of the major vertebrate model organisms used to study gene function in development and disease. The XPO implements design patterns from the Unified Phenotype Ontology (uPheno), and the principles outlined by the Open Biological and Biomedical Ontologies (OBO Foundry) to maximize interoperability with other species and facilitate ongoing ontology management. Constructed in Web Ontology Language (OWL) the XPO combines the existing uPheno library of ontology design patterns with additional terms from the Xenopus Anatomy Ontology (XAO), the Phenotype and Trait Ontology (PATO) and the Gene Ontology (GO). The integration of these different ontologies into the XPO enables rich phenotypic curation, whilst the uPheno bridging axioms allows phenotypic data from Xenopus experiments to be related to phenotype data from other model organisms and human disease. Moreover, the simple post-composed uPheno design patterns facilitate ongoing XPO development as the generation of new terms and classes of terms can be substantially automated.
The XPO serves as an example of current best practices to help overcome many of the inherent challenges in harmonizing phenotype data between different species. The XPO currently consists of approximately 22,000 terms and is being used to curate phenotypes by Xenbase, the Xenopus Model Organism Knowledgebase, forming a standardized corpus of genotype-phenotype data that can be directly related to other uPheno compliant resources.
The standardized identification of biomedical entities is a cornerstone of interoperability, reuse, and data integration in the life sciences. Several registries have been developed to catalog ...resources maintaining identifiers for biomedical entities such as small molecules, proteins, cell lines, and clinical trials. However, existing registries have struggled to provide sufficient coverage and metadata standards that meet the evolving needs of modern life sciences researchers. Here, we introduce the Bioregistry, an integrative, open, community-driven metaregistry that synthesizes and substantially expands upon 23 existing registries. The Bioregistry addresses the need for a sustainable registry by leveraging public infrastructure and automation, and employing a progressive governance model centered around open code and open data to foster community contribution. The Bioregistry can be used to support the standardized annotation of data, models, ontologies, and scientific literature, thereby promoting their interoperability and reuse. The Bioregistry can be accessed through https://bioregistry.io and its source code and data are available under the MIT and CC0 Licenses at https://github.com/biopragmatics/bioregistry .
Large-scale single-cell 'omics profiling is being used to define a complete catalogue of brain cell types, something that traditional methods struggle with due to the diversity and complexity of the ...brain. But this poses a problem: How do we organise such a catalogue - providing a standard way to refer to the cell types discovered, linking their classification and properties to supporting data? Cell ontologies provide a partial solution to these problems, but no existing ontology schemas support the definition of cell types by direct reference to supporting data, classification of cell types using classifications derived directly from data, or links from cell types to marker sets along with confidence scores. Here we describe a generally applicable schema that solves these problems and its application in a semi-automated pipeline to build a data-linked extension to the Cell Ontology representing cell types in the Primary Motor Cortex of humans, mice and marmosets. The methods and resulting ontology are designed to be scalable and applicable to similar whole-brain atlases currently in preparation.
An ontology is a computable representation of the different parts of an organism and its different developmental stages as well as the relationships between them. The ontology of model organisms is ...therefore a fundamental tool for a multitude of bioinformatics and comparative analyses. The cephalochordate amphioxus is a marine animal representing the earliest diverging evolutionary lineage of chordates. Furthermore, its morphology, its anatomy and its genome can be considered as prototypes of the chordate phylum. For these reasons, amphioxus is a very important animal model for evolutionary developmental biology studies aimed at understanding the origin and diversification of vertebrates. Here, we have constructed an amphioxus ontology (AMPHX) which combines anatomical and developmental terms and includes the relationships between these terms. AMPHX will be used to annotate amphioxus gene expression patterns as well as phenotypes. We encourage the scientific community to adopt this amphioxus ontology and send recommendations for future updates and improvements.
Navigating the clinical literature to determine the optimal clinical management for rare diseases presents significant challenges. We introduce the Medical Action Ontology (MAxO), an ontology ...specifically designed to organize medical procedures, therapies, and interventions.
MAxO incorporates logical structures that link MAxO terms to numerous other ontologies within the OBO Foundry. Term development involves a blend of manual and semi-automated processes. Additionally, we have generated annotations detailing diagnostic modalities for specific phenotypic abnormalities defined by the Human Phenotype Ontology (HPO). We introduce a web application, POET, that facilitates MAxO annotations for specific medical actions for diseases using the Mondo Disease Ontology.
MAxO encompasses 1,757 terms spanning a wide range of biomedical domains, from human anatomy and investigations to the chemical and protein entities involved in biological processes. These terms annotate phenotypic features associated with specific disease (using HPO and Mondo). Presently, there are over 16,000 MAxO diagnostic annotations that target HPO terms. Through POET, we have created 413 MAxO annotations specifying treatments for 189 rare diseases.
MAxO offers a computational representation of treatments and other actions taken for the clinical management of patients. Its development is closely coupled to Mondo and HPO, broadening the scope of our computational modeling of diseases and phenotypic features. We invite the community to contribute disease annotations using POET (https://poet.jax.org/). MAxO is available under the open-source CC-BY 4.0 license (https://github.com/monarch-initiative/MAxO).
NHGRI 1U24HG011449-01A1 and NHGRI 5RM1HG010860-04.
While abnormalities related to carbohydrates (glycans) are frequent for patients with rare and undiagnosed diseases as well as in many common diseases, these glycan-related phenotypes ...(glycophenotypes) are not well represented in knowledge bases (KBs). If glycan-related diseases were more robustly represented and curated with glycophenotypes, these could be used for molecular phenotyping to help to realize the goals of precision medicine. Diagnosis of rare diseases by computational cross-species comparison of genotype-phenotype data has been facilitated by leveraging ontological representations of clinical phenotypes, using Human Phenotype Ontology (HPO), and model organism ontologies such as Mammalian Phenotype Ontology (MP) in the context of the Monarch Initiative. In this article, we discuss the importance and complexity of glycobiology and review the structure of glycan-related content from existing KBs and biological ontologies. We show how semantically structuring knowledge about the annotation of glycophenotypes could enhance disease diagnosis, and propose a solution to integrate glycophenotypes and related diseases into the Unified Phenotype Ontology (uPheno), HPO, Monarch and other KBs. We encourage the community to practice good identifier hygiene for glycans in support of semantic analysis, and clinicians to add glycomics to their diagnostic analyses of rare diseases.