Link prediction in complex networks has recently attracted a great deal of attention in diverse scientific domains, including the social and biological sciences. Given a snapshot of a network, the goal is to predict links that are missing in the network or that are likely to occur in the near future. This problem has both theoretical and practical significance; it not only helps us to identify missing links in a network more efficiently by avoiding expensive and time-consuming experimental processes, but also allows us to study the evolution of a network over time. To address the problem of link prediction, numerous attempts have been made in recent years that exploit the local and global topological properties of the network to predict missing links. In this paper, we use the parametrised matrix forest index (PMFI) to predict missing links in a network. We show that, for small parameter values, this index is linked to a heat diffusion process on a graph and therefore encodes geometric properties of the network. We then develop a framework that combines the PMFI with a local similarity index to predict missing links in the network. The framework is applied to numerous networks obtained from diverse domains, such as social, biological, and transport networks. The results show that the proposed method can predict missing links with higher accuracy than other state-of-the-art link prediction methods.
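A minimal sketch of how a parametrised matrix forest index can score candidate links. It assumes the commonly used definition S = (I + αL)^-1, where L = D - A is the graph Laplacian; the abstract does not state the formula, so this definition, the function names `pmfi` and `rank_missing_links`, and the default α are assumptions. The combination with a local similarity index is not shown.

```python
import numpy as np


def pmfi(adjacency, alpha):
    """Assumed PMFI definition: S = (I + alpha * L)^-1 with L = D - A."""
    A = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(A.sum(axis=1)) - A
    return np.linalg.inv(np.eye(A.shape[0]) + alpha * laplacian)


def rank_missing_links(adjacency, alpha=0.1):
    """Rank all non-adjacent node pairs by their PMFI score, highest first."""
    A = np.asarray(adjacency)
    S = pmfi(A, alpha)
    n = A.shape[0]
    candidates = [
        (i, j, S[i, j])
        for i in range(n)
        for j in range(i + 1, n)
        if A[i, j] == 0
    ]
    return sorted(candidates, key=lambda item: item[2], reverse=True)


# Toy example: a path graph 0 - 1 - 2 - 3 (illustrative data only).
path = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
ranking = rank_missing_links(path, alpha=0.1)
```

Under this definition, for small α the matrices (I + αL)^-1 and the heat kernel exp(-αL) agree to first order (both equal I - αL plus higher-order terms), which is one way to read the diffusion link mentioned in the abstract.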
Ontologies are widely used throughout the biomedical domain. These ontologies formally represent the classes and relations assumed to exist within a domain. As scientific domains are deeply interlinked, so too are their representations. While individual ontologies can be tested for consistency and coherency using automated reasoning methods, systematically combining ontologies from multiple domains may reveal previously hidden contradictions.
We developed a method that tests for hidden unsatisfiabilities in an ontology that arise when it is combined with other ontologies. For this purpose, we combined sets of ontologies and used automated reasoning to determine whether unsatisfiable classes are present. In addition, we designed and implemented a novel algorithm that can determine justifications for contradictions across extremely large and complicated ontologies, and used these justifications to semi-automatically repair ontologies by identifying a small set of axioms that, when removed, result in a consistent and coherent set of ontologies.
We tested the mutual consistency of the OBO Foundry and the OBO ontologies and found that the combined OBO Foundry gives rise to at least 636 unsatisfiable classes, while the OBO ontologies give rise to more than 300,000 unsatisfiable classes. We also applied our semi-automatic repair algorithm to each combination of OBO ontologies that resulted in unsatisfiable classes, finding that removing only 117 axioms accounts for all cases of unsatisfiability across all OBO ontologies.
We identified a large set of hidden unsatisfiabilities across a broad range of biomedical ontologies, and we found that this large set of unsatisfiable classes is the result of a relatively small number of axiomatic disagreements. Our results show that hidden unsatisfiability is a serious problem in ontology interoperability; however, our results also provide a way towards more consistent ontologies by addressing the issues we identified.
Identification of ontology concepts in clinical narrative text enables the creation of phenotype profiles that can be associated with clinical entities, such as patients or drugs. Constructing patient phenotype profiles using formal ontologies enables their analysis via semantic similarity, in turn enabling the use of background knowledge in clustering or classification analyses. However, traditional semantic similarity approaches collapse complex relationships between patient phenotypes into a unitary similarity score for each pair of patients. Moreover, single scores may be based only on matching terms with the greatest information content (IC), ignoring other dimensions of patient similarity. This process necessarily leads to a loss of information in the resulting representation of patient similarity, and is especially apparent when using very large text-derived and highly multi-morbid phenotype profiles. Moreover, it renders finding a biological explanation for similarity very difficult: the black box problem. In this article, we explore the generation of multiple semantic similarity scores for patients based on different facets of their phenotypic manifestation, which we define through different sub-graphs in the Human Phenotype Ontology. We further present a new methodology for deriving sets of qualitative class descriptions for groups of entities described by ontology terms. Leveraging this strategy to obtain meaningful explanations for our semantic clusters alongside other evaluation techniques, we show that semantic clustering with ontology-derived facets enables the representation, and thus identification, of clinically relevant phenotype relationships not easily recoverable using overall clustering alone. In this way, we demonstrate the potential of faceted semantic clustering for gaining a deeper and more nuanced understanding of text-derived patient phenotypes.
•Semantic similarity is a powerful tool for gaining insight into biomedical data, but generally collapses relationships between complex entity descriptions into a single score, necessarily losing information.
•To solve this problem, we develop a method for splitting phenotype profiles into semantic categories, to facilitate the availability of different features by which profiles can be compared with semantic similarity.
•We evaluate this approach by performing semantic clustering on a sample of patients from MIMIC-III, comparing overall and faceted partitions.
•We also develop and present a novel method for identifying explanatory variables for semantic clusters.
•Using this method, we show that faceted semantic clustering facilitates recovery of clinically meaningful relationships between entities from text-derived phenotypes.
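The contrast between a single overall score and per-facet scores can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses Resnik similarity (the IC of the most informative common ancestor) with a best-match average, a toy hierarchy with hypothetical term names instead of the Human Phenotype Ontology, and facets defined as the sub-graph below a chosen root term.

```python
import math

# Toy is-a hierarchy (child -> parents); all term names are hypothetical.
PARENTS = {
    "root": [],
    "cardio": ["root"],
    "neuro": ["root"],
    "arrhythmia": ["cardio"],
    "hypertension": ["cardio"],
    "seizure": ["neuro"],
    "tremor": ["neuro"],
}


def ancestors(term):
    """Return the term together with all of its superclasses."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(profiles):
    """IC(t) = -log(fraction of profiles annotated with t or a subclass of t)."""
    counts = {}
    for profile in profiles:
        closure = set()
        for term in profile:
            closure |= ancestors(term)
        for term in closure:
            counts[term] = counts.get(term, 0) + 1
    return {term: -math.log(c / len(profiles)) for term, c in counts.items()}


def resnik(a, b, ic):
    """IC of the most informative common ancestor of a and b."""
    return max(ic[c] for c in ancestors(a) & ancestors(b))


def best_match_average(p, q, ic):
    """Symmetric best-match average of pairwise Resnik scores."""
    if not p or not q:
        return 0.0
    p_to_q = sum(max(resnik(a, b, ic) for b in q) for a in p) / len(p)
    q_to_p = sum(max(resnik(a, b, ic) for a in p) for b in q) / len(q)
    return (p_to_q + q_to_p) / 2


def faceted_similarity(p, q, ic, facet_roots):
    """One score per facet, using only the terms under each facet root."""
    def restrict(profile, root):
        return {t for t in profile if root in ancestors(t)}
    return {
        root: best_match_average(restrict(p, root), restrict(q, root), ic)
        for root in facet_roots
    }


profiles = [
    {"arrhythmia", "seizure"},
    {"arrhythmia", "tremor"},
    {"hypertension"},
    {"seizure"},
]
ic = information_content(profiles)
overall = best_match_average(profiles[0], profiles[1], ic)
by_facet = faceted_similarity(profiles[0], profiles[1], ic, ["cardio", "neuro"])
```

In this toy data the first two patients share an exact cardiac term but only a broad neurological category, so the cardiac facet score is higher than the neurological one, while the overall score averages the two and hides that difference.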
Semantic similarity is a valuable tool for analysis in biomedicine. When applied to phenotype profiles derived from clinical text, it has the capacity to enable and enhance 'patient-like me' analyses, automated coding, differential diagnosis, and outcome prediction. While a large body of work exists exploring the use of semantic similarity for multiple tasks, including protein interaction prediction and rare disease differential diagnosis, there is less work exploring comparison of patient phenotype profiles for clinical tasks. Moreover, there are no experimental explorations of optimal parameters or better methods in the area.
We develop a platform for reproducible benchmarking and comparison of experimental conditions for patient phenotype similarity. Using the platform, we evaluate the task of ranking shared primary diagnosis from uncurated phenotype profiles derived from all text narrative associated with admissions in the Medical Information Mart for Intensive Care (MIMIC-III).
300 semantic similarity configurations were evaluated, as well as one embedding-based approach. On average, measures that did not make use of an external information content measure performed slightly better; however, the best-performing configurations, when measured by area under the receiver operating characteristic curve and Top Ten Accuracy, used term-specificity and annotation-frequency measures.
We identified and interpreted the performance of a large number of semantic similarity configurations for the task of classifying diagnosis from text-derived phenotype profiles in one setting. We also provided a basis for further research on other settings and related tasks in the area.
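A minimal sketch of a top-k accuracy measure for this kind of ranking task. The abstract does not define "Top Ten Accuracy", so the reading used here is an assumption: a query patient counts as a hit if at least one of the k most similar other patients shares its primary diagnosis. The function name and toy data are hypothetical.

```python
def top_k_accuracy(similarity, diagnoses, k=10):
    """Fraction of patients with a same-diagnosis patient among their top k."""
    n = len(diagnoses)
    hits = 0
    for i in range(n):
        # Rank every other patient by similarity to patient i, highest first.
        others = [j for j in range(n) if j != i]
        ranked = sorted(others, key=lambda j: similarity[i][j], reverse=True)
        if any(diagnoses[j] == diagnoses[i] for j in ranked[:k]):
            hits += 1
    return hits / n


# Toy pairwise similarity matrix and primary diagnoses (illustrative only).
toy_similarity = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.25],
    [0.2, 0.1, 0.25, 1.0],
]
toy_diagnoses = ["a", "a", "b", "b"]
accuracy_at_1 = top_k_accuracy(toy_similarity, toy_diagnoses, k=1)
accuracy_at_3 = top_k_accuracy(toy_similarity, toy_diagnoses, k=3)
```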
The era of high-throughput techniques created big data in the medical field and research disciplines. Machine intelligence (MI) approaches can overcome critical limitations on how those large-scale data sets are processed, analyzed, and interpreted. The 67th Annual Meeting of the Radiation Research Society featured a symposium on MI approaches to highlight recent advancements in the radiation sciences and their clinical applications. This article summarizes three of those presentations regarding recent developments for metadata processing and ontological formalization, data mining for radiation outcomes in pediatric oncology, and imaging in lung cancer.
Much of the knowledge and information needed for enabling high-quality clinical research is stored in free-text format. Natural language processing (NLP) has been used to extract information from these sources at scale for several decades. This paper aims to present a comprehensive review of clinical NLP for the past 15 years in the UK to identify the community, depict its evolution, analyse methodologies and applications, and identify the main barriers. We collect a dataset of clinical NLP projects (n = 94; £ = 41.97 m) funded by UK funders or the European Union's funding programmes. Additionally, we extract details on 9 funders, 137 organisations, 139 persons and 431 research papers. Networks are created from timestamped data interlinking all entities, and network analysis is subsequently applied to generate insights. 431 publications are identified as part of a literature review, of which 107 are eligible for final analysis. Results show, not surprisingly, that clinical NLP in the UK has increased substantially in the last 15 years: the total budget in the period of 2019-2022 was 80 times that of 2007-2010. However, effort is required to deepen areas such as disease (sub-)phenotyping and to broaden application domains. There is also a need to improve links between academia and industry and to enable deployments in real-world settings for the realisation of clinical NLP's great potential in care delivery. The major barriers include research and development access to hospital data, lack of capable computational resources in the right places, the scarcity of labelled data and barriers to sharing of pretrained models.
Biomedical ontologies contain a wealth of metadata that constitutes a fundamental infrastructural resource for text mining. For several reasons, redundancies exist in the ontology ecosystem, which lead to the same entities being described by several concepts in the same or similar contexts across several ontologies. While these concepts describe the same entities, they contain different sets of complementary metadata. Linking these definitions to make use of their combined metadata could lead to improved performance in ontology-based information retrieval, extraction, and analysis tasks.
We develop and present an algorithm that expands the set of labels associated with an ontology class using a combination of strict lexical matching and cross-ontology reasoner-enabled equivalency queries. Across all disease terms in the Disease Ontology, the approach found 51,362 additional labels, more than tripling the number defined by the ontology itself. Manual validation by a clinical expert on a random sample of expanded synonyms over the Human Phenotype Ontology yielded a precision of 0.912. Furthermore, we found that annotating patient visits in MIMIC-III with an extended set of Disease Ontology labels led to the semantic similarity score derived from those labels being a significantly better predictor of matching first diagnosis, with a mean average precision of 0.88 for the unexpanded set of annotations, and 0.913 for the expanded set.
Inter-ontology synonym expansion can lead to a vast increase in the scale of vocabulary available for text mining applications. While the accuracy of the extended vocabulary is not perfect, it nevertheless led to a significantly improved ontology-based characterisation of patients from text in one setting. Furthermore, where run-on error is not acceptable, the technique can be used to provide candidate synonyms which can be checked by a domain expert.
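The lexical half of such a label expansion can be sketched as below. This is a simplified illustration, not the published algorithm: it assumes ontologies are given as plain dictionaries from class identifier to label set, treats "strict lexical matching" as equality after case and whitespace normalisation, and omits the reasoner-enabled equivalency queries entirely. All identifiers and labels are hypothetical.

```python
def normalise(label):
    """Lower-case a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def expand_labels(target, others):
    """Collect new labels for each target class from lexically matching classes.

    target: {class_id: set of labels}
    others: list of ontologies in the same format
    Returns {class_id: set of normalised labels not already on the class}.
    """
    # Index every normalised label to the full label set of its class.
    index = {}
    for ontology in others:
        for labels in ontology.values():
            normalised = {normalise(label) for label in labels}
            for label in normalised:
                index.setdefault(label, set()).update(normalised)

    expanded = {}
    for class_id, labels in target.items():
        own = {normalise(label) for label in labels}
        found = set()
        for label in own:
            found |= index.get(label, set())
        expanded[class_id] = found - own
    return expanded


# Hypothetical identifiers and labels, for illustration only.
target_ontology = {"T:1": {"Myocardial infarction"}}
other_ontologies = [
    {"O:9": {"myocardial  infarction", "Heart attack"}},
    {"X:3": {"Stroke"}},
]
new_labels = expand_labels(target_ontology, other_ontologies)
```

Candidate labels produced this way can also be passed to a domain expert for checking, in line with the use suggested in the abstract.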
Semantic similarity is a useful approach for comparing patient phenotypes, and has the potential to be an effective method for exploiting text-derived phenotypes for differential diagnosis, text and document classification, and outcome prediction. While approaches for context disambiguation are commonly used in text mining applications, forming a standard component of information extraction pipelines, their effects on semantic similarity calculations have not been widely explored. In this work, we evaluate how inclusion and exclusion of negated and uncertain mentions of concepts from text-derived phenotypes affects similarity of patients, and the use of those profiles to predict diagnosis. We report on the effectiveness of these approaches and find a very small, yet significant, improvement in performance when classifying primary diagnosis over MIMIC-III patient visits.
Unstructured text created by patients represents a rich, but relatively inaccessible, resource for advancing patient-centred care. This study aimed to develop an ontology for ocular immune-mediated inflammatory diseases (OcIMIDo), as a tool to facilitate data extraction and analysis, illustrating its application to online patient support forum data.
We developed OcIMIDo using clinical guidelines, domain expertise, and cross-references to classes from other biomedical ontologies. We developed an approach to add patient-preferred synonyms text-mined from the oliviasvision.org online forum, using statistical ranking. We validated the approach with split-sampling and comparison to manual extraction. Using OcIMIDo, we then explored the frequency of OcIMIDo classes and synonyms, and their potential association with natural language sentiment expressed in each online forum post.
OcIMIDo (version 1.2) includes 661 classes, describing anatomy, clinical phenotype, disease activity status, complications, investigations, interventions and functional impacts. It contains 1661 relationships and axioms, 2851 annotations, including 1131 database cross-references, and 187 patient-preferred synonyms. To illustrate OcIMIDo's potential applications, we explored 9031 forum posts, revealing frequent mention of different clinical phenotypes, treatments, and complications. Language sentiment analysis of each post was generally positive (median 0.12, IQR 0.01–0.24). In multivariable logistic regression, the odds of a post expressing negative sentiment were significantly associated with first posts as compared to replies (OR 3.3, 95% CI 2.8 to 3.9, p < 0.001).
We report the development and validation of a new ontology for inflammatory eye diseases, which includes patient-preferred synonyms, and can be used to explore unstructured patient or physician-reported text data, with many potential applications.
•Here we present OcIMIDo, the first ontology of its kind in ophthalmology.
•We developed the ontology using domain expertise and clinical guidelines.
•Novel synonym extraction method with tf-idf to capture patient voice.
•Facilitates analysis of unstructured text relating to ocular inflammatory diseases.
In mediaeval Europe, the term “commons” described the way that communities managed land that was held “in common” and provided a clear set of rules for how this “common land” was used and developed by, and for, the community. Similarly, as we move towards an increasingly knowledge-based society where data is the new oil, new approaches to sharing and jointly owning publicly funded research data are needed to maximise its added value. Such common management approaches will extend the data’s useful life and facilitate its reuse for a range of additional purposes, from modelling to meta-analysis to regulatory risk assessment, as examples relevant to nanosafety data. This “commons” approach to nanosafety data and nanoinformatics infrastructure provision, co-development, and maintenance is at the heart of the “NanoCommons” project and underpins its post-funding transition to providing a basis on which other initiatives and projects can build. The present paper summarises part of the NanoCommons infrastructure called the NanoCommons Knowledge Base. It provides interoperability for nanosafety data sources and tools, on both semantic and technical levels. The NanoCommons Knowledge Base connects knowledge and provides both a programmatic interface (via an Application Programming Interface) and a user-friendly graphical interface to enable (and democratise) access to state-of-the-art tools for nanomaterials (NMs) safety prediction, NMs design for safety and sustainability, and NMs risk assessment. In addition, the standards and interfaces for interoperability, e.g., file templates to contribute data to the NanoCommons, are described, and a snapshot of the range and breadth of nanoinformatics tools and models that have already been integrated is presented. Finally, we demonstrate how the NanoCommons Knowledge Base can support users in the FAIRification of their experimental workflows and how the NanoCommons Knowledge Base itself has progressed towards richer compliance with the FAIR principles.