Classification, Ontology, and Precision Medicine
Haendel, Melissa A; Chute, Christopher G; Robinson, Peter N
The New England Journal of Medicine, 10/2018, Volume 379, Issue 15
Journal Article
Peer reviewed
Open access
Data-organizing methods have been in place for centuries, but very large data sets have come into being relatively recently. The authors describe terminologies, ontologies, and the changes needed to permit analyses of “big data” that might better serve medical decision making.
The International Classification of Diseases (ICD) has long been the main basis for comparability of statistics on causes of mortality and morbidity between places and over time. This paper provides an overview of the recently completed 11th revision of the ICD, focusing on the main innovations and their implications.
Changes in content reflect knowledge and perspectives on diseases and their causes that have emerged since ICD-10 was developed about 30 years ago. Changes in design and structure reflect the arrival of the networked digital era, for which ICD-11 has been prepared. ICD-11's information framework comprises a semantic knowledge base (the Foundation), a biomedical ontology linked to the Foundation and classifications derived from the Foundation. ICD-11 for Mortality and Morbidity Statistics (ICD-11-MMS) is the primary derived classification and the main successor to ICD-10. Innovations enabled by the new architecture include an online coding tool (replacing the index and providing additional functions), an application program interface to enable remote access to ICD-11 content and services, enhanced capability to capture and combine clinically relevant characteristics of cases and integrated support for multiple languages.
ICD-11 was adopted by the World Health Assembly in May 2019. Transition to implementation is in progress. ICD-11 can be accessed at icd.who.int.
The eMERGE (electronic MEdical Records and GEnomics) Network is an NHGRI-supported consortium of five institutions to explore the utility of DNA repositories coupled to Electronic Medical Record (EMR) systems for advancing discovery in genome science. eMERGE also includes a special emphasis on the ethical, legal and social issues related to these endeavors.
The five sites are supported by an Administrative Coordinating Center. Network goals are first set by working groups: (1) Genomics; (2) Informatics; (3) Consent & Community Consultation, which also includes active participation by investigators outside the eMERGE-funded sites; and (4) the Return of Results Oversight Committee. The Steering Committee, composed of site PIs and representatives and NHGRI staff, meets three times per year, once per year with the External Scientific Panel.
The primary site-specific phenotypes for which samples have undergone genome-wide association study (GWAS) genotyping are cataract and HDL, dementia, electrocardiographic QRS duration, peripheral arterial disease, and type 2 diabetes. A GWAS is also being undertaken for resistant hypertension in ≈ 2,000 additional samples identified across the network sites, to be added to data available for samples already genotyped. Funded by ARRA supplements, secondary phenotypes have been added at all sites to leverage the genotyping data, and hypothyroidism is being analyzed as a cross-network phenotype. Results are being posted in dbGaP. Other key eMERGE activities include evaluation of the issues associated with cross-site deployment of common algorithms to identify cases and controls in EMRs, data privacy of genomic and clinically-derived data, developing approaches for large-scale meta-analysis of GWAS data across five sites, and a community consultation and consent initiative at each site.
Plans are underway to expand the network in diversity of populations and incorporation of GWAS findings into clinical care.
By combining advanced clinical informatics, genome science, and community consultation, eMERGE represents a first step in the development of data-driven approaches to incorporate genomic information into routine healthcare delivery.
We aim to build and evaluate an open-source natural language processing system for information extraction from electronic medical record clinical free-text. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released open-source at http://www.ohnlp.org. The cTAKES builds on existing open-source technologies: the Unstructured Information Management Architecture framework and OpenNLP natural language processing toolkit. Its components, specifically trained for the clinical domain, create rich linguistic and semantic annotations. Performance of individual components: sentence boundary detector accuracy=0.949; tokenizer accuracy=0.949; part-of-speech tagger accuracy=0.936; shallow parser F-score=0.924; named entity recognizer and system-level evaluation F-score=0.715 for exact and 0.824 for overlapping spans, and accuracy for concept mapping, negation, and status attributes for exact and overlapping spans of 0.957, 0.943, 0.859, and 0.580, 0.939, and 0.839, respectively. Overall performance is discussed against five applications. The cTAKES annotations are the foundation for methods and modules for higher-level semantic processing of clinical free-text.
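The abstract above reports named entity F-scores separately for exact and overlapping spans. The following is a minimal sketch of how those two matching criteria differ, not the cTAKES evaluation code. The function names, the half-open character-offset span convention, and the toy spans are all assumptions made for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # hypothetical convention: (start, end) offsets, end exclusive


def overlaps(a: Span, b: Span) -> bool:
    """True when two half-open spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def f_score(gold: List[Span], predicted: List[Span], exact: bool) -> float:
    """F1 over spans, matching either exactly or by any overlap."""
    def match(p: Span, g: Span) -> bool:
        return p == g if exact else overlaps(p, g)

    # Precision: fraction of predicted spans that match some gold span.
    matched_pred = sum(any(match(p, g) for g in gold) for p in predicted)
    # Recall: fraction of gold spans matched by some predicted span.
    matched_gold = sum(any(match(p, g) for p in predicted) for g in gold)
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (made up): the second prediction has a wrong left boundary,
# the third prediction is spurious, and the third gold span is missed.
gold = [(0, 5), (10, 18), (25, 30)]
predicted = [(0, 5), (12, 18), (40, 45)]

exact_f = f_score(gold, predicted, exact=True)
overlap_f = f_score(gold, predicted, exact=False)
```

A prediction with a boundary error counts as a miss under exact matching but as a hit under overlap matching, which is why the overlapping-span score is never lower than the exact-span score.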
Long COVID, or complications arising from COVID-19 weeks after infection, has become a central concern for public health experts. The United States National Institutes of Health founded the RECOVER initiative to better understand long COVID. We used electronic health records available through the National COVID Cohort Collaborative to characterize the association between SARS-CoV-2 vaccination and long COVID diagnosis. Among patients with a COVID-19 infection between August 1, 2021 and January 31, 2022, we defined two cohorts using distinct definitions of long COVID, a clinical diagnosis (n = 47,404) or a previously described computational phenotype (n = 198,514), to compare unvaccinated individuals to those with a complete vaccine series prior to infection. Evidence of long COVID was monitored through June or July of 2022, depending on patients' data availability. We found that vaccination was consistently associated with lower odds and rates of long COVID clinical diagnosis and high-confidence computationally derived diagnosis after adjusting for sex, demographics, and medical history.
To determine the respective associations of premorbid glucagon-like peptide-1 receptor agonist (GLP1-RA) and sodium-glucose cotransporter 2 inhibitor (SGLT2i) use, compared with premorbid dipeptidyl peptidase 4 inhibitor (DPP4i) use, with severity of outcomes in the setting of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection.
We analyzed observational data from SARS-CoV-2-positive adults in the National COVID Cohort Collaborative (N3C), a multicenter, longitudinal U.S. cohort (January 2018-February 2021), with a prescription for GLP1-RA, SGLT2i, or DPP4i within 24 months of positive SARS-CoV-2 PCR test. The primary outcome was 60-day mortality, measured from positive SARS-CoV-2 test date. Secondary outcomes were total mortality during the observation period and emergency room visits, hospitalization, and mechanical ventilation within 14 days. Associations were quantified with odds ratios (ORs) estimated with targeted maximum likelihood estimation using a super learner approach, accounting for baseline characteristics.
The study included 12,446 individuals (53.4% female, 62.5% White, mean ± SD age 58.6 ± 13.1 years). The 60-day mortality was 3.11% (387 of 12,446), with 2.06% (138 of 6,692) for GLP1-RA use, 2.32% (85 of 3,665) for SGLT2i use, and 5.67% (199 of 3,511) for DPP4i use. Both GLP1-RA and SGLT2i use were associated with lower 60-day mortality compared with DPP4i use (OR 0.54 [95% CI 0.37-0.80] and 0.66 [0.50-0.86], respectively). Use of both medications was also associated with decreased total mortality, emergency room visits, and hospitalizations.
Among SARS-CoV-2-positive adults, premorbid GLP1-RA and SGLT2i use, compared with DPP4i use, was associated with lower odds of mortality and other adverse outcomes, although DPP4i users were older and generally sicker.
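As a worked check of the arithmetic in the abstract above, the sketch below recomputes the 60-day mortality percentages from the reported counts and derives crude (unadjusted) odds ratios against DPP4i use. The crude ratios are an illustration only. They are not the targeted maximum likelihood estimates the study reports, which account for baseline characteristics, so the values are expected to differ. The helper names are hypothetical.

```python
# Counts as reported in the abstract: (deaths within 60 days, users).
groups = {
    "GLP1-RA": (138, 6692),
    "SGLT2i": (85, 3665),
    "DPP4i": (199, 3511),
}


def mortality_pct(deaths: int, total: int) -> float:
    """Crude mortality as a percentage."""
    return 100.0 * deaths / total


def crude_odds_ratio(deaths_a: int, total_a: int,
                     deaths_b: int, total_b: int) -> float:
    """Unadjusted odds ratio of group A versus reference group B."""
    odds_a = deaths_a / (total_a - deaths_a)
    odds_b = deaths_b / (total_b - deaths_b)
    return odds_a / odds_b


# Percentages rounded to two decimals, matching the abstract's precision.
rates = {name: round(mortality_pct(d, n), 2) for name, (d, n) in groups.items()}

# Crude odds ratios with DPP4i as the reference group (no adjustment).
crude_or = {
    name: crude_odds_ratio(*groups[name], *groups["DPP4i"])
    for name in ("GLP1-RA", "SGLT2i")
}

for name, value in crude_or.items():
    print(f"{name} vs DPP4i: crude OR = {value:.2f}")
```

The recomputed percentages agree with the reported 2.06%, 2.32%, and 5.67%. The crude odds ratios come out lower than the adjusted 0.54 and 0.66, consistent with the abstract's remark that DPP4i users were older and generally sicker.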
Incorporating expert knowledge at the time machine learning models are trained holds promise for producing models that are easier to interpret. The main objectives of this study were to use a feature engineering approach to incorporate clinical expert knowledge prior to applying machine learning techniques, and to assess the impact of the approach on model complexity and performance. Four machine learning models were trained to predict mortality with a severe asthma case study. Experiments to select fewer input features based on a discriminative score showed low to moderate precision for discovering clinically meaningful triplets, indicating that discriminative score alone cannot replace clinical input. When compared to baseline machine learning models, we found a decrease in model complexity with use of fewer features informed by discriminative score and filtering of laboratory features with clinical input. We also found a small difference in performance for the mortality prediction task when comparing baseline ML models to models that used filtered features. Encoding demographic and triplet information in ML models with filtered features appeared to show performance improvements from the baseline. These findings indicated that the use of filtered features may reduce model complexity with little impact on performance.
Candidate gene and genome-wide association studies (GWAS) have identified genetic variants that modulate risk for human disease; many of these associations require further study to replicate the results. Here we report the first large-scale application of the phenome-wide association study (PheWAS) paradigm within electronic medical records (EMRs), an unbiased approach to replication and discovery that interrogates relationships between targeted genotypes and multiple phenotypes. We scanned for associations between 3,144 single-nucleotide polymorphisms (previously implicated by GWAS as mediators of human traits) and 1,358 EMR-derived phenotypes in 13,835 individuals of European ancestry. This PheWAS replicated 66% (51/77) of sufficiently powered prior GWAS associations and revealed 63 potentially pleiotropic associations with P < 4.6 × 10⁻⁶ (false discovery rate < 0.1); the strongest of these novel associations were replicated in an independent cohort (n = 7,406). These findings validate PheWAS as a tool to allow unbiased interrogation across multiple phenotypes in EMR-based cohorts and to enhance analysis of the genomic basis of human disease.
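The abstract above reports associations at a false discovery rate below 0.1 but does not say which procedure was used to control it. As an illustration only, the sketch below shows the Benjamini-Hochberg step-up procedure, one common choice, which is an assumption here and not a description of the study's method. The function name and the p-values are made up.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], fdr: float = 0.1) -> List[int]:
    """Return the indices of hypotheses rejected at the given FDR level."""
    m = len(p_values)
    # Indices ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    return sorted(order[:cutoff_rank])


# Hypothetical p-values for ten association tests.
p_values = [0.001, 0.2, 0.025, 0.035, 0.5, 0.9, 0.004, 0.6, 0.7, 0.8]
rejected = benjamini_hochberg(p_values, fdr=0.1)

# Replication rate quoted in the abstract: 51 of 77 associations.
replication_pct = round(100 * 51 / 77)
```

The threshold grows with rank, so the procedure tolerates larger p-values as more tests pass, unlike a fixed Bonferroni cutoff applied to every test.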
IMPORTANCE: Persons with immune dysfunction have a higher risk for severe COVID-19 outcomes. However, these patients were largely excluded from SARS-CoV-2 vaccine clinical trials, creating a large evidence gap. OBJECTIVE: To identify the incidence rate and incidence rate ratio (IRR) for COVID-19 breakthrough infection after SARS-CoV-2 vaccination among persons with or without immune dysfunction. DESIGN, SETTING, AND PARTICIPANTS: This retrospective cohort study analyzed data from the National COVID Cohort Collaborative (N3C), a partnership that developed a secure, centralized electronic medical record–based repository of COVID-19 clinical data from academic medical centers across the US. Persons who received at least 1 dose of a SARS-CoV-2 vaccine between December 10, 2020, and September 16, 2021, were included in the sample. MAIN OUTCOMES AND MEASURES: Vaccination, COVID-19 diagnosis, immune dysfunction diagnoses (ie, HIV infection, multiple sclerosis, rheumatoid arthritis, solid organ transplant, and bone marrow transplantation), other comorbid conditions, and demographic data were accessed through the N3C Data Enclave. Breakthrough infection was defined as a COVID-19 infection that was contracted on or after the 14th day of vaccination, and the risk after full or partial vaccination was assessed for patients with or without immune dysfunction using Poisson regression with robust SEs. Poisson regression models were controlled for a study period (before or after [pre– or post–Delta variant] June 20, 2021), full vaccination status, COVID-19 infection before vaccination, demographic characteristics, geographic location, and comorbidity burden. RESULTS: A total of 664 722 patients in the N3C sample were included. These patients had a median (IQR) age of 51 (34-66) years and were predominantly women (n = 378 307 [56.9%]).
Overall, the incidence rate for COVID-19 breakthrough infection was 5.0 per 1000 person-months among fully vaccinated persons but was higher after the Delta variant became the dominant SARS-CoV-2 strain (incidence rate before vs after June 20, 2021, 2.2 [95% CI, 2.2-2.2] vs 7.3 [95% CI, 7.3-7.4] per 1000 person-months). Compared with partial vaccination, full vaccination was associated with a 28% reduced risk for breakthrough infection (adjusted IRR [AIRR], 0.72; 95% CI, 0.68-0.76). People with a breakthrough infection after full vaccination were more likely to be older and women. People with HIV infection (AIRR, 1.33; 95% CI, 1.18-1.49), rheumatoid arthritis (AIRR, 1.20; 95% CI, 1.09-1.32), and solid organ transplant (AIRR, 2.16; 95% CI, 1.96-2.38) had a higher rate of breakthrough infection. CONCLUSIONS AND RELEVANCE: This cohort study found that full vaccination was associated with reduced risk of COVID-19 breakthrough infection, regardless of the immune status of patients. Despite full vaccination, persons with immune dysfunction had substantially higher risk for COVID-19 breakthrough infection than those without such a condition. For persons with immune dysfunction, continued use of nonpharmaceutical interventions (eg, mask wearing) and alternative vaccine strategies (eg, additional doses or immunogenicity testing) are recommended even after full vaccination.
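The quantities in the abstract above follow from two simple definitions: an incidence rate is events divided by person-time, and a rate ratio below 1 translates into a percent reduction. The sketch below works through both. The event and person-month counts are hypothetical, chosen only to reproduce a rate of 5.0 per 1000 person-months, and the ratio of the two period rates is a crude figure, not one of the adjusted IRRs from the study's Poisson models.

```python
def incidence_rate_per_1000(events: int, person_months: float) -> float:
    """Events per 1000 person-months of follow-up."""
    return 1000.0 * events / person_months


def percent_reduction(rate_ratio: float) -> float:
    """Percent reduction in rate implied by a rate ratio below 1."""
    return 100.0 * (1.0 - rate_ratio)


# Hypothetical counts: 50 infections over 10,000 person-months.
rate = incidence_rate_per_1000(events=50, person_months=10_000)

# The reported adjusted IRR of 0.72 corresponds to the quoted 28% reduction.
reduction = percent_reduction(0.72)

# Crude ratio of the reported rates after vs before June 20, 2021.
delta_ratio = 7.3 / 2.2

print(f"rate = {rate:.1f} per 1000 person-months")
print(f"reduction = {reduction:.0f}%")
print(f"post- vs pre-Delta crude rate ratio = {delta_ratio:.1f}")
```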
Naming a newly discovered disease is a difficult process; in the context of the COVID-19 pandemic and the existence of post-acute sequelae of SARS-CoV-2 infection (PASC), which includes long COVID, it has proven especially challenging. Disease definitions and assignment of a diagnosis code are often asynchronous and iterative. The clinical definition and our understanding of the underlying mechanisms of long COVID are still in flux, and the deployment of an ICD-10-CM code for long COVID in the USA took nearly 2 years after patients had begun to describe their condition. Here, we leverage the largest publicly available HIPAA-limited dataset about patients with COVID-19 in the US to examine the heterogeneity of adoption and use of U09.9, the ICD-10-CM code for "Post COVID-19 condition, unspecified."
We undertook a number of analyses to characterize the N3C population with a U09.9 diagnosis code (n = 33,782), including assessing person-level demographics and a number of area-level social determinants of health; diagnoses commonly co-occurring with U09.9, clustered using the Louvain algorithm; and quantifying medications and procedures recorded within 60 days of U09.9 diagnosis. We stratified all analyses by age group in order to discern differing patterns of care across the lifespan.
We established the diagnoses most commonly co-occurring with U09.9 and algorithmically clustered them into four major categories: cardiopulmonary, neurological, gastrointestinal, and comorbid conditions. Importantly, we discovered that the population of patients diagnosed with U09.9 is demographically skewed toward female, White, non-Hispanic individuals, as well as individuals living in areas with low poverty and low unemployment. Our results also include a characterization of common procedures and medications associated with U09.9-coded patients.
This work offers insight into potential subtypes and current practice patterns around long COVID and speaks to the existence of disparities in the diagnosis of patients with long COVID. This latter finding in particular requires further research and urgent remediation.