Gene Ontology (GO) enrichment analysis is ubiquitously used for interpreting high-throughput molecular data and generating hypotheses about the biological phenomena underlying experiments. However, the two building blocks of this analysis, the ontology and the annotations, evolve rapidly. We used gene signatures derived from 104 disease analyses to systematically evaluate how enrichment analysis results were affected by evolution of the GO over a decade. We found low consistency between enrichment analysis results obtained with early and more recent GO versions. Furthermore, there continues to be a strong annotation bias in the GO annotations, where 58% of the annotations are for 16% of the human genes. Our analysis suggests that GO evolution may have affected the interpretation and possibly reproducibility of experiments over time. Hence, researchers must exercise caution when interpreting GO enrichment analyses and should reexamine previous analyses with the most recent GO version.
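To make the dependence on annotations concrete, here is a minimal sketch of a one-sided hypergeometric over-representation test, a common way to score GO term enrichment. It is not taken from the paper; the function name and all counts in the example are hypothetical, chosen only to show how the same gene signature can score differently when a term's annotation count changes between GO versions.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, sample: int, hits: int) -> float:
    """One-sided hypergeometric p-value P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes annotated to the GO term
    sample:     number of genes in the signature
    hits:       signature genes annotated to the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass for every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: the same 100-gene signature with 5 hits, scored against
# a term that had 50 annotated genes in an old GO version and 400 in a new one.
p_old = enrichment_p_value(population=20000, annotated=50, sample=100, hits=5)
p_new = enrichment_p_value(population=20000, annotated=400, sample=100, hits=5)
print(f"old version p = {p_old:.2e}, new version p = {p_new:.2e}")
```

In this toy case the term looks strongly enriched under the older annotation count and much less so under the newer one, even though the signature itself did not change.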
In a report in this issue, Denny et al. sought to validate PheWAS by comparing the associations it discovers to those identified by genome-wide association studies (GWAS). Notably, they find that a single PheWAS replicates 66% of the associations produced by multiple, sufficiently powered GWAS.
Depression is a prevalent disorder that is difficult to diagnose and treat. In particular, depressed patients exhibit largely unpredictable responses to treatment. Toward the goal of personalizing treatment for depression, we develop and evaluate computational models that use electronic health record (EHR) data for predicting the diagnosis and severity of depression, and response to treatment.
We develop regression-based models for predicting depression, its severity, and response to treatment from EHR data, using structured diagnosis and medication codes as well as free-text clinical reports. We use two datasets: 35,000 patients (5,000 depressed) from the Palo Alto Medical Foundation and 5,651 patients treated for depression from the Group Health Research Institute.
Our models are able to predict a future diagnosis of depression up to 12 months in advance (area under the receiver operating characteristic curve (AUC) 0.70-0.80). We can differentiate patients with severe baseline depression from those with minimal or mild baseline depression (AUC 0.72). Baseline depression severity was the strongest predictor of treatment response for medication and psychotherapy.
It is possible to use EHR data to predict a diagnosis of depression up to 12 months in advance and to differentiate between extreme baseline levels of depression. The models use commonly available data on diagnosis, medication, and clinical progress notes, making them easily portable. The ability to automatically determine severity can facilitate assembly of large patient cohorts with similar severity from multiple sites, which may enable elucidation of the moderators of treatment response in the future.
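For readers unfamiliar with the AUC figures quoted above, the following sketch computes the area under the ROC curve directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function and the toy labels and scores are illustrative assumptions, not the authors' evaluation code.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = later diagnosed with depression, 0 = not (hypothetical data).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values of 0.70-0.80 indicate useful but imperfect discrimination.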
• ML models are often one small component of larger care delivery workflows.
• Capacity constraints of these workflows can distort their models’ realized impacts.
• Traditional ML evaluation metrics like AUROC ignore these external factors.
• We propose “usefulness assessments” as a comprehensive way to evaluate ML models.
• We develop APLUS as a general framework for conducting usefulness assessments.
Despite the creation of thousands of machine learning (ML) models, the promise of improving patient care with ML remains largely unrealized. Adoption into clinical practice is lagging, in large part due to disconnects between how ML practitioners evaluate models and what is required for their successful integration into care delivery. Models are just one component of care delivery workflows whose constraints determine clinicians’ abilities to act on models’ outputs. However, methods to evaluate the usefulness of models in the context of their corresponding workflows are currently limited. To bridge this gap we developed APLUS, a reusable framework for quantitatively assessing via simulation the utility gained from integrating a model into a clinical workflow. We describe the APLUS simulation engine and workflow specification language, and apply it to evaluate a novel ML-based screening pathway for detecting peripheral artery disease at Stanford Health Care.
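The effect of workflow capacity on a model's realized value can be illustrated with a toy simulation. The sketch below is not the APLUS engine or its workflow specification language; the `Patient` class, the `realized_utility` function, and the benefit and cost values are all hypothetical. It assumes that each day only a fixed number of model-flagged patients can receive a workup, prioritized by model score.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    score: float       # model's predicted risk
    has_disease: bool  # ground truth, known only inside the simulation


def realized_utility(days, threshold, capacity, benefit_tp=10.0, cost_workup=1.0):
    """Total utility when at most `capacity` flagged patients per day are worked up.

    `capacity=None` means the workflow can act on every flagged patient.
    Utility units are arbitrary (hypothetical benefit and cost values).
    """
    total = 0.0
    for patients in days:
        # Flag patients at or above the threshold, highest score first.
        flagged = sorted(
            (p for p in patients if p.score >= threshold),
            key=lambda p: p.score,
            reverse=True,
        )
        # Only the first `capacity` flagged patients are actually acted upon.
        for p in flagged[:capacity]:
            total += (benefit_tp if p.has_disease else 0.0) - cost_workup
    return total


days = [
    [Patient(0.9, True), Patient(0.8, False), Patient(0.7, True), Patient(0.2, False)],
    [Patient(0.95, True), Patient(0.6, True), Patient(0.55, False)],
]
for cap in (None, 2, 1):
    print(cap, realized_utility(days, threshold=0.5, capacity=cap))
```

The model and its discrimination are identical in all three runs; only the downstream capacity changes, which is the kind of external factor that a metric such as AUROC cannot capture.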
Abstract
Objective
Responding to the COVID-19 pandemic requires accurate forecasting of health system capacity requirements using readily available inputs. We examined whether testing and hospitalization data could help quantify the anticipated burden on the health system given a shelter-in-place (SIP) order.
Materials and Methods
16,103 SARS-CoV-2 RT-PCR tests were performed on 15,807 patients at Stanford facilities between March 2 and April 11, 2020. We analyzed the fraction of tested patients that were confirmed positive for COVID-19, the fraction of those needing hospitalization, and the fraction requiring ICU admission over the 40 days between March 2nd and April 11th 2020.
Results
We find a marked slowdown in the hospitalization rate within ten days of SIP even as cases continued to rise. We also find a shift towards younger patients in the age distribution of those testing positive for COVID-19 over the four weeks of SIP. The impact of this shift is a divergence between increasing positive case confirmations and slowing new hospitalizations, both of which affect the demand on health systems.
Conclusion
Without using local hospitalization rates and the age distribution of positive patients, current models are likely to overestimate the resource burden of COVID-19. It is imperative that health systems start using these data to quantify effects of SIP and aid reopening planning.
There is an urgent need for biomarkers that better stratify, by risk, patients with idiopathic pulmonary fibrosis who have the same clinical presentation, to inform lung transplantation allocation. We aimed to investigate whether a specific immune cell type from patients with idiopathic pulmonary fibrosis could identify those at higher risk of poor outcomes. We then sought to validate our findings using cytometry and electronic health records.
We first did a discovery analysis with transcriptome data from the Gene Expression Omnibus at the National Center for Biotechnology Information for 120 peripheral blood mononuclear cell (PBMC) samples of patients with idiopathic pulmonary fibrosis. We estimated percentages of 13 immune cell types using statistical deconvolution, and investigated the association of these cell types with transplant-free survival. We validated these results using PBMC samples from patients with idiopathic pulmonary fibrosis in two independent cohorts (COMET and Yale). COMET profiled monocyte counts in 45 patients with idiopathic pulmonary fibrosis from March 12, 2010, to March 10, 2011, using flow cytometry; we tested if increased monocyte count was associated with the primary outcome of disease progression. In the Yale cohort, 15 patients with idiopathic pulmonary fibrosis (with five healthy controls) were classed as high risk or low risk from April 28, 2014, to Aug 20, 2015, using a 52-gene signature, and we assessed whether monocyte percentage (measured by cytometry by time of flight) was higher in high-risk patients. We then examined complete blood count values in the electronic health records (EHR) of 45 068 patients with idiopathic pulmonary fibrosis, systemic sclerosis, hypertrophic cardiomyopathy, or myelofibrosis from Stanford (Jan 01, 2008, to Dec 31, 2015), Northwestern (Feb 15, 2001 to July 31, 2017), Vanderbilt (Jan 01, 2008, to Dec 31, 2016), and Optum Clinformatics DataMart (Jan 01, 2004, to Dec 31, 2016) cohorts, and examined whether absolute monocyte counts of 0·95 K/μL or greater were associated with all-cause mortality in these patients.
In the discovery analysis, estimated CD14+ classical monocyte percentages above the mean were associated with shorter transplant-free survival times (hazard ratio [HR] 1·82, 95% CI 1·05–3·14), whereas higher percentages of T cells and B cells were not (0·97, 0·59–1·66; and 0·78, 0·45–1·34, respectively). In two validation cohorts (COMET trial and the Yale cohort), patients with higher monocyte counts were at higher risk for poor outcomes (COMET Wilcoxon p=0·025; Yale Wilcoxon p=0·049). Monocyte counts of 0·95 K/μL or greater were associated with mortality after adjusting for forced vital capacity (HR 2·47, 95% CI 1·48–4·15; p=0·0063), and the gender, age, and physiology index (HR 2·06, 95% CI 1·22–3·47; p=0·0068) across the COMET, Stanford, and Northwestern datasets. Analysis of medical records of 7459 patients with idiopathic pulmonary fibrosis showed that patients with monocyte counts of 0·95 K/μL or greater were at increased risk of mortality with lung transplantation as a censoring event, after adjusting for age at diagnosis and sex (Stanford HR 2·30, 95% CI 0·94–5·63; Vanderbilt 1·52, 1·21–1·89; Optum 1·74, 1·33–2·27). Likewise, higher absolute monocyte count was associated with shortened survival in patients with hypertrophic cardiomyopathy across all three cohorts, and in patients with systemic sclerosis or myelofibrosis in two of the three cohorts.
Monocyte count could be incorporated into the clinical assessment of patients with idiopathic pulmonary fibrosis and other fibrotic disorders. Further investigation into the mechanistic role of monocytes in fibrosis might lead to insights that assist the development of new therapies.
Bill & Melinda Gates Foundation, US National Institute of Allergy and Infectious Diseases, and US National Library of Medicine.
Objective Traditionally, patient groups with a phenotype are selected through rule-based definitions whose creation and validation are time-consuming. Machine learning approaches to electronic phenotyping are limited by the paucity of labeled training datasets. We demonstrate the feasibility of utilizing semi-automatically labeled training sets to create phenotype models via machine learning, using a comprehensive representation of the patient medical record.
Methods We use a list of keywords specific to the phenotype of interest to generate noisy labeled training data. We train L1 penalized logistic regression models for a chronic and an acute disease and evaluate the performance of the models against a gold standard.
Results Our models for Type 2 diabetes mellitus and myocardial infarction achieve precision and accuracy of 0.90, 0.89, and 0.86, 0.89, respectively. Local implementations of the previously validated rule-based definitions for Type 2 diabetes mellitus and myocardial infarction achieve precision and accuracy of 0.96, 0.92 and 0.84, 0.87, respectively.
We have demonstrated feasibility of learning phenotype models using imperfectly labeled data for a chronic and acute phenotype. Further research in feature engineering and in specification of the keyword list can improve the performance of the models and the scalability of the approach.
Conclusions Our method provides an alternative to manual labeling for creating training sets for statistical models of phenotypes. Such an approach can accelerate research with large observational healthcare datasets and may also be used to create local phenotype models.
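The labeling and evaluation steps described in this abstract can be sketched as follows. This is an illustrative assumption about how keyword-based noisy labeling might look, not the authors' implementation: the function names, the keyword list, and the example notes are hypothetical, and the L1 penalized logistic regression that the paper trains on such labels is omitted here.

```python
def noisy_label(note, keywords):
    """Assign label 1 if any keyword appears in the note (case-insensitive)."""
    text = note.lower()
    return int(any(k.lower() in text for k in keywords))


def precision_and_accuracy(predicted, gold):
    """Precision and accuracy of binary predictions against a gold standard."""
    tp = sum(p == 1 and g == 1 for p, g in zip(predicted, gold))
    fp = sum(p == 1 and g == 0 for p, g in zip(predicted, gold))
    correct = sum(p == g for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    return precision, correct / len(gold)


# Hypothetical notes and keyword list for Type 2 diabetes mellitus.
keywords = ["type 2 diabetes", "t2dm"]
notes = [
    "Patient with type 2 diabetes mellitus on metformin.",
    "Family history of type 2 diabetes; presents with ankle sprain.",
    "T2DM, poorly controlled.",
    "Routine visit, no complaints.",
]
gold = [1, 0, 1, 0]  # hypothetical chart-review labels

labels = [noisy_label(n, keywords) for n in notes]
precision, accuracy = precision_and_accuracy(labels, gold)
print(labels, round(precision, 2), accuracy)
```

The second note shows why such labels are noisy: a keyword match on a family-history mention yields a false positive, which is the kind of imperfection a downstream model has to tolerate.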
OBJECTIVES: To estimate the impact of each of six types of acute organ dysfunction (hepatic, renal, coagulation, neurologic, cardiac, and respiratory) on long-term mortality after surviving sepsis hospitalization.
DESIGN: Multicenter, retrospective study.
SETTINGS: Twenty-one hospitals within an integrated healthcare delivery system in Northern California.
PATIENTS: Thirty thousand one hundred sixty-three sepsis patients admitted through the emergency department between 2010 and 2013, with mortality follow-up through April 2015.
INTERVENTIONS: None.
MEASUREMENTS AND MAIN RESULTS: Acute organ dysfunction was quantified using modified Sequential Organ Failure Assessment scores. The main outcome was long-term mortality among sepsis patients who survived hospitalization. The estimates of the impact of each type of acute organ dysfunction on long-term mortality were based on adjusted Cox proportional hazards models. Sensitivity analyses were conducted based on propensity score–matching and adjusted logistic regression. Hospital mortality was 9.4% and mortality was 31.7% at 1 year. Median follow-up time among sepsis survivors was 797 days (interquartile range, 384–1,219 d). Acute neurologic (odds ratio, 1.86; p < 0.001), respiratory (odds ratio, 1.43; p < 0.001), and cardiac (odds ratio, 1.31; p < 0.001) dysfunction were most strongly associated with short-term hospital mortality, compared with sepsis patients without these organ dysfunctions. Evaluating only patients surviving their sepsis hospitalization, acute neurologic dysfunction was also most strongly associated with long-term mortality (odds ratio, 1.52; p < 0.001), corresponding to a marginal increase in predicted 1-year mortality of 6.0% for the presence of any neurologic dysfunction (p < 0.001). Liver dysfunction was also associated with long-term mortality in all models, whereas the association for other organ dysfunction subtypes was inconsistent between models.
CONCLUSIONS: Acute sepsis-related neurologic dysfunction was the organ dysfunction most strongly associated with short- and long-term mortality and represents a key mediator of long-term adverse outcomes following sepsis.
Adverse drug events cause substantial morbidity and mortality and are often discovered after a drug comes to market. We hypothesized that Internet users may provide early clues about adverse drug events via their online information-seeking. We conducted a large-scale study of Web search log data gathered during 2010. We pay particular attention to the specific drug pairing of paroxetine and pravastatin, whose interaction was reported to cause hyperglycemia after the time period of the online logs used in the analysis. We also examine sets of drug pairs known to be associated with hyperglycemia and those not associated with hyperglycemia. We find that anonymized signals on drug interactions can be mined from search logs. Compared to analyses of other sources such as electronic health records (EHR), logs are inexpensive to collect and mine. The results demonstrate that logs of the search activities of populations of computer users can contribute to drug safety surveillance.