Purpose
Supplementing investigator‐specified variables with large numbers of empirically identified features that collectively serve as ‘proxies’ for unspecified or unmeasured factors can often improve confounding control in studies utilizing administrative healthcare databases. Consequently, there has been a recent focus on the development of data‐driven methods for high‐dimensional proxy confounder adjustment in pharmacoepidemiologic research. In this paper, we survey current approaches and recent advancements for high‐dimensional proxy confounder adjustment in healthcare database studies.
Methods
We discuss considerations underpinning three areas for high‐dimensional proxy confounder adjustment: (1) feature generation—transforming raw data into covariates (or features) to be used for proxy adjustment; (2) covariate prioritization, selection, and adjustment; and (3) diagnostic assessment. We discuss challenges and avenues of future development within each area.
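To make areas (1) and (2) concrete, the sketch below follows the recipe popularized by the high-dimensional propensity score (hdPS) algorithm: each code's per-patient occurrence count is turned into "once", "sporadic" (at or above the median count among patients with the code), and "frequent" (at or above the 75th percentile) indicators, and the indicators are then ranked by the Bross formula for the confounding bias each could induce. It is a minimal sketch under stated assumptions, not an implementation from this paper: the function names, the 0.5 continuity correction, estimating the covariate-outcome relative risk in the whole cohort, and the simulated codes are all illustrative choices.

```python
import numpy as np

def recurrence_features(counts):
    """hdPS-style binary indicators for one code from per-patient occurrence counts."""
    counts = np.asarray(counts)
    present = counts[counts > 0]
    if present.size == 0:
        return {}
    # Thresholds come from the count distribution among patients who have the code at all
    med, q75 = np.median(present), np.percentile(present, 75)
    return {
        "once": (counts >= 1).astype(int),
        "sporadic": (counts >= med).astype(int),
        "frequent": (counts >= q75).astype(int),
    }

def bross_bias(feature, exposure, outcome):
    """Multiplicative confounding bias implied by one binary covariate (Bross formula)."""
    feature, exposure, outcome = (np.asarray(a) for a in (feature, exposure, outcome))
    p_c1 = feature[exposure == 1].mean()  # prevalence among the exposed
    p_c0 = feature[exposure == 0].mean()  # prevalence among the unexposed
    # Covariate-outcome relative risk in the whole cohort, with a 0.5 continuity correction
    risk_with = (outcome[feature == 1].sum() + 0.5) / ((feature == 1).sum() + 1)
    risk_without = (outcome[feature == 0].sum() + 0.5) / ((feature == 0).sum() + 1)
    rr_cd = risk_with / risk_without
    return (p_c1 * (rr_cd - 1) + 1) / (p_c0 * (rr_cd - 1) + 1)

def prioritize(features, exposure, outcome, k):
    """Return the k feature names with the largest |log(bias)|."""
    score = {name: abs(np.log(bross_bias(f, exposure, outcome)))
             for name, f in features.items()}
    return sorted(score, key=score.get, reverse=True)[:k]

# Simulated example: code "c1" confounds the exposure-outcome relation, code "c2" is noise
rng = np.random.default_rng(1)
n = 5000
c1, c2 = rng.poisson(1.0, n), rng.poisson(1.0, n)
exposure = rng.binomial(1, np.where(c1 > 0, 0.7, 0.3))
outcome = rng.binomial(1, np.where(c1 > 0, 0.3, 0.1))
features = {}
for code, counts in (("c1", c1), ("c2", c2)):
    for kind, indicator in recurrence_features(counts).items():
        features[f"{code}_{kind}"] = indicator
top = prioritize(features, exposure, outcome, k=2)
```

In the hdPS workflow, the top-ranked indicators (typically several hundred) are then added to the propensity score model alongside the investigator-specified covariates.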
Results
There is a large literature on methods for high‐dimensional confounder prioritization/selection, but relatively little has been written on best practices for feature generation and diagnostic assessment. Consequently, these two areas face particular limitations and open challenges.
Conclusions
There is a growing body of evidence that machine‐learning algorithms for high‐dimensional proxy confounder adjustment can supplement investigator‐specified variables and improve confounding control relative to adjustment for investigator‐specified variables alone. However, more research is needed on best practices for feature generation and diagnostic assessment when applying methods for high‐dimensional proxy confounder adjustment in pharmacoepidemiologic studies.
Patients with chronic pain commonly believe their pain is related to the weather. Scientific evidence to support this belief is inconclusive, in part because of the difficulty of obtaining a large dataset in which patients frequently record their pain symptoms across a variety of weather conditions. Smartphones offer an opportunity to collect such data and overcome these difficulties. Our study analysed daily data from 2658 patients collected over a 15-month period. The analysis demonstrated significant yet modest relationships between pain and relative humidity, pressure, and wind speed, and these correlations remained even after accounting for mood and physical activity. This research highlights how citizen-science experiments can collect large datasets on real-world populations to address long-standing health questions. These results will serve as a starting point for a future system that helps patients better manage their health through pain forecasts.
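One standard way to express an association "after accounting for" other variables is a partial correlation: regress both pain and the weather variable on the covariates, then correlate the residuals. The sketch below is illustrative only and is not the analysis used in this study; the simulated variables, the hypothetical shared "season" driver, and all effect sizes are assumptions.

```python
import numpy as np

def partial_corr(x, y, covariates):
    """Pearson correlation of x and y after removing linear effects of the covariates."""
    Z = np.column_stack([np.ones(len(x)), covariates])  # design matrix with an intercept
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]    # residual of x given covariates
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]    # residual of y given covariates
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(0)
n = 5000
season = rng.normal(size=n)                    # hypothetical shared driver
humidity = season + rng.normal(size=n)
mood = -0.5 * season + rng.normal(size=n)      # low mood more common in humid seasons
activity = rng.normal(size=n)
pain = 0.1 * humidity - 0.6 * mood - 0.3 * activity + rng.normal(size=n)

raw = np.corrcoef(humidity, pain)[0, 1]
adjusted = partial_corr(humidity, pain, np.column_stack([mood, activity]))
```

In this simulation the raw correlation is inflated by the path through mood, and the partial correlation isolates the smaller direct association, which mirrors the "significant yet modest" pattern described above.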
Mycobacterium tuberculosis (M. tuberculosis) is considered innately resistant to β-lactam antibiotics. However, there is evidence that susceptibility to β-lactam antibiotics in combination with β-lactamase inhibitors varies among clinical isolates, and these combinations may present therapeutic options for drug-resistant cases. Here we report our investigation of susceptibility to β-lactam/β-lactamase inhibitor combinations among clinical isolates of M. tuberculosis, and the use of comparative genomics to understand the observed heterogeneity in susceptibility. Eighty-nine South African clinical isolates with varying first- and second-line drug susceptibility patterns, together with two reference strains of M. tuberculosis, underwent minimum inhibitory concentration (MIC) determination for two β-lactams, amoxicillin and meropenem, both alone and in combination with the β-lactamase inhibitor clavulanate. Of the tested isolates, 41/91 (45%) were hypersusceptible to amoxicillin/clavulanate relative to the reference strains, including 14/24 (58%) multidrug-resistant (MDR) and 22/38 (58%) extensively drug-resistant (XDR) isolates. Genome-wide polymorphisms identified by whole-genome sequencing were used in a phylogenetically aware linear mixed model to identify polymorphisms associated with amoxicillin/clavulanate susceptibility. Susceptibility to amoxicillin/clavulanate was over-represented among isolates within a specific clade (LAM4), particularly among XDR strains. Twelve sets of polymorphisms were identified as putative markers of amoxicillin/clavulanate susceptibility, five of which were confined to LAM4. Within the LAM4 clade, ‘paradoxical hypersusceptibility’ to amoxicillin/clavulanate has evolved in parallel with first- and second-line drug resistance. Given the high prevalence of LAM4 among XDR TB in South Africa, our data support an expanded role for β-lactam/β-lactamase inhibitor combinations in the treatment of drug-resistant M. tuberculosis.
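For orientation, a phylogenetically aware linear mixed model of the general kind used in bacterial genome-wide association studies can be written as below. This is a generic form, not the exact specification of this study (its covariates, relatedness estimator, and test statistic are not given here). For each polymorphism j, y is the vector of isolate phenotypes (for example, log2 MIC or a susceptibility indicator), W holds fixed covariates with effects α, x_j is the presence/absence of polymorphism j, and K is a relatedness matrix estimated from genome-wide variation or the phylogeny:

```latex
\begin{aligned}
\mathbf{y} &= W\boldsymbol{\alpha} + \mathbf{x}_j \beta_j + \mathbf{u} + \boldsymbol{\varepsilon},\\
\mathbf{u} &\sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_g^2 K\right), \qquad
\boldsymbol{\varepsilon} \sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_e^2 I\right),\\
H_0 &: \beta_j = 0 .
\end{aligned}
```

The random effect u absorbs phenotype similarity due to shared ancestry (for example, membership of the LAM4 clade), so that a polymorphism is flagged only if it is associated with susceptibility beyond what relatedness alone explains.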
Data competitions have proved highly beneficial to the field of machine learning and are therefore expected to offer similar advantages in causal inference. As participants in the 2016 and 2017 Atlantic Causal Inference Conference (ACIC) data competitions and co-organizers of the 2018 competition, we discuss the strengths of simulation-based competitions and suggest potential extensions to address their limitations. These extensions aim to make the data-generating processes more realistic and to increase their complexity gradually, allowing thorough investigation of algorithms' performance. We further outline a community-wide competition framework to evaluate an end-to-end causal inference pipeline, beginning with a causal question and a database and ending with causal estimates.
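The key property of a simulation-based competition is that the organizers know the true effect because they wrote the data-generating process, so every submitted estimate can be scored against it. A minimal sketch, assuming a single confounder, a logistic treatment model, and a constant treatment effect of 2 (all hypothetical choices, far simpler than the ACIC generators), shows the scoring step. For brevity the IPW estimator uses the true propensity score; a competitor would have to estimate it.

```python
import numpy as np

def simulate(n, rng, ate=2.0):
    """Hypothetical data-generating process with one confounder x and a known ATE."""
    x = rng.normal(size=n)
    propensity = 1 / (1 + np.exp(-x))            # treatment more likely when x is high
    t = rng.binomial(1, propensity)
    y = ate * t + 3.0 * x + rng.normal(size=n)   # x also raises the outcome: confounding
    return x, t, y, propensity

def naive_estimate(t, y):
    """Unadjusted difference in mean outcomes between treated and untreated."""
    return y[t == 1].mean() - y[t == 0].mean()

def ipw_estimate(t, y, propensity):
    """Normalized (Hajek) inverse-probability-weighted estimate of the ATE."""
    w1 = t / propensity
    w0 = (1 - t) / (1 - propensity)
    return (w1 * y).sum() / w1.sum() - (w0 * y).sum() / w0.sum()

rng = np.random.default_rng(2018)
true_ate = 2.0
x, t, y, ps = simulate(200_000, rng, true_ate)
# Score each estimator by its absolute error against the known truth
scores = {
    "naive": abs(naive_estimate(t, y) - true_ate),
    "ipw": abs(ipw_estimate(t, y, ps) - true_ate),
}
```

Real competitions repeat this over many generated datasets and report summaries such as bias, RMSE, and interval coverage; the extensions proposed above amount to making `simulate` progressively more realistic.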
Predictive models show promise in healthcare, but their successful deployment is challenging because of limited generalizability. Current external validation often assesses model performance using only the features available in the original training data, offering little insight into the models' suitability at external sites. Our study introduces a methodology for evaluating features during both model development and external validation, applied to creating and validating predictive models of post-surgery patient outcomes with improved generalizability.
Electronic health records (EHRs) from 2008 to 2019 from 4 countries (United States, United Kingdom, Finland, and Korea) were mapped to the OMOP Common Data Model (CDM). Machine learning (ML) models were developed to predict the risk of post-surgery prolonged opioid use (POU) using data from the 6 months before surgery. Both local and cross-site feature selection methods were applied in the development and external validation datasets. Models were developed using Observational Health Data Sciences and Informatics (OHDSI) tools and validated on separate patient cohorts.
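The abstract does not spell out how cross-site feature selection works. One hypothetical reading, consistent with the emphasis on "diverse feature sets from various clinical settings", is to run a local selection rule at every site and keep the union, so a concept that is rare at the development site but common elsewhere is not discarded. The sketch below illustrates that reading with a simple prevalence threshold; the threshold, function names, and toy matrices are assumptions, not the study's procedure. It relies on all sites sharing one column vocabulary, which is what mapping to OMOP standard concepts provides.

```python
import numpy as np

def select_features(X, names, min_prevalence=0.01):
    """Hypothetical local selection: binary features present in >= min_prevalence of patients."""
    prevalence = np.asarray(X).mean(axis=0)
    return {name for name, p in zip(names, prevalence) if p >= min_prevalence}

def cross_site_features(site_matrices, names, min_prevalence=0.01):
    """Hypothetical cross-site selection: union of the features selected at every site."""
    selected = set()
    for X in site_matrices:
        selected |= select_features(X, names, min_prevalence)
    return sorted(selected)

# Columns share one vocabulary (e.g. OMOP concepts); values are 0/1 indicators per patient
names = ["concept_a", "concept_b", "concept_c"]
development = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 0]])  # concept_c rare here
external = np.array([[0, 0, 1], [1, 0, 1], [0, 0, 0], [0, 0, 1]])     # concept_c common here

local = sorted(select_features(development, names, min_prevalence=0.3))
cross = cross_site_features([development, external], names, min_prevalence=0.3)
```

Under this reading, the model is still trained on development-site data, but with `cross` rather than `local` as its candidate feature set.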
Model development included 41 929 patients, 14.6% with POU. External validation included 31 932 (UK), 23 100 (US), 7295 (Korea), and 3934 (Finland) patients, with POU prevalences of 44.2%, 22.0%, 15.8%, and 21.8%, respectively. The top-performing model, Lasso logistic regression, achieved an area under the receiver operating characteristic curve (AUROC) of 0.75 in local validation and a mean of 0.69 (SD = 0.02) across the external validation sites. In external validation, models trained with cross-site feature selection significantly outperformed those using only features from the development site (P < .05).
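The AUROC values above can be read as the probability that a randomly chosen patient with POU receives a higher predicted risk than a randomly chosen patient without it. A minimal sketch of that pairwise definition follows; it compares every positive with every negative, so it is only suitable for small samples, and rank-based formulas or library routines scale better.

```python
import numpy as np

def auroc(scores, labels):
    """P(score of a random positive > score of a random negative); ties count one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    # Compare every positive with every negative (O(n_pos * n_neg) memory)
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

An AUROC of 0.5 corresponds to random ranking, so the drop from 0.75 to 0.69 means the model ranks patients somewhat less reliably at the external sites.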
Using EHRs across four countries mapped to the OMOP CDM, we developed generalizable predictive models for POU. Our approach demonstrates the significant impact of cross-site feature selection in improving model performance, underscoring the importance of incorporating diverse feature sets from various clinical settings to enhance the generalizability and utility of predictive healthcare models.
Background
Patients with a serious mental illness often receive fragmented care because of reduced availability of, or access to, resources, and inadequate, discontinuous, and uncoordinated care across health, social services, and criminal justice organizations. This article describes the creation of a multisystem analysis that derives insights from an integrated dataset including patient access to case management services, medical services, and interactions with the criminal justice system.
Methods
Data were combined from electronic systems within a US mental health ecosystem that included mental health and substance abuse services, as well as data from the criminal justice system. Cox models were applied to test the associations between delivery of services and re-incarceration. Additionally, machine learning was used to train and validate a predictive model to examine the effects of non-modifiable risk factors (age, past arrests, mental health diagnosis) and modifiable risk factors (outpatient, medical, and case management services, and use of a jail diversion program) on re-arrest.
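A Cox proportional hazards model relates the hazard of an event such as re-arrest to covariates through h(t | x) = h0(t) exp(x'β) and is fit by maximizing the partial likelihood, in which each event is compared with everyone still at risk at that time. The numpy sketch below fits it by Newton's method with Breslow handling of tied times. The simulated binary covariate (standing in for receipt of services after release), the true log hazard ratio of -0.7, and the censoring scheme are hypothetical; real analyses would normally use an established package such as lifelines or R's survival.

```python
import numpy as np

def fit_cox(X, time, event, n_iter=20):
    """Maximize the Cox partial likelihood by Newton's method (Breslow handling of ties)."""
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    d = np.asarray(event, dtype=bool)
    # at_risk[i, j] is True when subject j is still under observation at time[i]
    at_risk = time[None, :] >= time[:, None]
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        w = at_risk[d] * np.exp(eta - eta.max())[None, :]  # one row per event
        w /= w.sum(axis=1, keepdims=True)                  # normalize within each risk set
        xbar = w @ X                                       # risk-set mean of covariates
        grad = (X[d] - xbar).sum(axis=0)                   # score vector
        second = np.einsum("ij,jk,jl->kl", w, X, X)        # summed risk-set E[x x^T]
        hess = -(second - xbar.T @ xbar)                   # Hessian (negative definite)
        beta = beta - np.linalg.solve(hess, grad)          # Newton step
    return beta

rng = np.random.default_rng(0)
n = 1000
services = rng.binomial(1, 0.5, n)                      # hypothetical binary covariate
event_time = rng.exponential(np.exp(0.7 * services))    # scale = 1/hazard; true log HR = -0.7
censor_time = rng.exponential(2.0, n)
time = np.minimum(event_time, censor_time)
event = event_time <= censor_time
beta = fit_cox(services[:, None], time, event)
hazard_ratio = np.exp(beta[0])
```

A negative coefficient (a hazard ratio below 1) is what "associated with a reduced risk for re-arrest" means in this framework.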
Results
An association was found between past arrests and admission to crisis stabilization services in this population (N = 10,307). Delivery of case management or medical services after release from jail was associated with a reduced risk of re-arrest. Predictive models linking non-modifiable and modifiable risk factors to outcomes predicted the probability of re-arrest with fair accuracy (area under the receiver operating characteristic curve of 0.67).
Conclusions
By modeling the complex interactions between risk factors, service delivery, and outcomes, systems of care might be better enabled to meet patient needs and improve outcomes.
This note begins with a recollection of Elena Altuna through the author's encounter with her. It then revisits the analytical categories and features of writing style that Altuna coined and developed for her studies of colonial literature, as they appear in her critical work on contemporary Latin American literature.
Patients with a serious mental illness often receive fragmented care because of reduced availability of, or access to, resources, and inadequate, discontinuous, and uncoordinated care across health, social services, and criminal justice organizations. These gaps in care may lead to increased mental health disease burden and relapse, as well as repeated incarcerations. Further, the complex health, social service, and criminal justice ecosystem in which the patient may be embedded makes it difficult to examine the role of modifiable risk factors and delivered services in patient outcomes, particularly because agencies often maintain isolated sets of relevant data. Here we describe an approach to creating a multisystem analysis that derives insights from an integrated data set including patient access to case management services, medical services, and interactions with the criminal justice system. We combined data from electronic systems within a US mental health ecosystem that included mental health and substance abuse services, as well as data from the criminal justice system. We applied Cox models to test the associations between delivery of services and re-incarceration. Using this approach, we found an association between arrests and crisis stabilization services in this population. We also found that delivery of case management or medical services after release from jail was associated with a reduced risk of re-arrest. Additionally, we used machine learning to train and validate a predictive model linking non-modifiable and modifiable risk factors to outcomes. A predictive model constructed with elastic-net regularized logistic regression, which considered age, past arrests, mental health diagnosis, use of a jail diversion program, and outpatient, medical, and case management services, predicted the probability of re-arrest with fair accuracy (AUC = 0.67).
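Elastic-net regularized logistic regression adds a mix of lasso (L1) and ridge (L2) penalties to the logistic log-loss, shrinking coefficients and setting some exactly to zero. A minimal numpy sketch fit by proximal gradient descent follows; the penalty strength `lam`, mixing weight `alpha`, iteration count, and simulated predictors are assumptions for illustration. In practice a tested library would be preferable, for example scikit-learn's `LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=...)`.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||z||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fit_elastic_net_logistic(X, y, lam=0.05, alpha=0.5, n_iter=2000):
    """Minimize mean log-loss + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2).

    The intercept is not penalized. Solved by proximal gradient descent.
    """
    n, p = X.shape
    beta, b0 = np.zeros(p), 0.0
    # Step 1/L, where L bounds the Lipschitz constant of the smooth part's gradient
    Xa = np.column_stack([np.ones(n), X])
    step = 1.0 / (np.linalg.norm(Xa, 2) ** 2 / (4 * n) + lam * (1 - alpha))
    for _ in range(n_iter):
        prob = 1 / (1 + np.exp(-(X @ beta + b0)))
        resid = prob - y
        grad_beta = X.T @ resid / n + lam * (1 - alpha) * beta  # log-loss + ridge gradient
        b0 -= step * resid.mean()                                # unpenalized intercept
        beta = soft_threshold(beta - step * grad_beta, step * lam * alpha)  # lasso prox
    return b0, beta

rng = np.random.default_rng(7)
n, p = 2000, 10
X = rng.normal(size=(n, p))              # hypothetical standardized predictors
true_beta = np.zeros(p)
true_beta[:3] = [1.0, -1.0, 0.8]         # only the first three predictors matter
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ true_beta - 0.5))))
b0, beta = fit_elastic_net_logistic(X, y)
```

The lasso part drives most irrelevant coefficients to exactly zero, while the ridge part keeps estimates stable when predictors such as service-use indicators are correlated.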
By modeling the complex interactions between risk factors, service delivery and outcomes, we may better enable systems of care to meet patient needs and improve outcomes.
Methods that address data shifts usually assume full access to multiple datasets. In healthcare, however, privacy regulations and commercial interests limit data availability, so researchers can typically study only a small number of datasets. In contrast, limited statistical characteristics of specific patient samples are much easier to share and may be available from previously published literature or focused collaborative efforts. Here, we propose a method that estimates model performance in external samples from their limited statistical characteristics. We search for weights that are as close to uniform as possible while inducing internal statistics similar to the external ones. We then use model performance on the weighted internal sample as an estimate of performance on the external sample. We evaluate the proposed algorithm on simulated data and on electronic medical record data for two risk models, predicting complications in patients with ulcerative colitis and stroke in women diagnosed with atrial fibrillation. In the vast majority of cases, the estimated external performance is much closer to the actual external performance than the internal performance is. Our proposed method may be an important building block in training robust models and detecting potential model failures in external environments.
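The weighting step can be illustrated with entropy balancing, a standard technique that finds the weights closest to uniform in Kullback-Leibler divergence whose weighted internal means exactly match given external means. This is a swapped-in technique for illustration; the authors' distance to uniform and optimizer may differ. Accuracy is used as the performance measure for simplicity, and the feature, target mean, and simulated accuracies are hypothetical.

```python
import numpy as np

def entropy_balance(F, target, n_iter=50):
    """Weights closest to uniform (minimum KL divergence) whose weighted means of F equal target."""
    G = np.asarray(F, dtype=float) - np.asarray(target, dtype=float)
    lam = np.zeros(G.shape[1])

    def weights(lam):
        z = G @ lam
        w = np.exp(z - z.max())        # subtract the max for numerical stability
        return w / w.sum()

    # Newton's method on the convex dual; its gradient is the weighted mean of G
    for _ in range(n_iter):
        w = weights(lam)
        grad = w @ G
        hess = (G * w[:, None]).T @ G - np.outer(grad, grad)
        lam = lam - np.linalg.solve(hess, grad)
    return weights(lam)

rng = np.random.default_rng(0)
n = 10_000
older = rng.binomial(1, 0.3, n)                            # 30% older patients internally
correct = rng.binomial(1, np.where(older == 1, 0.6, 0.9))  # hypothetical: worse accuracy when older
F = older[:, None]

w = entropy_balance(F, target=[0.6])                       # published external statistic: 60% older
internal_accuracy = correct.mean()
estimated_external_accuracy = w @ correct
```

With a single binary feature the solution simply gives the older group 60% of the total weight, so the estimate is roughly 0.6 × 0.6 + 0.4 × 0.9 = 0.72, whereas the unweighted internal accuracy is roughly 0.81.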