As availability of health care data for research opens up new frontiers in medical statistics, keeping a focus on the science behind the data is more important than ever to promote sound research and protect the validity of research results. Though the electronic databases currently amassed for research far exceed in scale and scope the observational research Professor Hill likely conceived of, his guidance to statisticians to ground our work in the biological and medical processes behind the data remains salient across the decades.
Abstract
Objective
Electronic health record (EHR)-derived data are extensively used in health research. However, the pattern of patient interaction with the healthcare system can result in informative presence bias if those who have poorer health have more data recorded than healthier patients. We aimed to determine how informative presence affects bias across multiple scenarios informed by real-world healthcare utilization patterns.
Materials and methods
We conducted an analysis of EHR data from a pediatric healthcare system as well as simulation studies to characterize conditions under which informative presence bias is likely to occur. This analysis extends prior work by examining a variety of scenarios for the relationship between a biomarker and a health event of interest and the healthcare visit process.
Results
Using biomarker values gathered at both informative and noninformative visits when estimating the effect of the biomarker on the event of interest resulted in minimal bias when the biomarker was relatively stable over time but produced substantial bias when the biomarker was more volatile. Adjusting analyses for the number of prior visits within a fixed look-back window was able to reduce but not eliminate this bias.
Discussion
These results suggest that bias may arise frequently in commonly encountered scenarios and may not be eliminated by adjusting for prior visit intensity.
Conclusion
Depending on the context, the estimated effect from analyses using data from all visits available may diverge from the true effect. Sensitivity analyses using only visits likely to be informative or noninformative based on visit type may aid in the assessment of the magnitude of potential bias.
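The adjustment mentioned in the Results, conditioning on the number of prior visits within a fixed look-back window, needs a per-patient visit count as a covariate. The following is a minimal sketch of that count only. The function name, the half-open window convention, and the 365-day default are assumptions for illustration; the abstract does not state the window length used in the study.

```python
from bisect import bisect_left
from datetime import date, timedelta


def prior_visit_count(visit_dates, index_date, lookback_days=365):
    """Count visits in the half-open window [index_date - lookback_days, index_date).

    The visit on index_date itself is excluded, so the count reflects only
    prior visit intensity. The 365-day default is an assumed value.
    """
    dates = sorted(visit_dates)
    window_start = index_date - timedelta(days=lookback_days)
    first_in_window = bisect_left(dates, window_start)
    first_at_or_after_index = bisect_left(dates, index_date)
    return first_at_or_after_index - first_in_window


# Hypothetical visit history for one patient.
visits = [date(2020, 1, 10), date(2020, 6, 1), date(2020, 11, 20), date(2021, 1, 5)]
count_365 = prior_visit_count(visits, index_date=date(2021, 1, 5))
count_90 = prior_visit_count(visits, index_date=date(2021, 1, 5), lookback_days=90)
```

A count like this would then enter the outcome model as a covariate. As the abstract notes, such adjustment reduced but did not eliminate the bias.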
Objective
To determine whether higher cumulative use of benzodiazepines is associated with a higher risk of dementia or more rapid cognitive decline.
Design
Prospective population based cohort.
Setting
Integrated healthcare delivery system, Seattle, Washington.
Participants
3434 participants aged ≥65 without dementia at study entry. There were two rounds of recruitment (1994-96 and 2000-03) followed by continuous enrollment beginning in 2004.
Main outcome measures
The cognitive abilities screening instrument (CASI) was administered every two years to screen for dementia and was used to examine cognitive trajectory. Incident dementia and Alzheimer’s disease were determined with standard diagnostic criteria. Benzodiazepine exposure was defined from computerized pharmacy data and consisted of the total standardized daily doses (TSDDs) dispensed over a 10 year period (a rolling window that moved forward in time during follow-up). The most recent year was excluded because of possible use for prodromal symptoms. Multivariable Cox proportional hazards models were used to examine time varying use of benzodiazepine and dementia risk. Analyses of cognitive trajectory used linear regression models with generalized estimating equations.
Results
Over a mean follow-up of 7.3 years, 797 participants (23.2%) developed dementia, of whom 637 developed Alzheimer’s disease. For dementia, the adjusted hazard ratios associated with cumulative benzodiazepine use compared with non-use were 1.25 (95% confidence interval 1.03 to 1.51) for 1-30 TSDDs; 1.31 (1.00 to 1.71) for 31-120 TSDDs; and 1.07 (0.82 to 1.39) for ≥121 TSDDs. Results were similar for Alzheimer’s disease. Higher benzodiazepine use was not associated with more rapid cognitive decline.
Conclusion
The risk of dementia is slightly higher in people with minimal exposure to benzodiazepines but not with the highest level of exposure. These results do not support a causal association between benzodiazepine use and dementia.
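The rolling exposure window described above can be sketched as follows. This is an illustration under stated assumptions, not the study's code: times are measured in years on an arbitrary scale, the window is taken to run from 10 years before the evaluation time up to 1 year before it, and totals strictly between 0 and 1 TSDD are placed in the lowest exposed category. The function names and the example fills are hypothetical.

```python
def cumulative_tsdd(fills, t, window_years=10.0, lag_years=1.0):
    """Sum TSDDs dispensed in [t - window_years, t - lag_years).

    fills: iterable of (time_in_years, tsdd) pairs.
    The exact window boundaries are an assumption; the abstract states only
    that a 10 year rolling window was used and the most recent year excluded.
    """
    start = t - window_years
    end = t - lag_years
    return sum(tsdd for when, tsdd in fills if start <= when < end)


def exposure_category(total_tsdd):
    """Map a cumulative total to the categories reported in the Results."""
    if total_tsdd <= 0:
        return "non-use"
    if total_tsdd <= 30:
        return "1-30"  # fractional totals below 1 land here by assumption
    if total_tsdd <= 120:
        return "31-120"
    return ">=121"


# Hypothetical pharmacy fills: (years since an arbitrary origin, TSDDs dispensed).
fills = [(2.0, 20.0), (8.5, 50.0), (11.6, 40.0)]
exposure_at_12 = cumulative_tsdd(fills, t=12.0)  # the fill at 11.6 falls in the excluded year
exposure_at_13 = cumulative_tsdd(fills, t=13.0)  # the fill at 2.0 has left the window
```

Because the window moves with follow-up time, the same participant can change exposure category over time, which is why the exposure is modeled as time varying.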
Missing data are common in studies using electronic health record (EHR)-derived data. Missingness in EHR data is related to healthcare utilization patterns, resulting in complex missingness mechanisms in which data are potentially missing not at random. Prior research has suggested that machine learning-based multiple imputation methods may outperform traditional methods and may perform well even when data are missing not at random.
We used plasmode simulations based on a nationwide EHR-derived de-identified database for patients with metastatic urothelial carcinoma to compare the performance of multiple imputation using chained equations, random forests, and denoising autoencoders in terms of bias and precision of hazard ratio estimates under varying proportions of observations with missing values and missingness mechanisms (missing completely at random, missing at random, and missing not at random).
Multiple imputation by chained equations and random forest methods had low bias and similar standard errors for parameter estimates under missingness completely at random. Under missingness at random, denoising autoencoders had higher bias than multiple imputation by chained equations and random forests. Contrary to results of prior studies of denoising autoencoders, all methods exhibited substantial bias under missingness not at random, with bias increasing in direct proportion to the amount of missing data.
We found no advantage of denoising autoencoders for multiple imputation in the setting of an epidemiologic study conducted using EHR data. Results suggested that denoising autoencoders may overfit the data leading to poor confounder control. Use of more flexible imputation approaches does not mitigate bias induced by missingness not at random and can produce estimates with spurious precision.
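The three missingness mechanisms compared above can be illustrated with a small simulation. This is a toy sketch, not the plasmode design used in the study: the data-generating model, the logistic missingness probabilities, and all names are assumptions chosen only to show how the mechanisms differ.

```python
import math
import random
import statistics


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_cohort(n=20000, seed=1):
    """Toy data: x is always observed; y depends on x and may be missing."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, 1.0)
        rows.append((x, y))
    return rows


def complete_case_y(rows, mechanism, seed=2):
    """Return the y values that remain observed under the given mechanism."""
    rng = random.Random(seed)
    kept = []
    for x, y in rows:
        if mechanism == "MCAR":
            p_missing = 0.3              # unrelated to any variable
        elif mechanism == "MAR":
            p_missing = expit(x - 1.0)   # depends only on the observed x
        elif mechanism == "MNAR":
            p_missing = expit(y - 1.0)   # depends on the unobserved y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        if rng.random() >= p_missing:
            kept.append(y)
    return kept


rows = simulate_cohort()
full_mean = statistics.mean(y for _, y in rows)
cc_means = {
    mech: statistics.mean(complete_case_y(rows, mech))
    for mech in ("MCAR", "MAR", "MNAR")
}
```

In this toy setup the complete-case mean of `y` stays close to the full-data mean under MCAR and is pulled downward under MAR and MNAR. Under MAR the shift can in principle be recovered by conditioning on `x`; under MNAR it cannot be recovered from the observed data alone, which is consistent with the finding that more flexible imputation models do not remove this bias.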
Data evaluating the impact of objectively measured psoriasis severity on type 2 diabetes mellitus (T2DM) risk are lacking.
To determine the risk for T2DM in patients with psoriasis compared with that in adults without psoriasis, stratified by categories of directly assessed body surface area (BSA) affected by psoriasis.
A prospective, population-based, cohort study from the United Kingdom in which 8124 adults with psoriasis and 76,599 adults without psoriasis were followed prospectively for approximately 4 years.
There were 280 incident cases of diabetes in the psoriasis group (3.44%) and 1867 incident cases of diabetes in those without psoriasis (2.44%). After adjustment for age, sex, and body mass index, the hazard ratios for incident diabetes were 1.21 (95% confidence interval [CI], 1.01-1.44), 1.01 (95% CI, 0.81-1.26), and 1.64 (95% CI, 1.23-2.18) in the groups with 2% or less, 3% to 10%, and 10% or more of their BSA affected, respectively, compared with the group without psoriasis (P = .004 for trend). Worldwide, we estimate an additional 125,650 new diagnoses of T2DM per year in patients with psoriasis compared with those without psoriasis.
Relatively short-term follow-up and exclusion of prevalent cases, which may have masked associations in patients with less extensive psoriasis.
Clinicians may measure BSA affected by psoriasis to target diabetes prevention efforts for patients with psoriasis.
Accurate outcome and exposure ascertainment in electronic health record (EHR) data, referred to as EHR phenotyping, relies on the completeness and accuracy of EHR data for each individual. However, some individuals, such as those with a greater comorbidity burden, visit the health care system more frequently and thus have more complete data, compared with others. Ignoring such dependence of exposure and outcome misclassification on visit frequency can bias estimates of associations in EHR analysis. We developed a framework for describing the structure of outcome and exposure misclassification due to informative visit processes in EHR data and assessed the utility of a quantitative bias analysis approach to adjusting for bias induced by informative visit patterns. Using simulations, we found that this method produced unbiased estimates across all informative visit structures, if the phenotype sensitivity and specificity were correctly specified. We applied this method in an example where the association between diabetes and progression-free survival in metastatic breast cancer patients may be subject to informative presence bias. The quantitative bias analysis approach allowed us to evaluate robustness of results to informative presence bias and indicated that findings were unlikely to change across a range of plausible values for phenotype sensitivity and specificity. Researchers using EHR data should carefully consider the informative visit structure reflected in their data and use appropriate approaches such as the quantitative bias analysis approach described here to evaluate robustness of study findings.
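To make the role of phenotype sensitivity and specificity concrete, the sketch below applies the standard textbook back-calculation for a misclassified binary variable, separately within strata of visit frequency. It is an illustration of the general idea, not the framework developed in the paper. The function name, the strata, and every number are hypothetical.

```python
def corrected_positive_count(observed_pos, n, sensitivity, specificity):
    """Back-calculate the expected true number of positives among n people.

    Uses observed_pos = Se * true_pos + (1 - Sp) * (n - true_pos), solved
    for true_pos. Requires Se + Sp > 1; the result is clipped to [0, n].
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    corrected = (observed_pos - (1.0 - specificity) * n) / denom
    return min(max(corrected, 0.0), float(n))


# Hypothetical strata: frequent visitors have better phenotype sensitivity.
strata = {
    "frequent visitors": {"observed_pos": 196, "n": 1000, "se": 0.90, "sp": 0.98},
    "infrequent visitors": {"observed_pos": 136, "n": 1000, "se": 0.60, "sp": 0.98},
}
corrected = {
    name: corrected_positive_count(s["observed_pos"], s["n"], s["se"], s["sp"])
    for name, s in strata.items()
}
```

In this constructed example both strata have the same underlying prevalence (200 per 1000), yet the observed counts differ because sensitivity depends on visit frequency. Repeating the calculation over a grid of plausible sensitivity and specificity values gives a simple robustness check in the spirit of the one described above.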
An individualized treatment rule (ITR) is a function that inputs patient-level information and outputs a recommended treatment. An important focus of precision medicine is to develop optimal ITRs that maximize a population-level distributional summary. However, guidance for estimating and evaluating optimal ITRs in the presence of missing data is limited. Our work is motivated by the Social Incentives to Encourage Physical Activity and Understand Predictors (STEP UP) study. Participants were randomized to a control or one of three interventions designed to increase physical activity and were given wearable devices to record daily steps as a measure of physical activity. Many participants were missing at least one daily step count during the study period. In the primary analysis of the STEP UP trial, multiple imputation (MI) was used to address missingness in daily step counts. Despite ubiquitous use of MI in practice, it has been given relatively little attention in the context of personalized medicine. We fill this gap by describing two frameworks for estimation and evaluation of an optimal ITR following MI and assessing their performance using simulated data. One framework relies on splitting the data into independent training and testing sets for estimation and evaluation, respectively. The other framework estimates an optimal ITR using the full data and constructs an $m$-out-of-$n$ bootstrap confidence interval to evaluate its performance. Finally, we provide an illustrative analysis to estimate and evaluate an optimal ITR from the STEP UP data with a focus on practical considerations such as choosing the number of imputations.
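As a rough illustration of the m-out-of-n bootstrap idea, the sketch below builds a basic interval by resampling m observations (with m no larger than n) and rescaling the bootstrap deviations. It assumes a square-root-n convergence rate and uses a sample mean as a stand-in statistic. It does not estimate an ITR or handle multiple imputation, and the choice of m, the function name, and the simulated data are all assumptions.

```python
import math
import random
import statistics


def m_out_of_n_ci(data, stat, m, n_boot=2000, alpha=0.05, seed=0):
    """Basic m-out-of-n bootstrap interval, assuming a sqrt(n) rate.

    The quantiles of sqrt(m) * (stat(resample of size m) - stat(data))
    approximate those of sqrt(n) * (stat(data) - truth).
    """
    n = len(data)
    if not 0 < m <= n:
        raise ValueError("m must satisfy 0 < m <= n")
    rng = random.Random(seed)
    theta_hat = stat(data)
    roots = sorted(
        math.sqrt(m) * (stat(rng.choices(data, k=m)) - theta_hat)
        for _ in range(n_boot)
    )
    lo_q = roots[int((alpha / 2) * (n_boot - 1))]
    hi_q = roots[int((1 - alpha / 2) * (n_boot - 1))]
    return theta_hat - hi_q / math.sqrt(n), theta_hat - lo_q / math.sqrt(n)


# Simulated outcomes standing in for the estimated value of a treatment rule.
data_rng = random.Random(42)
outcomes = [data_rng.gauss(5.0, 2.0) for _ in range(400)]
lower, upper = m_out_of_n_ci(outcomes, statistics.mean, m=200)
```

For a smooth statistic such as the mean, the ordinary bootstrap (m equal to n) would serve equally well. Resampling fewer than n observations is typically motivated by nonsmooth targets, for which the ordinary bootstrap can fail.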
In the causal analysis of observational data, the positivity assumption requires that all treatments of interest be observed in every patient subgroup. Violations of this assumption are indicated by nonoverlap in the data in the sense that patients with certain covariate combinations are not observed to receive a treatment of interest, which may arise from contraindications to treatment or small sample size. In this paper, we emphasize the importance and implications of this often‐overlooked assumption. Further, we elaborate on the challenges nonoverlap poses to estimation and inference and discuss previously proposed methods. We distinguish between structural and practical violations and provide insight into which methods are appropriate for each. To demonstrate alternative approaches and relevant considerations (including how overlap is defined and the target population to which results may be generalized) when addressing positivity violations, we employ an electronic health record‐derived data set to assess the effects of metformin on colon cancer recurrence among diabetic patients.
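A first empirical check for nonoverlap with discrete covariates is to tabulate which treatments are observed within each covariate stratum. The sketch below does only that. The strata, treatment labels, and records are hypothetical, and a tabulation like this cannot by itself distinguish a structural violation (for example, a contraindication) from a practical one caused by small sample size.

```python
from collections import defaultdict


def positivity_violations(records, treatments):
    """Return {stratum: [treatments never observed in that stratum]}.

    records: iterable of (stratum, treatment) pairs, where stratum is any
    hashable description of a covariate combination.
    """
    seen = defaultdict(set)
    for stratum, treatment in records:
        seen[stratum].add(treatment)
    required = set(treatments)
    return {
        stratum: sorted(required - observed)
        for stratum, observed in seen.items()
        if required - observed
    }


# Hypothetical records: (covariate stratum, treatment received).
records = [
    (("age<65", "eGFR>=30"), "metformin"),
    (("age<65", "eGFR>=30"), "no metformin"),
    (("age>=65", "eGFR>=30"), "metformin"),
    (("age>=65", "eGFR>=30"), "no metformin"),
    (("age>=65", "eGFR<30"), "no metformin"),
    (("age>=65", "eGFR<30"), "no metformin"),
]
violations = positivity_violations(records, ["metformin", "no metformin"])
```

Strata flagged this way then call for a decision about how overlap is defined and which target population the results can be generalized to, the considerations the abstract highlights.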