• We created a cohort study of cardiovascular disease from electronic health records.
• Plots of the “baseline” timeframe suggested the potential for bias.
• Researchers must address biases introduced by healthcare processes.
Despite growing interest in using electronic health records (EHR) to create longitudinal cohort studies, the distribution and missingness of EHR data might introduce selection bias and information bias into such analyses. We aimed to examine the yield and potential for these healthcare process biases when defining a study baseline from EHR data, using cholesterol and blood pressure (BP) measurements as an example.
We created a virtual cohort study of cardiovascular disease (CVD) from patients with eligible cholesterol profiles in the New England (NE) and Southeast (SE) networks of the Veterans Health Administration in the United States. Using clinical data from the EHR, we plotted the yield of patients with BP measurements within an expanding timeframe around an index date of cholesterol testing. We compared three groups: (1) patients with BP from the exact index date; (2) patients with BP not on the index date but within the network-specific 90th percentile around the index date; and (3) patients with no BP within the network-specific 90th percentile.
Among 589,361 total patients in the two networks, 146,636 (61.0%) of 240,479 patients from NE and 289,906 (83.1%) of 348,882 patients from SE had BP measurements on the index date. Ninety percent had BP measured within 11 days of the index date in NE and within 5 days of the index date in SE. Group 3 in both networks had fewer available race data, fewer comorbidities and CVD medications, and fewer health system encounters.
Requiring same-day risk factor measurement in the creation of a virtual CVD cohort study from EHR data might exclude 40% of eligible patients, but including patients with infrequent visits might introduce bias. Data visualization can inform study-specific strategies to address these challenges for the research use of EHR data.
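The yield curve described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the input layout (one absolute day offset per patient, `None` when no BP exists), and the reading of "90th percentile" as the smallest window that captures 90% of all patients are assumptions, and the toy data are hypothetical.

```python
from math import ceil

def yield_by_window(day_offsets, max_days):
    """Fraction of patients with a BP measurement within +/- d days of the
    index date, for d = 0..max_days.

    day_offsets: one entry per patient, the absolute number of days between
    the index date and the nearest BP measurement, or None if no BP exists.
    """
    n = len(day_offsets)
    return [
        sum(1 for x in day_offsets if x is not None and x <= d) / n
        for d in range(max_days + 1)
    ]

def window_for_coverage(day_offsets, coverage=0.90):
    """Smallest window in days that captures `coverage` of all patients.

    Returns None when too few patients have any BP measurement.
    (Assumed definition; the abstract does not spell out the percentile rule.)
    """
    n = len(day_offsets)
    observed = sorted(x for x in day_offsets if x is not None)
    needed = ceil(coverage * n)
    if needed > len(observed):
        return None
    return observed[needed - 1]

# Hypothetical toy data: 10 patients, 6 with same-day BP, 1 with no BP at all.
offsets = [0, 0, 0, 0, 0, 0, 2, 5, 11, None]
curve = yield_by_window(offsets, 11)
```

Plotting `curve` against the window width gives the kind of yield plot the abstract describes; patients beyond the chosen window correspond to group 3.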
Abstract only Introduction: Increased levels of cardiac troponin at the time of myocardial infarction (MI) have been shown to predict mortality. However, troponin is renally cleared and kidney function itself impacts mortality. We tested the hypothesis that baseline kidney function modifies the relationship between peak cardiac troponin I ratio (cTnI-R) and 30-day mortality after MI. Methods: Data from the Veterans Health Administration were used to create a national sample of hospitalized Veterans with a discharge diagnosis of MI between 2002 and 2015. The peak cTnI-R was calculated as the highest cTnI during the hospitalization relative to the upper limit of normal for each assay, and the timing of the peak was used as a proxy for the date of the MI event. Veterans with a history of cancer or blood vessel surgery 5 days before peak cTnI were excluded. The closest estimated glomerular filtration rate (eGFR) measured within 2 years prior to hospital admission was used as a marker of baseline kidney function. We created quartiles of peak cTnI-R and clinically relevant levels of eGFR (<30, 30-44, 45-59, and 60+ mL/min/1.73 m²) and fitted Cox regression models adjusting for calendar year, age, length of hospital stay, region, diabetes, major mental health conditions, and baseline use of diuretics, anti-hypertensives, and anti-lipemics. We used subjects in the first quartile of troponin with eGFR of 60+ as the common reference. Results: Among 56,073 Veterans hospitalized for MI, mean age was 67 and 98% were men. During a mean follow-up of 28 days, 4,533 deaths occurred. Thirty-day mortality steadily increased across quartiles of peak cTnI-R; however, the increase in mortality was greater among those with eGFR below 30, suggesting effect modification of the troponin-mortality relation by eGFR (p for interaction between eGFR and troponin 0.03) (Figure). Conclusions: Our data show that the positive relation of troponin with 30-day mortality post MI is modified by kidney function. Veterans with impaired kidney function carry a higher risk of 30-day mortality after MI compared to those with normal eGFR for a given troponin quartile.
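The two exposure definitions in the Methods can be illustrated as follows. This is a sketch under stated assumptions: the function names and input layout are hypothetical, the ratio is taken per measurement against that measurement's own assay limit, and the handling of non-integer eGFR values at the category boundaries is a guess.

```python
def peak_ctni_ratio(measurements):
    """Highest cTnI value relative to the assay's upper limit of normal (ULN).

    measurements: list of (ctni_value, assay_uln) pairs from one hospitalization.
    Each value is divided by its own assay's ULN (assumed interpretation).
    """
    return max(value / uln for value, uln in measurements)

def egfr_category(egfr):
    """Map an eGFR value (mL/min/1.73 m^2) to the abstract's four levels."""
    if egfr < 30:
        return "<30"
    if egfr < 45:
        return "30-44"
    if egfr < 60:
        return "45-59"
    return "60+"

# Hypothetical values, not taken from the study.
ratio = peak_ctni_ratio([(2.0, 0.5), (6.0, 0.5), (3.0, 1.0)])
```

Here `ratio` is the largest of 4.0, 12.0, and 3.0, so a patient's quartile would be assigned from 12.0.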
Estimated 10-year atherosclerotic cardiovascular disease (ASCVD) risk in patients with diabetes mellitus is used to guide primary prevention, but the performance of risk estimators (2013 Pooled Cohort Equations [PCE] and Risk Equations for Complications of Diabetes [RECODe]) varies across populations. Data from electronic health records could be used to improve risk estimation for a health system's patients. We aimed to evaluate risk equations for initial ASCVD events in US veterans with diabetes mellitus and improve model performance in this population.
We studied 183 096 adults with diabetes mellitus and without prior ASCVD who received care in the Veterans Affairs Healthcare System (VA) from 2002 to 2016 with mean follow-up of 4.6 years. We evaluated model discrimination, using Harrell's C statistic, and calibration, using the reclassification χ² test, of the PCE and RECODe equations to predict fatal or nonfatal myocardial infarction or stroke and cardiovascular mortality. We then tested whether model performance was affected by deriving VA-specific β-coefficients. Discrimination of ASCVD events by the PCE was improved by deriving VA-specific β-coefficients (C statistic increased from 0.560 to 0.597) and improved further by including measures of glycemia, renal function, and diabetes mellitus treatment (C statistic, 0.632). Discrimination by the RECODe equations was improved by substituting VA-specific coefficients (C statistic increased from 0.604 to 0.621). Absolute risk estimation by PCE and RECODe equations also improved with VA-specific coefficients; the calibration P value increased from <0.001 to 0.08 for PCE and from <0.001 to 0.005 for RECODe, where higher P indicates better calibration. Approximately two-thirds of veterans would meet a guideline indication for high-intensity statin therapy based on the PCE versus only 10% to 15% using VA-fitted models.
Existing ASCVD risk equations overestimate risk in veterans with diabetes mellitus, potentially impacting guideline-indicated statin therapy. Prediction model performance can be improved for a health system's patients using readily available electronic health record data.
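For readers unfamiliar with the discrimination measure used above, Harrell's C statistic can be computed directly from its pairwise definition. The sketch below is a plain quadratic-time illustration, not the authors' implementation; it ignores pairs with tied event times, which is a simplifying assumption, and the toy data are hypothetical.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance statistic for right-censored data.

    A pair (i, j) is comparable when subject i had an event and a strictly
    shorter follow-up time than subject j. The pair is concordant when the
    subject with the earlier event also has the higher predicted risk.
    Tied risk scores count as half-concordant; tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier event in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical toy data: follow-up years, event indicator, predicted risk.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
scores = [0.9, 0.4, 0.5, 0.1]
c_stat = harrell_c(times, events, scores)
```

In this toy example 4 of the 5 comparable pairs are concordant. A value of 0.5 corresponds to chance-level ranking, which puts the reported values of 0.56 to 0.63 in context.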
Abstract only Background: Individuals with an interleukin 6 receptor (IL6R) genetic variant not on IL6R blocking therapy have biomarker profiles similar to those treated with IL6R blockers. Thus, studying whether the IL6R variant is protective for a phenotype can inform which diseases may benefit from treatment with IL6R blockade. To test this hypothesis, we performed a Phenome-Wide Association Study (PheWAS) to screen for associations between an IL6R variant and a broad range of phenotypes in the electronic health records (EHR). Methods: We studied veteran participants in the Veterans Affairs Million Veteran Program using genomic data linked to EHR. We extracted all diagnosis codes and mapped them to phenotype groups using published PheWAS methods. Routine laboratory measurements, e.g. liver function tests, were also extracted. A PheWAS was performed by constructing logistic regression models testing associations between the IL6R variant (Asp358Ala, rs2228145) and 1,866 phenotype groups; linear regression models were constructed to screen for associations between IL6R and 26 routine laboratory measurements. All models were adjusted for age, gender, and race. Significance was reported using false discovery rate ≤0.05 and Bonferroni correction. Results: We studied 342,529 participants; the minor allele frequency of the IL6R variant was 35.3%. IL6R was most strongly associated with a reduced risk of aortic aneurysm (OR 0.91-0.92, 95% CI 0.89, 0.94) (Figure 1). We observed the expected association between IL6R and reduced C-reactive protein. We also observed known side effects of IL6R blockade: elevated transaminases and elevated triglycerides, the latter an initially unexpected result in the early clinical trials. Conclusion: In this proof-of-concept study, we demonstrate the utility of PheWAS to inform drug effects using the largest US-based biobank study. The strong association with aortic aneurysm corresponded with the newest indication for IL6R blockade to prevent aortic aneurysms due to large vessel vasculitis.
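The two multiple-testing criteria named in the Methods can be sketched as below. The abstract does not say which false discovery rate procedure was used; Benjamini-Hochberg is assumed here because it is the most common choice, and the p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected at false discovery rate q.

    Step-up rule: find the largest rank k such that p_(k) <= q * k / m,
    then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

# 1,866 phenotype groups were tested in the abstract.
threshold = bonferroni_threshold(0.05, 1866)

# Hypothetical p-values for five phenotypes.
rejected = benjamini_hochberg([0.001, 0.02, 0.045, 0.30, 0.012], q=0.05)
```

With 1,866 tests the Bonferroni threshold is roughly 2.7 × 10⁻⁵, which is why the less conservative false discovery rate criterion is often reported alongside it in phenome-wide screens.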
Current guidelines recommend statin therapy for millions of US residents for the primary prevention of atherosclerotic cardiovascular disease (ASCVD). It is unclear whether traditional prediction models that do not account for current widespread statin use are sufficient for risk assessment.
To examine the performance of the Pooled Cohort Equations (PCE) for 5-year ASCVD risk estimation in a contemporary cohort and to test the hypothesis that inclusion of statin therapy improves model performance.
This cohort study included adult patients in the Veterans Affairs health care system without baseline ASCVD. Using national electronic health record data, 3 Cox proportional hazards models were developed to estimate 5-year ASCVD risk, as follows: the variables and published β coefficients from the PCE (model 1), the PCE variables with cohort-derived β coefficients (model 2), and model 2 plus baseline statin use (model 3). Data were collected from January 2002 to December 2012 and analyzed from June 2016 to March 2020.
Traditional ASCVD risk factors from the PCE plus baseline statin use.
Incident ASCVD and ASCVD mortality.
Of 1 672 336 patients in the cohort (mean [SD] baseline age, 58.0 [13.8] years; 1 575 163 [94.2%] men; 1 383 993 [82.8%] white), 312 155 (18.7%) were receiving statin therapy at baseline. During 5 years of follow-up, 66 605 (4.0%) experienced an ASCVD event, and 31 878 (1.9%) experienced ASCVD death. Compared with the original PCE, the cohort-derived model did not improve model discrimination in any of the 4 age-sex strata but did improve model calibration. The PCE overestimated ASCVD risk compared with the cohort-derived model; 211 237 of 1 136 161 white men (18.6%), 29 634 of 218 463 black men (13.6%), 1741 of 44 399 white women (3.9%), and 836 of 16 034 black women (5.2%) would be potentially eligible for statin therapy under the PCE but not the cohort-derived model. When added to the cohort-derived model, baseline statin therapy was associated with a 7% (95% CI, 5%-9%) lower relative risk of ASCVD and a 25% (95% CI, 23%-28%) lower relative risk for ASCVD death.
In this study, lower than expected rates of incident ASCVD events in a contemporary national cohort were observed. The PCE overestimated ASCVD risk, and more than 15% of patients would be potentially eligible for statin therapy based on the PCE but not on a cohort-derived model. In the statin era, health care professionals and systems should base ASCVD risk assessment on models calibrated to their patient populations.
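The "more than 15%" figure can be checked against the four strata reported in the Results. The counts below are copied from the abstract; pooling them into a single proportion is our own arithmetic, and the variable names are illustrative. Note that the four strata sum to fewer patients than the full cohort, so the pooled figure applies only to these groups.

```python
# (eligible under PCE but not the cohort-derived model, stratum size)
strata = {
    "white men": (211_237, 1_136_161),
    "black men": (29_634, 218_463),
    "white women": (1_741, 44_399),
    "black women": (836, 16_034),
}

# Per-stratum percentages, rounded to one decimal as in the abstract.
per_stratum = {
    name: round(100 * reclassified / total, 1)
    for name, (reclassified, total) in strata.items()
}

# Pooled percentage across the four strata.
total_reclassified = sum(r for r, _ in strata.values())
total_patients = sum(t for _, t in strata.values())
pooled_percent = round(100 * total_reclassified / total_patients, 1)
```

The per-stratum values reproduce the reported 18.6%, 13.6%, 3.9%, and 5.2%, and the pooled value is about 17.2%, consistent with "more than 15%".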
Large databases provide an efficient way to analyze patient data. A challenge with these databases is the inconsistency of ICD codes and a potential for inaccurate ascertainment of cases. The purpose of this study was to develop and validate a reliable protocol to identify cases of acute ischemic stroke (AIS) from a large national database.
Using the national Veterans Affairs electronic health-record system, Center for Medicare and Medicaid Services, and National Death Index data, we developed an algorithm to identify cases of AIS. Using a combination of inpatient and outpatient ICD9 codes, we selected cases of AIS and controls from 1992 to 2014. Diagnoses determined after medical-chart review were considered the gold standard. We used a machine-learning algorithm and a neural network approach to identify AIS from ICD9 codes and electronic health-record information and compared it with a previous rule-based stroke-classification algorithm.
We reviewed administrative hospital data, ICD9 codes, and medical records of 268 patients in detail. Compared with the gold standard, this AIS algorithm had a sensitivity of 91%, specificity of 95%, and positive predictive value of 88%. A total of 80,508 highly likely cases of AIS were identified using the algorithm in the Veterans Affairs national cardiovascular disease-risk cohort (n=2,114,458).
Our algorithm had high specificity for identifying AIS in a nationwide electronic health-record system. This approach may be utilized in other electronic health databases to accurately identify patients with AIS.
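The validation metrics reported above all derive from a 2×2 table of algorithm output against chart review. The abstract does not give the underlying cell counts, so the counts in this sketch are hypothetical and chosen only to show the formulas.

```python
def validation_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and positive predictive value (PPV).

    tp/fp/fn/tn: counts of algorithm results versus the chart-review
    gold standard (true/false positives and negatives).
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that were flagged
        "specificity": tn / (tn + fp),  # share of non-cases correctly not flagged
        "ppv": tp / (tp + fp),          # share of flagged patients who are true cases
    }

# Hypothetical counts, not the study's data.
metrics = validation_metrics(tp=90, fp=20, fn=10, tn=180)
```

Unlike sensitivity and specificity, PPV depends on how common the condition is in the reviewed sample, so it may not transfer to a cohort with a different case mix.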
The Department of Veterans Affairs (VA) archives one of the largest corpora of clinical notes in its corporate data warehouse as unstructured text data. Unstructured text easily supports keyword searches and regular expressions. Often these simple searches do not adequately support the complex searches that need to be performed on notes. For example, a researcher may want all notes with a Duke Treadmill Score of less than five, or all notes for people who smoke more than one pack per day. Range queries like these can be supported by modelling text as semi-structured documents. In this paper, we implement a scalable machine learning pipeline that models plain medical text as useful semi-structured documents. We improve on existing models, achieve an F1-score of 0.912, and scale our methods to the entire VA corpus.
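To make the idea of a range query over notes concrete, the sketch below extracts a numeric field into a structured value and then filters on it. It deliberately swaps in a hand-written regular expression for the paper's machine learning pipeline, so it shows only the target behavior, not the authors' method. The pattern, function names, and example notes are all hypothetical.

```python
import re

# Hand-written stand-in for a learned extractor (assumption, not the paper's model).
SCORE_PATTERN = re.compile(
    r"duke treadmill score\s*(?:is|of|:|=)?\s*(-?\d+(?:\.\d+)?)",
    re.IGNORECASE,
)

def extract_score(note):
    """Return the Duke Treadmill Score mentioned in a note, or None."""
    match = SCORE_PATTERN.search(note)
    return float(match.group(1)) if match else None

def notes_with_score_below(notes, upper):
    """IDs of notes whose extracted score is strictly less than `upper`."""
    hits = []
    for note_id, text in notes.items():
        score = extract_score(text)
        if score is not None and score < upper:
            hits.append(note_id)
    return sorted(hits)

# Hypothetical notes.
notes = {
    "n1": "Duke Treadmill Score: 4, low exercise tolerance.",
    "n2": "duke treadmill score of -11",
    "n3": "Duke Treadmill Score = 7.5",
    "n4": "No stress test performed.",
}
low_scores = notes_with_score_below(notes, 5)
```

A keyword search alone would return n1, n2, and n3; only after the value is extracted as a number can the "less than five" condition be applied.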
Electronic health records (EHRs) provide a wealth of data for phenotype development in population health studies, and researchers invest considerable time to curate data elements and validate disease definitions. The ability to reproduce well-defined phenotypes increases data quality and comparability of results and expedites research. In this paper, we present a standardized approach to organize and capture phenotype definitions, resulting in the creation of an open, online repository of phenotypes. This resource captures phenotype development, provenance and process from the Million Veteran Program, a national mega-biobank embedded in the Veterans Health Administration (VHA). To ensure that the repository is searchable, extendable, and sustainable, it is necessary to develop both a proper digital catalog architecture and underlying metadata infrastructure to enable effective management of the data fields required to define each phenotype. Our methods provide a resource for VHA investigators and a roadmap for researchers interested in standardizing their phenotype definitions to increase portability.
Familial hypercholesterolemia (FH) is characterized by inherited high levels of low-density lipoprotein cholesterol (LDL-C) and premature coronary heart disease (CHD). Over a thousand low-frequency variants in [gene name missing] and [gene name missing] have been implicated in FH but few have been examined at the population level. We aim to estimate the phenotypic effects of a subset of FH variants on LDL-C and clinical outcomes among 331,107 multi-ethnic participants.
We examined the individual and collective association between putatively pathogenic FH variants included on the MVP biobank array and the maximum LDL-C level over an interval of 15 years (maxLDL). We assessed the collective effect on clinical outcomes by leveraging data from 61.7 million clinical encounters.
We found 8 out of 16 putatively pathogenic FH variants with ≥30 observed carriers to be significantly associated with elevated maxLDL (9.4-80.2 mg/dL). Phenotypic effects were similar for European and African Americans despite substantial differences in carrier frequencies. Based on observed effects on maxLDL, we identified a total of 748 carriers (1:443) who had elevated maxLDL (36.5±1.4 mg/dL, p=1.2×10^[exponent missing]), and higher prevalence of clinical diagnoses related to hypercholesterolemia and CHD in a phenome-wide scan. Adjusted for maxLDL, FH variants collectively associated with higher prevalence of CHD (odds ratio, 1.59 [95% CI, 1.36-1.86], p=1.1×10^[exponent missing]) but not peripheral artery disease.
The distribution and phenotypic effects of putatively pathogenic FH variants were heterogeneous within and across variants. More robust evidence of genotype-phenotype associations of FH variants in multi-ethnic populations is needed to accurately infer at-risk individuals from genetic screening.
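The reported carrier frequency of 1:443 follows from the two counts given in the abstract. The short check below assumes the denominator is the full set of 331,107 participants, which the abstract implies but does not state.

```python
participants = 331_107  # multi-ethnic participants (from the abstract)
carriers = 748          # carriers identified (from the abstract)

# Number of participants per carrier, rounded to the nearest whole person.
one_in_n = round(participants / carriers)
```

The result matches the reported 1:443, which supports reading the frequency as carriers over all participants.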