Given ongoing challenges in the non-invasive diagnosis of non-alcoholic fatty liver disease (NAFLD), we sought to validate an ALT-based NAFLD phenotype using measures readily available in electronic health records (EHRs) and population-based studies by leveraging the clinical and genetic data in the Million Veteran Program (MVP), a multi-ethnic mega-biobank of US Veterans.
MVP participants with alanine aminotransferase (ALT) >40 units/L for men and >30 units/L for women without other causes of liver disease were compared to controls with normal ALT. Genetic variants spanning eight NAFLD risk or ALT-associated loci (LYPLAL1, GCKR, HSD17B13, TRIB1, PPP1R3B, ERLIN1, TM6SF2, PNPLA3) were tested for NAFLD associations, with sensitivity analyses adjusting for metabolic risk factors and alcohol consumption. A manual EHR review assessed performance characteristics of the NAFLD phenotype with imaging and biopsy data as gold standards. Genetic associations with advanced fibrosis were explored using FIB4, NAFLD Fibrosis Score and platelet counts.
Among 322,259 MVP participants, 19% met non-invasive criteria for NAFLD. Trans-ethnic meta-analysis replicated associations with previously reported genetic variants in all but the LYPLAL1 and GCKR loci (P < 6 × 10⁻³), without attenuation when adjusted for metabolic risk factors and alcohol consumption. At the previously reported LYPLAL1 locus, the established genetic variant did not appear to be associated with NAFLD; however, the regional association plot showed a significant association with NAFLD 279 kb downstream. In the EHR validation, the ALT-based NAFLD phenotype yielded a positive predictive value of 0.89 and 0.84 for liver biopsy and abdominal imaging, respectively (inter-rater reliability: Cohen's kappa = 0.98). HSD17B13 and PNPLA3 loci were associated with advanced fibrosis.
We validate a simple, non-invasive ALT-based NAFLD phenotype using EHR data by leveraging previously established NAFLD risk-associated genetic polymorphisms.
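The ALT-based case definition and the positive predictive value reported above can be illustrated with a minimal sketch. It encodes only the sex-specific thresholds stated in the abstract; the function names, the `other_liver_disease` flag, and the example counts are hypothetical, and the study's full phenotype may include criteria the abstract does not describe.

```python
# Sex-specific ALT cutoffs in units/L, as stated in the abstract.
ALT_CUTOFF = {"M": 40.0, "F": 30.0}


def meets_alt_criteria(sex: str, alt_units_per_l: float, other_liver_disease: bool) -> bool:
    """Return True when ALT exceeds the sex-specific cutoff and no other
    cause of liver disease is recorded (hypothetical boolean flag)."""
    if other_liver_disease:
        return False
    return alt_units_per_l > ALT_CUTOFF[sex]


def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = TP / (TP + FP): the share of flagged patients confirmed by the gold standard."""
    return true_positives / (true_positives + false_positives)


# Illustrative counts only (not study data): 89 of 100 flagged charts confirmed.
example_ppv = positive_predictive_value(89, 11)
```

The thresholds are strict inequalities, so a man with ALT exactly 40 units/L would not meet the criteria in this sketch.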
Abstract
Large observational data networks that leverage routine clinical practice data in electronic health records (EHRs) are critical resources for research on coronavirus disease 2019 (COVID-19). Data normalization is a key challenge for the secondary use of EHRs for COVID-19 research across institutions. In this study, we addressed the challenge of automating the normalization of COVID-19 diagnostic tests, which are critical data elements, but for which controlled terminology terms were published after clinical implementation. We developed a simple but effective rule-based tool called COVID-19 TestNorm to automatically normalize local COVID-19 testing names to standard LOINC (Logical Observation Identifiers Names and Codes) codes. COVID-19 TestNorm was developed and evaluated using 568 test names collected from 8 healthcare systems. Our results show that it could achieve an accuracy of 97.4% on an independent test set. COVID-19 TestNorm is available as an open-source package for developers and as an online Web application for end users (https://clamp.uth.edu/covid/loinc.php). We believe that it will be a useful tool to support secondary use of EHRs for research on COVID-19.
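To make the idea of rule-based test-name normalization concrete, here is a minimal sketch. It is not the COVID-19 TestNorm rule set: the keyword patterns, slot names, and function name are all hypothetical, and the final lookup from the normalized attributes to a LOINC code is deliberately left out rather than guessed.

```python
import re
from typing import Optional

# Recognizes common spellings of the virus or disease name in a local test name.
COVID_PATTERN = re.compile(r"(sars[\s-]*cov[\s-]*2|covid[\s-]*19|2019[\s-]*ncov)")

# Hypothetical keyword rules: (slot, normalized value, pattern). First match per slot wins.
RULES = [
    ("method", "molecular", re.compile(r"\b(pcr|naat?|rna|molecular)\b")),
    ("method", "antibody", re.compile(r"\b(igg|igm|antibod(y|ies)|serology)\b")),
    ("method", "antigen", re.compile(r"\b(antigen|ag)\b")),
    ("specimen", "nasopharyngeal", re.compile(r"\b(np|nasopharyn\w*)\b")),
    ("specimen", "serum", re.compile(r"\bserum\b")),
]


def normalize_test_name(local_name: str) -> Optional[dict]:
    """Map a free-text local test name to normalized attributes,
    or return None when the name does not look like a COVID-19 test."""
    text = local_name.lower()
    if not COVID_PATTERN.search(text):
        return None
    result = {"component": "SARS-CoV-2", "method": "unknown", "specimen": "unknown"}
    for slot, value, pattern in RULES:
        if result[slot] == "unknown" and pattern.search(text):
            result[slot] = value
    return result
```

In a design like this, the normalized attribute tuple would then be looked up in a curated table of standard codes, which keeps the text-matching rules separate from the terminology itself.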
Purpose:
Alpha-1 blockers, often used to treat benign prostatic hyperplasia (BPH), have been hypothesized to prevent COVID-19 complications by minimising cytokine storm release. However, the proposed treatment based on this hypothesis currently lacks support from reliable real-world evidence. We leverage an international network of large-scale healthcare databases to generate comprehensive evidence in a transparent and reproducible manner.
Methods:
In this international cohort study, we deployed electronic health records from Spain (SIDIAP) and the United States (Department of Veterans Affairs, Columbia University Irving Medical Center, IQVIA OpenClaims, Optum DOD, Optum EHR). We assessed the association between alpha-1 blocker use and risks of three COVID-19 outcomes (diagnosis, hospitalization, and hospitalization requiring intensive services) using a prevalent-user active-comparator design. We estimated hazard ratios using state-of-the-art techniques to minimize potential confounding, including large-scale propensity score matching/stratification and negative control calibration. We pooled database-specific estimates through random effects meta-analysis.
Results:
Overall, our study included 2.6 million users of alpha-1 blockers and 0.46 million users of alternative BPH medications. We observed no significant difference in their risks for any of the COVID-19 outcomes, with our meta-analytic HR estimates being 1.02 (95% CI: 0.92–1.13) for diagnosis, 1.00 (95% CI: 0.89–1.13) for hospitalization, and 1.15 (95% CI: 0.71–1.88) for hospitalization requiring intensive services.
Conclusion:
We found no evidence of the hypothesized reduction in risks of the COVID-19 outcomes from the prevalent use of alpha-1 blockers. Further research is needed to identify effective therapies for this novel disease.
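The pooling of database-specific hazard ratios through random effects meta-analysis can be sketched as follows. The abstract does not say which estimator was used; this sketch assumes the DerSimonian-Laird method, recovers each standard error from the width of the 95% CI on the log scale, and uses a hypothetical function name.

```python
import math

Z_95 = 1.959964  # normal quantile for a two-sided 95% interval


def pool_random_effects(estimates):
    """Pool (hr, ci_lower, ci_upper) tuples with a DerSimonian-Laird
    random-effects model. Returns the pooled (hr, ci_lower, ci_upper)."""
    log_hr = [math.log(hr) for hr, _, _ in estimates]
    # Standard error on the log scale, back-calculated from the CI width.
    se = [(math.log(hi) - math.log(lo)) / (2 * Z_95) for _, lo, hi in estimates]

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / s ** 2 for s in se]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_hr)) / sum_w

    # Cochran's Q and the method-of-moments between-study variance tau^2.
    k = len(estimates)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_hr))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (s ** 2 + tau2) for s in se]
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_hr)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return (
        math.exp(pooled),
        math.exp(pooled - Z_95 * se_pooled),
        math.exp(pooled + Z_95 * se_pooled),
    )
```

When the database estimates agree, tau² is zero and the result matches a fixed-effect pooled estimate; when they disagree, the interval widens to reflect between-database heterogeneity.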
Abstract
Background
The development and adoption of health care common data models (CDMs) has addressed some of the logistical challenges of performing research on data generated from disparate health care systems by standardizing data representations and leveraging standardized terminology to express clinical information consistently. However, transforming a data system into a CDM is not a trivial task, and maintaining an operational, enterprise-capable CDM that is incrementally updated within a data warehouse is challenging.
Objectives
To develop a quality assurance (QA) process and code base to accompany our incremental transformation of the Department of Veterans Affairs Corporate Data Warehouse (CDW) health care database into the Observational Medical Outcomes Partnership (OMOP) CDM to prevent incremental load errors.
Methods
We designed and implemented a multistage QA approach centered on completeness, value conformance, and relational conformance data-quality elements. For each element we describe key incremental load challenges, our extract, transform, and load (ETL) solution to overcome those challenges, and potential impacts of incremental load failure.
Results
Completeness and value conformance data-quality elements are most affected by incremental changes to the CDW, while updates to source identifiers impact relational conformance. ETL failures surrounding these elements lead to incomplete and inaccurate capture of clinical concepts as well as data fragmentation across patients, providers, and locations.
Conclusion
Development of robust QA processes supporting accurate transformation of source data into OMOP and other CDMs is still evolving, and opportunities exist to extend the existing QA framework and tools used for incremental ETL QA processes.
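The three data-quality elements named above (completeness, value conformance, relational conformance) can each be expressed as a small check run after an incremental load. The following is a minimal sketch, not the authors' code base: the function names, the row representation as dictionaries, and the field names in the usage note are hypothetical.

```python
def missing_from_target(source_ids, target_ids):
    """Completeness: source record identifiers that never reached the target table."""
    return set(source_ids) - set(target_ids)


def nonconforming_values(rows, field, allowed_values):
    """Value conformance: rows whose field holds a value outside the allowed set."""
    allowed = set(allowed_values)
    return [row for row in rows if row.get(field) not in allowed]


def orphaned_rows(child_rows, foreign_key, parent_ids):
    """Relational conformance: child rows whose foreign key matches no parent record."""
    parents = set(parent_ids)
    return [row for row in child_rows if row.get(foreign_key) not in parents]
```

For example, after loading a batch of visits, `orphaned_rows(visits, "person_id", known_person_ids)` would list visits that point at a patient identifier absent from the patient table, which is the kind of fragmentation the abstract attributes to updated source identifiers.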
Background Canadian Cardiovascular Society (CCS) angina severity classification is associated with mortality, myocardial infarction, and coronary revascularization in clinical trial and registry data. The objective of this study was to determine associations between CCS class and all-cause mortality and healthcare utilization, using natural language processing to extract CCS classifications from clinical notes.
Methods and Results In this retrospective cohort study of veterans in the United States with stable angina from January 1, 2006, to December 31, 2013, natural language processing extracted CCS classifications. Veterans with a prior diagnosis of coronary artery disease were excluded. Outcomes included all-cause mortality (primary), all-cause and cardiovascular-specific hospitalizations, coronary revascularization, and 1-year healthcare costs. Of 299 577 veterans identified, 14 216 (4.7%) had ≥1 CCS classification extracted by natural language processing. The mean age was 66.6±9.8 years, 99% of participants were male, and 81% were white. During a median follow-up of 3.4 years, all-cause mortality rates were 4.58, 4.60, 6.22, and 6.83 per 100 person-years for CCS classes I, II, III, and IV, respectively. Multivariable adjusted hazard ratios for all-cause mortality comparing CCS II, III, and IV with those in class I were 1.05 (95% CI, 0.95-1.15), 1.33 (95% CI, 1.20-1.47), and 1.48 (95% CI, 1.25-1.76), respectively. The multivariable hazard ratio comparing CCS IV with CCS I was 1.20 (95% CI, 1.09-1.33) for all-cause hospitalization, 1.25 (95% CI, 0.96-1.64) for acute coronary syndrome hospitalizations, 1.00 (95% CI, 0.80-1.26) for heart failure hospitalizations, 1.05 (95% CI, 0.88-1.25) for atrial fibrillation hospitalizations, 1.92 (95% CI, 1.40-2.64) for percutaneous coronary intervention, and 2.51 (95% CI, 1.99-3.16) for coronary artery bypass grafting surgery.
Conclusions Natural language processing-extracted CCS classification was positively associated with all-cause mortality and healthcare utilization, demonstrating the prognostic importance of anginal symptom assessment and documentation.
Prostate cancer (PCa) disproportionately affects African American men, but research evaluating the extent of racial and ethnic disparities across the PCa continuum in equal-access settings remains limited at the national level. The US Department of Veterans Affairs (VA) Veterans Health Administration health care system offers a setting of relatively equal access to care in which to assess racial and ethnic disparities in self-identified African American (or Black) veterans and White veterans.
To determine the extent of racial and ethnic disparities in the incidence of PCa, clinical stage, and outcomes between African American patients and White patients who received a diagnosis or were treated at a VA hospital.
This retrospective cohort study included 7 889 984 veterans undergoing routine care in VA hospitals nationwide from 2005 through 2019 (incidence cohort). The age-adjusted incidence of localized and de novo metastatic PCa was estimated. Treatment response was evaluated, and PCa-specific outcomes were compared between African American veterans and White veterans. Residual disparity in PCa outcome, defined as the leftover racial and ethnic disparity in the outcomes despite equal response to treatment, was estimated.
Self-identified African American (or Black) and White race and ethnicity.
Time to distant metastasis following PCa diagnosis was the primary outcome. Descriptive analyses were used to compare baseline demographics and clinic characteristics. Multivariable logistic regression was used to evaluate race and ethnicity association with pretreatment clinical variables. Multivariable Cox regression was used to estimate the risk of metastasis.
Data from 7 889 984 veterans from the incidence cohort were used to estimate incidence, whereas data from 92 269 veterans with localized PCa were used to assess treatment response. Among 92 269 veterans, African American men (n = 28 802; 31%) were younger (median [IQR], 63 [58-68] vs 65 [62-71] years) and had higher prostate-specific antigen levels (>20 ng/mL) at the time of diagnosis compared with White men (n = 63 467; 69%). Consistent with US population-level data, African American veterans displayed a nearly 2-fold greater incidence of localized and de novo metastatic PCa compared with White men across VA centers nationwide. Among veterans screened for PCa, African American men had a 29% increased risk of PCa detection on a diagnostic prostate biopsy compared with White men (hazard ratio, 1.29; 95% CI, 1.27-1.31; P < .001). African American men who received definitive primary treatment of PCa experienced a lower risk of metastasis (hazard ratio, 0.89; 95% CI, 0.83-0.95; P < .001). However, African American men who received nondefinitive treatment classified as “other” were more likely to develop metastasis (adjusted hazard ratio, 1.29; 95% CI, 1.17-1.42; P < .001). Using the actual rate of metastasis from veterans who received definitive primary treatment, a persistent residual metastatic burden for African American men was observed across all National Comprehensive Cancer Network risk groups (low risk, 4 vs 2 per 100 000; intermediate risk, 13 vs 6 per 100 000; high risk, 19 vs 9 per 100 000).
This cohort analysis found significant disparities in the incidence of localized and metastatic PCa between African American veterans and White veterans. This increased incidence is a major factor associated with the residual disparity in PCa metastasis observed in African American veterans compared with White veterans despite their nearly equal response to treatment.
To take the first step toward assembling population-based cohorts of patients with bladder cancer with longitudinal pathology data, we developed and validated a natural language processing (NLP) engine that abstracts pathology data from full-text pathology reports.
Using 600 bladder pathology reports randomly selected from the Department of Veterans Affairs, we developed and validated an NLP engine to abstract data on histology, invasion (presence vs absence and depth), grade, the presence of muscularis propria, and the presence of carcinoma in situ. Our gold standard was based on an independent review of reports by 2 urologists, followed by adjudication. We assessed the NLP performance by calculating the accuracy, the positive predictive value, and the sensitivity. We subsequently applied the NLP engine to pathology reports from 10,725 patients with bladder cancer.
When comparing the NLP output to the gold standard, NLP achieved the highest accuracy (0.98) for the presence vs the absence of carcinoma in situ. Accuracy for histology, invasion (presence vs absence), grade, and the presence of muscularis propria ranged from 0.83 to 0.96. The most challenging variable was depth of invasion (accuracy 0.68), with an acceptable positive predictive value for lamina propria (0.82) and for muscularis propria (0.87) invasion. The validated engine was capable of abstracting pathologic characteristics for 99% of the patients with bladder cancer.
NLP had high accuracy for 5 of 6 variables and abstracted data for the vast majority of the patients. This now allows for the assembly of population-based cohorts with longitudinal pathology data.
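The three performance measures used to compare NLP output against the gold standard (accuracy, positive predictive value, sensitivity) follow directly from a confusion matrix. The sketch below shows the calculation for a single present-versus-absent variable; the function name and the toy labels in the usage note are hypothetical and are not study data.

```python
def binary_metrics(gold, predicted):
    """Compare predicted True/False labels with gold-standard labels and
    return accuracy, positive predictive value, and sensitivity."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)
    tn = sum(1 for g, p in pairs if not g and not p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    nan = float("nan")  # returned when a denominator is zero
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else nan,
        "ppv": tp / (tp + fp) if (tp + fp) else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
    }
```

For instance, with gold labels `[True, True, True, False, False]` and predictions `[True, True, False, True, False]`, the function reports an accuracy of 3/5 and a PPV and sensitivity of 2/3 each. Multi-category variables such as depth of invasion would need a per-category version of the same counts.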
There is controversy about the benefit of prostate-specific antigen (PSA) screening. Prostate-specific antigen screening rates have decreased since 2008 in the US, and the incidence of metastatic prostate cancer has increased. However, there is no direct epidemiologic evidence of a correlation between population PSA screening rates and subsequent metastatic prostate cancer rates.
To assess whether facility-level variation in PSA screening rates is associated with subsequent facility-level metastatic prostate cancer incidence.
This retrospective cohort study used data for all men aged 40 years or older with an encounter at 128 facilities in the US Veterans Health Administration (VHA) from January 1, 2005, to December 31, 2019.
Yearly facility-level PSA screening rates, defined as the proportion of men aged 40 years or older with a PSA test in each year, and long-term nonscreening rates, defined as the proportion of men aged 40 years or older without a PSA test in the prior 3 years, from January 1, 2005, to December 31, 2014.
The main outcomes were facility-level yearly counts of incident metastatic prostate cancer diagnoses and age-adjusted yearly metastatic prostate cancer incidence rates (per 100 000 men) 5 years after each PSA screening exposure year.
The cohort included 4 678 412 men in 2005 and 5 371 701 men in 2019. Prostate-specific antigen screening rates decreased from 47.2% in 2005 to 37.0% in 2019, and metastatic prostate cancer incidence increased from 5.2 per 100 000 men in 2005 to 7.9 per 100 000 men in 2019. Higher facility-level PSA screening rates were associated with lower metastatic prostate cancer incidence 5 years later (incidence rate ratio [IRR], 0.91 per 10% increase in PSA screening rate; 95% CI, 0.87-0.96; P < .001). Higher long-term nonscreening rates were associated with higher metastatic prostate cancer incidence 5 years later (IRR, 1.11 per 10% increase in long-term nonscreening rate; 95% CI, 1.03-1.19; P = .01).
From 2005 to 2019, PSA screening rates decreased in the national VHA system. Facilities with higher PSA screening rates had lower subsequent rates of metastatic prostate cancer. These data may be used to inform shared decision-making about the potential benefits of PSA screening among men who wish to reduce their risk of metastatic prostate cancer.
Abstract
Objective
Observational studies can impact patient care but must be robust and reproducible. Nonreproducibility is primarily caused by unclear reporting of design choices and analytic procedures. This study aimed to: (1) assess how the study logic described in an observational study could be interpreted by independent researchers and (2) quantify the impact of interpretations’ variability on patient characteristics.
Materials and Methods
Nine teams of highly qualified researchers reproduced a cohort from a study by Albogami et al. The teams were provided the clinical codes and access to the tools to create cohort definitions such that the only variable part was their logic choices. We executed teams’ cohort definitions against the database and compared the number of subjects, patient overlap, and patient characteristics.
Results
On average, the teams’ interpretations fully aligned with the master implementation in 4 out of 10 inclusion criteria with at least 4 deviations per team. Cohorts’ size varied from one-third of the master cohort size to 10 times the cohort size (2159–63 619 subjects compared to 6196 subjects). Median agreement was 9.4% (interquartile range 15.3–16.2%). The teams’ cohorts significantly differed from the master implementation by at least 2 baseline characteristics, and most of the teams differed by at least 5.
Conclusions
Independent research teams attempting to reproduce the study based on its free-text description alone produce different implementations that vary in the population size and composition. Sharing analytical code supported by a common data model and open-source tools allows reproducing a study unambiguously thereby preserving initial design choices.
Left ventricular ejection fraction (EF) is a key component of heart failure quality measures used within the Department of Veterans Affairs (VA). Our goals were to build a natural language processing system to extract the EF from free-text echocardiogram reports to automate measurement reporting and to validate the accuracy of the system using a comparison reference standard developed through human review. This project was a Translational Use Case Project within the VA Consortium for Healthcare Informatics.
We created a set of regular expressions and rules to capture the EF using a random sample of 765 echocardiograms from seven VA medical centers. The documents were randomly assigned to two sets: a set of 275 used for training and a second set of 490 used for testing and validation. To establish the reference standard, two independent reviewers annotated all documents in both sets; a third reviewer adjudicated disagreements.
System test results for document-level classification of EF of <40% had a sensitivity (recall) of 98.41%, a specificity of 100%, a positive predictive value (precision) of 100%, and an F measure of 99.2%. System test results at the concept level had a sensitivity of 88.9% (95% CI 87.7% to 90.0%), a positive predictive value of 95% (95% CI 94.2% to 95.9%), and an F measure of 91.9% (95% CI 91.2% to 92.7%).
An EF value of <40% can be accurately identified in VA echocardiogram reports.
An automated information extraction system can be used to accurately extract EF for quality measurement.
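A regular-expression approach to capturing EF values, and the F measure used to summarize performance, can be sketched as follows. The pattern below is a simplified, hypothetical illustration and not the study's rule set; real echocardiogram reports contain many more phrasings (qualitative descriptions, multiple measurement methods) than this single pattern handles.

```python
import re

# Hypothetical pattern: an EF keyword, up to 30 non-digit characters,
# then a percentage or a percentage range such as "55-60 %" or "20 to 25%".
EF_PATTERN = re.compile(
    r"\b(?:ejection\s+fraction|LVEF|EF)\b[^0-9%]{0,30}?"
    r"(\d{1,3})(?:\s*(?:-|to)\s*(\d{1,3}))?\s*%",
    re.IGNORECASE,
)


def extract_ef(text: str):
    """Return a list of (low, high) EF percentages found in the text.
    A single value is returned as (value, value)."""
    results = []
    for match in EF_PATTERN.finditer(text):
        low = int(match.group(1))
        high = int(match.group(2)) if match.group(2) else low
        results.append((low, high))
    return results


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)
```

Applying `f_measure` to the document-level precision of 100% and recall of 98.41% reported above gives approximately 99.2%, in line with the stated F measure.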