Objective: This large prospective study aimed to develop risk prediction models of CVD and all-cause mortality in patients who survived MI. Methods: Using 2002-2012 national electronic health record data from the Veterans Health Administration, sex-specific risk prediction models for CVD and all-cause death were developed from the 5-year follow-up data of 100,601 first MI survivors aged >30 years. Model performance was evaluated using a 5-fold cross-validation approach. Results: We followed 98,657 male and 1,944 female MI survivors up to 5 years (407,199 person-years). There were 31,622 deaths (men 31,147, women 475) and 12,901 CVD deaths (men 12,752, women 149) observed during follow-up. Among men, greater age, current smoking, diabetes, atrial fibrillation, heart failure, peripheral artery disease, geographic region, and lower BMI (<20 kg/m²) were associated with increased risk of subsequent CVD and all-cause mortality, while statin treatment, hypertension medication, beta-blocker, eGFR level, and high BMI (≥25 kg/m²) were significantly associated with reduced risk of CVD and all-cause mortality. Similar associations were generally observed among women. We observed U-shaped relations between total cholesterol and outcomes, and between HDL cholesterol and outcomes. The prediction models demonstrated good discrimination and calibration. The estimated Harrell’s C-statistics of the final models versus the cross-validation estimates were similar, ranging from 0.75 to 0.81. The predicted risk of death was well calibrated compared to observed risk. Conclusions: We developed and validated risk prediction models of 5-year risk for CVD and all-cause death for patients following MI. Traditional risk factors, co-morbidity, lack of blood pressure or lipid treatment, and geographic region were all associated with greater risk of CVD and all-cause mortality.
Risk prediction models for cardiovascular disease (CVD) death developed from patients without vascular disease may not be suitable for myocardial infarction (MI) survivors. Prediction of mortality risk after MI may help to guide secondary prevention. Using national electronic record data from the Veterans Health Administration 2002 to 2012, we developed risk prediction models for CVD death and all-cause death based on 5-year follow-up data of 100,601 survivors of MI using Cox proportional hazards models. Model performance was evaluated using a cross-validation approach. During follow-up, there were 31,622 deaths and 12,901 CVD deaths. In men, older age, current smoking, atrial fibrillation, heart failure, peripheral artery disease, and lower body mass index were associated with greater risk of death from CVD or all causes, and statin treatment, hypertension medication, estimated glomerular filtration rate level, and high body mass index were significantly associated with reduced risk of fatal outcomes. Similar associations and slightly different predictors were observed in women. The estimated Harrell's C-statistics of the final model versus the cross-validation estimates were 0.77 versus 0.77 in men and 0.81 versus 0.77 in women for CVD death. Similarly, the C-statistics were 0.75 versus 0.75 in men and 0.78 versus 0.75 in women for all-cause mortality. The predicted risk of death was well calibrated compared with the observed risk. In conclusion, we developed and internally validated risk prediction models of 5-year risk for CVD and all-cause death for outpatient survivors of MI. Traditional risk factors, co-morbidities, and lack of blood pressure or lipid treatment were all associated with greater risk of CVD and all-cause mortality.
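The abstract reports discrimination as Harrell's C-statistic. As a rough illustration of what that number measures, the sketch below computes the statistic by brute force over comparable pairs. It is not the authors' code: the function name harrell_c, the handling of tied event times (ignored here), and the four-subject toy data are all assumptions made for illustration.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (illustrative sketch).

    A pair (i, j) is comparable when subject i had the event and a strictly
    shorter follow-up time than subject j. The pair is concordant when the
    subject who died earlier was assigned the higher predicted risk.
    Tied risk scores count as half concordant; tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up in years, 1 = death observed, 0 = censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.4, 0.5, 0.1]

c_stat = harrell_c(times, events, risk_scores)
print(f"Harrell's C = {c_stat:.2f}")
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported 0.75 to 0.81 should be read. The quadratic loop is only practical for small samples; a cohort of this size would need an optimized implementation.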
One of the justifiable criticisms of human genetic studies is the underrepresentation of participants from diverse populations. Lack of inclusion must be addressed at scale to identify causal disease factors and understand the genetic causes of health disparities. We present genome-wide associations for 2068 traits from 635,969 participants in the Department of Veterans Affairs Million Veteran Program, a longitudinal study of diverse United States Veterans. Systematic analysis revealed 13,672 genomic risk loci; 1608 were only significant after including non-European populations. Fine-mapping identified causal variants at 6318 signals across 613 traits. One-third (n = 2069) were identified in participants from non-European populations. This reveals a broadly similar genetic architecture across populations, highlights genetic insights gained from underrepresented groups, and presents an extensive atlas of genetic associations.
Editor’s summary: The number and size of human genomics datasets have been increasing but not uniformly, and most of the genetic data available to researchers are still derived from individuals of European descent. This shortcoming limits both the biological insights that can be gleaned from these data and their clinical applications to non-European patients, who may not match up well with the traditional study participants. To address this problem, the Million Veteran Program recruited hundreds of thousands of US veterans of various ethnic backgrounds for study. Verma et al. present this resource, as well as a few discoveries of genetic connections to disease that emerged from their diverse dataset (see the Perspective by Williamson and Fatumo). - Yevgeniya Nusinovich
INTRODUCTION Findings from genome-wide association studies (GWASs) have provided foundational knowledge of the genetic basis of disease, facilitating precision approaches for prevention and treatment. Current GWAS results are limited by underrepresentation of individuals from diverse populations, leading to concerns with generalizability regarding our knowledge of the relationships between genes, traits, and disease. The Department of Veterans Affairs (VA) Million Veteran Program (MVP), one of the largest US-based biobanks, addresses this need; 29% of MVP comprises individuals genetically similar to African (AFR), Admixed American (AMR), and East Asian (EAS) reference populations. With over 635,000 participants and more than 44.3M genotyped variants linked with detailed phenotypic data from the electronic health record (EHR), the MVP has the scale and richness of data to fill in the gaps in our knowledge of genotype-phenotype associations across diverse populations. RATIONALE Leveraging dense MVP data, we conducted GWASs across 2068 traits in four population groups based on genetic similarity to AFR, AMR, EAS, and European (EUR) reference populations. We employed statistical fine-mapping to highlight putative causal variants. This effort allowed us to characterize the genetic architecture of complex traits within diverse populations and compare genetic predisposition between population groups. We also quantified the benefits of including individuals from non-EUR population groups in the study for variant discovery and fine-mapping precision. Fine-mapping provided a foundation for nominating putative effector genes at associated loci, mapping the landscape of gene-trait associations across populations to highlight both pleiotropic and heterogeneous associations. RESULTS Among 635,969 participants, we identified 26,049 variant-trait associations across 1270 traits, with 3477 being significant only when individuals from non-EUR populations were included.
Fine-mapping revealed 57,601 independent signals across 936 traits, with 15,045 of these signals mapped with high confidence to a single variant. Predominantly resulting from interpopulation allele frequency differences, 2069 high-confidence signals and 549 gene nominations were unique to non-EUR groups. Notably, a signal mapped to rs76024540 implicated SLC22A18/SLC22A18AS as effector genes for keloid scarring, a condition vastly more prevalent in the AFR than the EUR population. Apart from the APOE locus’s association with dementia, we observed few instances of effect size heterogeneity across populations for fine-mapped variants. CONCLUSION This study underscores the enhanced power of GWASs with increased participant diversity, achieving greater variant discovery and fine-mapping precision than possible in the EUR population alone. Our findings reveal more similarities than differences in genetic architectures across populations, with most differences attributable to allele frequency variations between populations. Comprehensive phenome-wide genetic analysis across multiple populations. Meta-analysis of 4045 GWASs comprising 2068 traits from four population groups identified 26,049 locus-trait associations, including 9989 previously unreported. Multi-population fine-mapping prioritized high-confidence signals, highlighting shared associations and elucidating pleiotropic genes driving multiple variant-trait associations.
• Use innovative prior weighting and TF-IDF weighting to summarize semantic vectors.
• Quantify the informativeness of notes using word embedding.
• Accurately identify informative notes by the relevance to a phenotype of interest.
• Identify disease segments by summarizing semantic information from both billing codes and clinical notes.
• Scalability verified in multiple institutions.
Accurately assigning phenotype information to individual patients via computational phenotyping using Electronic Health Records (EHRs) has been seen as the first step towards enabling EHRs for precision medicine research. Chart review labels annotated by clinical experts, also known as “gold standard” labels, are essential for the development and validation of computational phenotyping algorithms. However, given the complexity of EHR systems, the process of chart review is both labor intensive and time consuming. We propose a fully automated algorithm, referred to as pGUESS, to rank EHR notes according to their relevance to a given phenotype. By identifying the most relevant notes, pGUESS can greatly improve the efficiency and accuracy of chart reviews.
pGUESS uses prior guided semantic similarity to measure the informativeness of a clinical note to a given phenotype. We first select candidate clinical concepts from a pool of comprehensive medical concepts using public knowledge sources and then derive the semantic embedding vector (SEV) for a reference article (SEVref) and each note (SEVnote). The algorithm scores the relevance of a note as the cosine similarity between SEVnote and SEVref.
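The final scoring step described above, cosine similarity between SEVnote and SEVref followed by ranking, can be sketched as follows. This is only an illustration of that one step, not the pGUESS implementation: the helper names, the three-dimensional toy vectors, and the note identifiers are hypothetical, and the derivation of the embedding vectors themselves (concept selection, prior and TF-IDF weighting) is not shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here: an all-zero vector scores 0
    return dot / (norm_u * norm_v)


def rank_notes(sev_ref, sev_notes):
    """Return (note_id, score) pairs sorted from most to least relevant."""
    scored = [
        (note_id, cosine_similarity(vec, sev_ref))
        for note_id, vec in sev_notes.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy embeddings standing in for SEVref and SEVnote.
sev_ref = [1.0, 0.0, 1.0]
sev_notes = {
    "note_a": [1.0, 0.0, 1.0],  # same direction as the reference
    "note_b": [0.0, 1.0, 0.0],  # orthogonal to the reference
    "note_c": [1.0, 1.0, 0.0],  # partial overlap
}

ranking = rank_notes(sev_ref, sev_notes)
for note_id, score in ranking:
    print(f"{note_id}: {score:.3f}")
```

Because cosine similarity depends only on direction, a long note and a short note with the same mix of concepts receive the same score, which is one plausible reason to prefer it over a raw dot product for notes of varying length.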
The algorithm was validated against four sets of 200 notes that were manually annotated by clinical experts to assess their informativeness to one of three disease phenotypes. The pGUESS algorithm substantially outperformed existing unsupervised approaches for classifying the relevance status with respect to both accuracy and scalability across phenotypes. Averaging over the three phenotypes, the rank correlation between the algorithm ranking and gold standard label was 0.64 for pGUESS, but only 0.47 and 0.35 for the next two best-performing algorithms. pGUESS is also much more computationally scalable than existing algorithms.
The pGUESS algorithm can substantially reduce the burden of chart review and holds potential for improving the efficiency and accuracy of human annotation.
Development of clinical phenotypes from electronic health records (EHRs) can be resource intensive. Several phenotype libraries have been created to facilitate reuse of definitions. However, these platforms vary in target audience and utility. We describe the development of the Centralized Interactive Phenomics Resource (CIPHER) knowledgebase, a comprehensive public-facing phenotype library, which aims to facilitate clinical and health services research.
The platform was designed to collect and catalog EHR-based computable phenotype algorithms from any healthcare system, scale metadata management, facilitate phenotype discovery, and allow for integration of tools and user workflows. Phenomics experts were engaged in the development and testing of the site.
The knowledgebase stores phenotype metadata using the CIPHER standard, and definitions are accessible through complex searching. Phenotypes are contributed to the knowledgebase via webform, allowing metadata validation. Data visualization tools linking to the knowledgebase enhance user interaction with content and accelerate phenotype development.
The CIPHER knowledgebase was developed in the largest healthcare system in the United States and piloted with external partners. The design of the CIPHER website supports a variety of front-end tools and features to facilitate phenotype development and reuse. Health data users are encouraged to contribute their algorithms to the knowledgebase for wider dissemination to the research community, and to use the platform as a springboard for phenotyping.
CIPHER is a public resource for all health data users, available at https://phenomics.va.ornl.gov/, which facilitates phenotype reuse, development, and dissemination of phenotyping knowledge.
Abstract only Introduction: The majority of population-based studies of myocardial infarction (MI) rely on billing codes for classification. Classification algorithms employing machine learning (ML) are increasingly used for phenotyping using electronic health record (EHR) data. Hypothesis: ML algorithms integrating billing codes and information from narrative notes extracted using natural language processing (NLP) can improve classification of MI compared to billing code algorithms. Improved classification will improve power to compare risk factors across population subgroups. Methods: Retrospective cohort study of nationwide Veterans Affairs (VA) EHR data. MI classified using 2 approaches: (1) published billing code algorithm, (2) published phenotyping pipeline incorporating NLP and ML. Results compared against gold standard chart review of MI outcomes in 308 Veterans. We also tested the known association between high density lipoprotein cholesterol (HDL-C) and MI outcomes classified using the 2 approaches among Black and White Veterans, stratified by sex and race; a prior study showed HDL-C to be less protective for Black compared to White individuals. Results: We studied 17,176,658 Veterans, mean age 69 years, 94% male, 12% self-reported Black, 71% White. The billing code algorithm classified MI at positive predictive value (PPV) 0.64 compared to the published ML approach, PPV 0.90; the latter classified a modestly higher percentage of non-White Veterans. Using the ML algorithm for MI, we replicated a reduced protective effect of HDL-C in Black vs White male and female Veterans (Table); with the billing code algorithm no association was observed between low density lipoprotein cholesterol (LDL-C) or HDL-C with MI among Black female Veterans.
Conclusions: Using nationwide VA data, application of an ML approach improved classification of MI particularly among non-White Veterans, resulting in improved power to study differences in association for MI risk factors among Black and White Veterans.
Abstract only Introduction: The use of statins after acute myocardial infarction (MI) has been shown to reduce the risk of recurrent MI and mortality. We examined the association between statin therapy and the risk of 1-year mortality after MI hospitalization. Methods: Data from the Veterans Health Administration were used to create a national sample of Veterans hospitalized for their first MI event between 2002 and 2015. Veterans with prevalent heart failure, stroke, or cancer diagnoses at the time of discharge for the index MI and those with prolonged hospitalization (greater than 30 days) were excluded. The statin therapy group was defined as Veterans having any statin prescription at the time of discharge. The primary outcome was all-cause mortality obtained from the National Death Index. We fitted a Cox regression model adjusted for age, length of hospital stay, peak cardiac troponin I ratio (the ratio of the peak measurement to the reference upper limit of normal for the assay) during hospitalization, statin use before admission, beta blocker prescription at discharge, liver disease, peripheral arterial disease, estimated glomerular filtration rate, and high-density lipoprotein and total cholesterol levels. Billing codes were used to define exclusion criteria and co-morbidities. Results: Among 16,263 Veterans hospitalized for MI, mean age was 62 years and 98% were men. During a mean follow-up of 350 days, 966 deaths occurred. In the statin therapy group, 709/13,334 (5.3%) of Veterans died compared to 257/2,929 (8.8%) of Veterans without statin therapy. In an age-adjusted model, 1-year mortality was 35% lower (HR 0.65, 95% CI 0.56 - 0.75) for patients who were prescribed a statin at discharge compared to Veterans who did not receive a statin at discharge. In a multivariable model, we observed a 27% (HR 0.73, 95% CI 0.63 - 0.85) lower risk of death for users of statin therapy compared to non-users (Figure).
Conclusions: Statin therapy prescribed after a first MI event may reduce the 1-year risk of all-cause mortality.
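The crude mortality proportions quoted in the Results can be checked with simple arithmetic. The sketch below recomputes them from the reported counts and also derives an unadjusted risk ratio; that ratio is not reported in the abstract and is shown only to illustrate how a crude comparison differs from the adjusted hazard ratios. The variable and function names are illustrative.

```python
def crude_risk(deaths, total):
    """Proportion of a group that died during follow-up."""
    return deaths / total


# Counts as reported in the abstract.
statin_deaths, statin_total = 709, 13_334
no_statin_deaths, no_statin_total = 257, 2_929

statin_risk = crude_risk(statin_deaths, statin_total)
no_statin_risk = crude_risk(no_statin_deaths, no_statin_total)

# Unadjusted ratio of the two proportions. This ignores follow-up time and
# confounders, so it is not comparable to the Cox model hazard ratios.
crude_risk_ratio = statin_risk / no_statin_risk

print(f"Statin group:    {statin_risk:.1%}")
print(f"No-statin group: {no_statin_risk:.1%}")
print(f"Crude risk ratio: {crude_risk_ratio:.2f}")
```

The group sizes and death counts also sum to the reported totals (16,263 Veterans and 966 deaths), which is a quick consistency check on the abstract's figures.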