• Deep Learning (DL) is becoming the main way to study electronic health records (EHR).
• The first comparative review of the key DL architectures used for EHR is carried out.
• One of the largest EHR databases, containing data from 4M people, is introduced.
• A set of best practices for working with EHR using DL is shared.
• Recurrent DL architectures showed superior flexibility and predictive power.
Despite recent developments in deep learning models, their application in clinical decision-support systems has been very limited. The recent digitalisation of health records, however, has provided a valuable platform for assessing the usability of such techniques in healthcare. As a result, the field is starting to see a growing number of research papers that apply deep learning to electronic health records (EHR) for personalised prediction of risks and health trajectories. While this is a promising trend, vast paper-to-paper variability (from the data sources and models used to the clinical questions addressed) has hampered the field’s ability to compare and contrast such models for a given application of interest. Thus, in this paper, we aim to provide a comparative review of the key deep learning architectures that have been applied to EHR data. Furthermore, we aim to: (1) introduce and use one of the world’s largest and most complex linked primary care EHR datasets (i.e., the Clinical Practice Research Datalink, or CPRD) as a new asset for training such data-hungry models; (2) provide a guideline for working with EHR data for deep learning; (3) share some best practices for assessing the “goodness” of deep learning models in clinical risk prediction; and (4) propose future research ideas for making deep learning models more suitable for EHR data. Our results highlight the difficulties of working with highly imbalanced datasets and show that sequential deep learning architectures such as recurrent neural networks (RNNs) may be better suited to the temporal nature of EHR.
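The difficulty with highly imbalanced outcomes can be made concrete with a small sketch. It assumes a binary outcome (event or no event) and one risk score per patient. The pure-Python helpers `auroc` and `average_precision` are hypothetical names, not taken from the paper, and these are not necessarily the metrics the authors used. The sketch compares the area under the ROC curve with average precision, a step-wise area under the precision-recall curve. At a 1% event rate, a model can reach a high AUROC while most of its top-ranked patients are still false positives. Average precision, whose chance level is roughly the event prevalence, exposes this.

```python
"""Sketch: AUROC vs. average precision on an imbalanced outcome (illustrative only)."""


def auroc(scores, labels):
    """Probability that a random positive is scored above a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of precision@k over the ranks k at which each positive is retrieved.

    Ties in score are broken by input order (sorted() is stable), which is
    adequate for a sketch but not for production evaluation.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision at the rank of this positive
    return total / n_pos


if __name__ == "__main__":
    # Synthetic cohort: 1,000 patients, 10 events (1% prevalence).
    # The hypothetical model ranks 20 non-events above every event, then all 10 events.
    scores = [1.0] * 20 + [0.5] * 10 + [0.0] * 970
    labels = [0] * 20 + [1] * 10 + [0] * 970
    print(f"AUROC = {auroc(scores, labels):.3f}")              # AUROC = 0.980
    print(f"AP    = {average_precision(scores, labels):.3f}")  # AP    = 0.206
```

In this toy cohort the AUROC of about 0.98 looks excellent. Yet if the top 30 patients were flagged, only a third of them would be true events, and the average precision of about 0.21 reflects that.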
Background Myocardial infarction (MI), stroke and diabetes share underlying risk factors and commonalities in clinical management. We examined whether their combined impact on mortality is proportional to, amplified beyond, or less than the risk expected from each disease separately, and whether any excess risk is explained by their associated comorbidities. Methods Using large-scale electronic health records, we identified 2,007,731 eligible patients (51% women) registered with general practices in the UK. We extracted clinical information, including diagnoses of MI, stroke, diabetes and 53 other long-term conditions, recorded before 2005 (study baseline). We used Cox regression, with age as the underlying time variable, to determine the risk of all-cause mortality, and tested for excess risk due to interaction between cardiometabolic conditions. Results At baseline, the mean age was 51 years, and 7% (N = 145,910) had had a cardiometabolic condition. After a mean follow-up of 7 years, 146,994 patients died. Compared with those without cardiometabolic disease, the sex-adjusted hazard ratios (HRs) (95% confidence interval [CI]) for all-cause mortality by baseline disease status were: MI, 1.51 (1.49-1.52); diabetes, 1.52 (1.51-1.53); stroke, 1.84 (1.82-1.86); MI and diabetes, 2.14 (2.11-2.17); MI and stroke, 2.35 (2.30-2.39); diabetes and stroke, 2.53 (2.50-2.57); and all three, 3.22 (3.15-3.30). Adjusting for other concurrent comorbidities attenuated these estimates, including the risk associated with having all three conditions (HR = 1.81; 95% CI 1.74-1.89). Excess risks due to interaction between cardiometabolic conditions, particularly when all three conditions were present, were not significantly greater than expected from the individual disease effects. Conclusion Myocardial infarction, stroke and diabetes were associated with excess mortality, without evidence of any amplification of risk in people with all three diseases.
The presence of other comorbidities substantially contributed to the excess mortality risks associated with cardiometabolic disease multimorbidity.
Keywords: Myocardial infarction, Stroke, Diabetes, Multimorbidity, Mortality, Electronic health records
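A common way to ask whether a combined risk is "greater than expected" is to compare it on the additive scale. For two conditions this comparison is the relative excess risk due to interaction, RERI = HR_ab - HR_a - HR_b + 1. The abstract does not say which interaction measure or model the authors used, so treating it as RERI-style is an assumption here. The sketch below also plugs in the sex-adjusted point estimates, which come from separate disease categories rather than from a fitted interaction model. It is therefore only an informal arithmetic check. `expected_additive` and `excess_due_to_interaction` are hypothetical helper names.

```python
"""Sketch: additive-scale excess risk from the sex-adjusted HRs quoted in the abstract.

Assumption: interaction is judged on the additive scale (RERI-style). Point
estimates alone say nothing about statistical significance.
"""
from itertools import combinations

# Sex-adjusted HRs for all-cause mortality vs. no cardiometabolic disease (from the abstract).
HR = {
    frozenset({"MI"}): 1.51,
    frozenset({"diabetes"}): 1.52,
    frozenset({"stroke"}): 1.84,
    frozenset({"MI", "diabetes"}): 2.14,
    frozenset({"MI", "stroke"}): 2.35,
    frozenset({"diabetes", "stroke"}): 2.53,
    frozenset({"MI", "diabetes", "stroke"}): 3.22,
}


def expected_additive(hr, conditions):
    """Joint HR expected if each condition's excess risk (HR - 1) simply adds up."""
    return 1.0 + sum(hr[frozenset({c})] - 1.0 for c in conditions)


def excess_due_to_interaction(hr, conditions):
    """Observed joint HR minus the additive expectation.

    For two conditions this equals RERI = HR_ab - HR_a - HR_b + 1.
    """
    return hr[frozenset(conditions)] - expected_additive(hr, conditions)


if __name__ == "__main__":
    diseases = ["MI", "diabetes", "stroke"]
    for r in (2, 3):
        for combo in combinations(diseases, r):
            print(" + ".join(combo), round(excess_due_to_interaction(HR, combo), 2))
```

The pairwise values come out at roughly 0.11 (MI + diabetes), 0.00 (MI + stroke) and 0.17 (diabetes + stroke). For all three conditions, the value is 0.35 (observed 3.22 vs. an additive expectation of 2.87). Confidence intervals for these quantities require the covariance of the log-HRs from the fitted model. Without them, these numbers neither confirm nor contradict the authors' conclusion.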
Background How measures of long-term exposure to elevated blood pressure might add to the performance of "current" blood pressure in predicting future cardiovascular disease is unclear. We compared incident cardiovascular disease risk prediction using past, current, and usual systolic blood pressure, alone or in combination. Methods and Results Using data from UK primary care linked electronic health records, we applied a landmark cohort study design and identified 80 964 people aged 50 years (derivation cohort, 64 772; validation cohort, 16 192). At study entry, all had recorded blood pressure, no prior cardiovascular disease, and no previous antihypertensive or lipid-lowering prescriptions. We used systolic blood pressure recorded up to 10 years before baseline to estimate past systolic blood pressure (mean, time-weighted mean, and variability). We also estimated usual systolic blood pressure, correcting current values for past time-dependent blood pressure fluctuations. We then examined the prospective relation of these measures with incident cardiovascular disease (first hospitalization for, or death from, coronary heart disease or stroke/transient ischemic attack). We used Cox regression to estimate hazard ratios and applied Bayesian analysis within a machine learning framework in model development and validation. Predictive performance of models was assessed using discrimination (area under the receiver operating characteristic curve) and calibration metrics. We found that elevated past, current, and usual systolic blood pressure values were separately and independently associated with increased risk of incident cardiovascular disease. When used alone, the hazard ratio (95% credible interval) per 20-mm Hg increase in current systolic blood pressure was 1.22 (1.18-1.30). Associations were stronger for past systolic blood pressure (mean and time-weighted mean) and usual systolic blood pressure, with hazard ratios ranging from 1.39 to 1.45.
The area under the receiver operating characteristic curve for a model that included current systolic blood pressure, sex, smoking, deprivation, diabetes mellitus, and lipid profile was 0.747 (95% credible interval, 0.722-0.811). Adding past systolic blood pressure mean, time-weighted mean, or variability to this model increased the area under the receiver operating characteristic curve (95% credible interval) to 0.750 (0.727-0.811), 0.750 (0.726-0.811), and 0.748 (0.723-0.811), respectively, with all models showing good calibration. Similar small improvements in the area under the receiver operating characteristic curve were observed when testing models on the validation cohort, in sex-stratified analyses, and when using different landmark ages (40 or 60 years). Conclusions Blood pressure measures derived from multiple recordings in patients' electronic health records showed stronger associations with incident cardiovascular disease than a single blood pressure measurement. However, adding them to multivariable risk prediction models had a negligible effect on model performance.
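The abstract names three summaries of past systolic blood pressure (mean, time-weighted mean, variability) but does not define them. The sketch below rests on two assumptions, which may differ from the authors' definitions. First, the time-weighted mean is taken as the trapezoidal area under the SBP-time curve divided by the time span. Second, variability is taken as the sample standard deviation. The sketch also shows how a hazard ratio reported per 20 mm Hg can be rescaled to other units. That rescaling assumes the log-hazard is linear in SBP, as in a Cox model with a linear SBP term. `past_sbp_summaries` and `rescale_hr` are hypothetical names.

```python
"""Sketch: summarising past systolic blood pressure (SBP) readings before a landmark.

Assumptions (not specified in the abstract): the time-weighted mean is the
trapezoidal area under the SBP-time curve divided by the time span, and
variability is the sample standard deviation of the readings.
"""
from math import exp, log
from statistics import mean, stdev


def past_sbp_summaries(readings):
    """readings: list of (t_years, sbp_mmHg); t is relative to baseline (e.g. -10 to 0)."""
    pts = sorted(readings)  # chronological order
    times = [t for t, _ in pts]
    values = [v for _, v in pts]
    span = times[-1] - times[0]
    if span > 0:
        # Trapezoid rule: each interval contributes its length times the average of its end values.
        area = sum(
            (t1 - t0) * (v0 + v1) / 2
            for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
        )
        time_weighted = area / span
    else:  # a single reading (or all on one day): fall back to the plain mean
        time_weighted = mean(values)
    return {
        "mean": mean(values),
        "time_weighted_mean": time_weighted,
        "sd": stdev(values) if len(values) > 1 else 0.0,
    }


def rescale_hr(hr, from_units, to_units):
    """Convert a hazard ratio per `from_units` into one per `to_units` (log-linear assumption)."""
    return exp(log(hr) * to_units / from_units)


if __name__ == "__main__":
    readings = [(-10.0, 120.0), (-5.0, 140.0), (0.0, 130.0)]
    print(past_sbp_summaries(readings))  # {'mean': 130.0, 'time_weighted_mean': 132.5, 'sd': 10.0}
    print(round(rescale_hr(1.22, 20, 10), 3))  # HR per 10 mm Hg: 1.105
```

Under the log-linear assumption, the reported HR of 1.22 per 20 mm Hg corresponds to about 1.105 per 10 mm Hg, that is, the square root of 1.22.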
Exposure to air pollution during intrauterine development and through childhood may have lasting effects on respiratory health. We aimed to investigate lung function at ages 8 and 15 years in relation to air pollution exposures during pregnancy, infancy, and childhood in a UK population-based birth cohort. Individual exposures to source-specific particulate matter ≤10 μm in aerodynamic diameter (PM10) were estimated for each trimester, 0-6 months, 7-12 months (1990-1993), and up to age 15 years (1991-2008). These exposures were examined in relation to FEV1% predicted and FVC% predicted at ages 8 (n = 5,276) and 15 (n = 3,446) years using linear regression models adjusted for potential confounders. A profile regression model was used to identify sensitive time periods. We did not find clear evidence of a sensitive exposure period for PM10 from road traffic. At age 8 years, 1 μg/m³ higher exposure during the first trimester was associated with lower FEV1% predicted (-0.826; 95% confidence interval [CI], -1.357 to -0.296) and FVC% predicted (-0.817; 95% CI, -1.357 to -0.276). However, similar associations were seen for exposures during the other trimesters, 0-6 months, 7-12 months, and 0-7 years. Associations were stronger among boys and among children whose mothers had a lower education level or smoked during pregnancy. For PM10 from all sources, third-trimester exposure was associated with lower FVC% predicted (-1.312; 95% CI, -2.100 to -0.525). At age 15 years, no adverse associations with lung function were seen. Exposure to road-traffic PM10 during pregnancy may result in small but significant reductions in lung function at age 8 years.
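Window-specific exposures (each trimester, 0-6 months, 7-12 months) are typically built by averaging a modelled daily concentration series over date ranges anchored at conception and birth. The abstract does not define the windows, so this sketch makes three assumptions:

- Trimester 1 is the first 13 weeks after conception, trimester 2 is weeks 14-27, and trimester 3 runs from week 28 to birth.
- 0-6 months is the first 183 days after birth.
- 7-12 months is days 183-364.

The helper names `window_mean` and `exposure_windows`, and the synthetic series, are hypothetical.

```python
"""Sketch: averaging a daily PM10 series over pregnancy and infancy exposure windows.

Assumptions (the abstract does not define the windows): trimester 1 = first 13
weeks after conception, trimester 2 = weeks 14-27, trimester 3 = week 28 to
birth; 0-6 months = first 183 days after birth; 7-12 months = days 183-364.
"""
from datetime import date, timedelta
from statistics import mean


def window_mean(daily, start, end):
    """Mean concentration over days in [start, end); days without data are skipped."""
    days = (start + timedelta(days=i) for i in range((end - start).days))
    values = [daily[d] for d in days if d in daily]
    return mean(values) if values else None


def exposure_windows(daily, conception, birth):
    """Return the mean exposure (e.g. in µg/m³) per window for one child."""
    t2 = conception + timedelta(weeks=13)
    t3 = conception + timedelta(weeks=27)
    six_months = birth + timedelta(days=183)
    one_year = birth + timedelta(days=365)
    windows = {
        "trimester_1": (conception, t2),
        "trimester_2": (t2, t3),
        "trimester_3": (t3, birth),
        "months_0_6": (birth, six_months),
        "months_7_12": (six_months, one_year),
    }
    return {name: window_mean(daily, s, e) for name, (s, e) in windows.items()}


if __name__ == "__main__":
    conception = date(1991, 1, 1)
    birth = conception + timedelta(weeks=40)
    # Synthetic series: 1.0 µg/m³ before birth, 2.0 µg/m³ afterwards.
    daily = {}
    for i in range(700):
        d = conception + timedelta(days=i)
        daily[d] = 1.0 if d < birth else 2.0
    print(exposure_windows(daily, conception, birth))
```

Each window mean then enters the linear regression as the exposure term. A coefficient such as -0.826 is read as the change in FEV1% predicted per 1 μg/m³ higher mean exposure in that window.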
Multimorbidity, or the presence of several medical conditions in the same individual, has been increasing in the population, both in absolute and relative terms. Nevertheless, multimorbidity remains poorly understood, and the evidence from existing research to describe its burden, determinants and consequences has been limited. Previous studies attempting to understand multimorbidity patterns are often cross-sectional and do not explicitly account for the evolution of multimorbidity patterns over time. Some of them are based on small datasets and/or use arbitrary and narrow age ranges, and those that employ advanced models usually lack appropriate benchmarking and validation. In this study, we (1) introduce a novel approach for using Non-negative Matrix Factorisation (NMF) for temporal phenotyping (i.e., simultaneously mining disease clusters and their trajectories); (2) provide quantitative metrics for the evaluation of these clusters and trajectories; and (3) demonstrate how the temporal characteristics of the disease clusters that result from our model can help mine multimorbidity networks and generate new hypotheses about the emergence of various multimorbidity patterns over time. We trained and evaluated our models on one of the world’s largest electronic health records (EHR) datasets, containing more than 7 million patients, of whom over 2 million were relevant to, and hence included in, this study.
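The abstract does not describe the authors' temporal formulation, so the sketch below reproduces only the standard building block it extends. That block is plain NMF with the Lee-Seung multiplicative updates for the Frobenius loss, applied to a toy patients × conditions matrix V. Rows of W give each patient's loading on each cluster, and rows of H are candidate disease clusters. The updates keep both factors non-negative and do not increase the reconstruction error. The helper names `nmf`, `matmul` and `frobenius_error`, and the data, are hypothetical.

```python
"""Sketch: plain NMF (Lee-Seung multiplicative updates, Frobenius loss) in pure Python.

Only the standard building block is shown; the temporal-phenotyping formulation
described in the abstract is not reproduced. V is a toy patients x conditions matrix.
"""
import random


def matmul(a, b):
    """Product of two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(r) for r in zip(*a)]


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factorise non-negative V (n x m) into non-negative W (n x k) and H (k x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H); eps avoids division by zero
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh)) ** 0.5


if __name__ == "__main__":
    # Rows: patients; columns: hypothetical conditions A, B, C, D.
    V = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
    W, H = nmf(V, k=2)
    print("reconstruction error:", round(frobenius_error(V, W, H), 4))
    for r, row in enumerate(H):  # each row of H is a candidate disease cluster
        print(f"cluster {r}:", [round(x, 2) for x in row])
```

One simple way to add a temporal dimension would be to factorise separate matrices per age band and track how clusters change between bands. This is an assumption offered for orientation, not the method described above.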
• Prevalence of multimorbidity is increasing but its patterns remain poorly understood.
• Using longitudinal EHR data from 7M individuals, we made two major contributions:
• Employed NMF in a novel way to mine temporal multimorbidity phenotypes.
• New framework to evaluate the goodness of multimorbidity clusters and trajectories.
Multimorbidity, or the presence of several medical conditions in the same individual, has been increasing in the population, both in absolute and relative terms. However, multimorbidity remains poorly understood, and the evidence from existing research to describe its burden, determinants and consequences has been limited. Previous studies attempting to understand multimorbidity patterns are often cross-sectional and do not explicitly account for the evolution of multimorbidity patterns over time. Some of them are based on small datasets and/or use arbitrary and narrow age ranges, and those that employ advanced models usually lack appropriate benchmarking and validation. In this study, we (1) introduce a novel approach for using Non-negative Matrix Factorisation (NMF) for temporal phenotyping (i.e., simultaneously mining disease clusters and their trajectories); (2) provide quantitative metrics for the evaluation of disease clusters from such studies; and (3) demonstrate how the temporal characteristics of the disease clusters that result from our model can help mine multimorbidity networks and generate new hypotheses about the emergence of various multimorbidity patterns over time. We trained and evaluated our models on one of the world's largest electronic health records (EHR) datasets, with 7 million patients, of whom over 2 million were relevant to this study.