Peer-reviewed, Open access
  • Selection of 51 predictors ...
    Agrawal, Saaket; Klarqvist, Marcus D.R.; Emdin, Connor; Patel, Aniruddh P.; Paranjpe, Manish D.; Ellinor, Patrick T.; Philippakis, Anthony; Ng, Kenney; Batra, Puneet; Khera, Amit V.

    Patterns, 12/2021, Volume: 2, Issue: 12
    Journal Article

    Current cardiovascular risk assessment tools use a small number of predictors. Here, we study how machine learning might: (1) enable principled selection from a large multimodal set of candidate variables and (2) improve prediction of incident coronary artery disease (CAD) events. An elastic net-based Cox model (ML4HEN-COX) trained and evaluated in 173,274 UK Biobank participants selected 51 predictors from 13,782 candidates. Beyond most traditional risk factors, ML4HEN-COX selected a polygenic score, waist circumference, socioeconomic deprivation, and several hematologic indices. A more than 30-fold gradient in 10-year risk estimates was noted across ML4HEN-COX quintiles, ranging from 0.25% to 7.8%. ML4HEN-COX improved discrimination of incident CAD (C-statistic = 0.796) compared with the Framingham risk score, pooled cohort equations, and QRISK3 (range 0.754–0.761). This approach to variable selection and model assessment is readily generalizable to a broad range of complex datasets and disease endpoints.

    • Elastic net regression is a useful selection tool with a large candidate variable space
    • This principled approach to predictor selection can improve CAD risk prediction
    • Performance improvement can be maintained in a simple Cox model using the 51 predictors

    Current cardiovascular risk stratification tools are based on a relatively small number of risk factors modeled with Cox proportional hazards models and are known to imperfectly estimate risk. The increasing prevalence of “multimodal” data sources (such as survey data, biomarker concentrations, anthropometric measures, and clinical diagnoses) offers a potential route for improvement, but simple Cox models are not well suited to these complex and often highly correlated inputs. Here, we develop a framework to select a subset of candidate predictors for a coronary artery disease (CAD) risk prediction tool from a multimodal space of 13,782 features using elastic net regularized Cox regression. Our approach selected 51 of 13,782 candidate predictors, and the resulting model demonstrated improved prediction of incident CAD compared with clinically used algorithms among a held-out set of participants.
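
    The abstract compares models by C-statistic (0.796 versus 0.754–0.761). As a rough illustration of what that number measures, the sketch below computes Harrell's concordance index for right-censored data in plain Python. The record does not say which C-statistic variant the authors used, so Harrell's form is an assumption here, as are the function name `harrell_c` and the toy data, which are hypothetical and not taken from the study.

```python
from itertools import combinations


def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (illustrative sketch).

    times:       follow-up time for each participant
    events:      1 if the event was observed, 0 if the participant was censored
    risk_scores: model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification: skip pairs with tied times, and pairs whose
        # shorter follow-up ended in censoring (their order is unknown).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier event had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: five participants, two of them censored.
times = [2.0, 4.0, 5.0, 7.0, 9.0]
events = [1, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.5, 0.1]
c_index = harrell_c(times, events, scores)
print(c_index)  # 6 concordant of 8 comparable pairs -> 0.75
```

    A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of event times by predicted risk, so a change from roughly 0.76 to 0.80 means more comparable participant pairs are ranked correctly.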