Prognostic models are abundant in the medical literature, yet their use in practice seems limited. In this article, the third in the PROGRESS series, the authors review how such models are developed and validated, and then address how prognostic models are assessed for their impact on practice and patient outcomes, illustrating these ideas with examples.
Prognostic factor research aims to identify factors associated with subsequent clinical outcome in people with a particular disease or health condition. In this article, the second in the PROGRESS series, the authors discuss the role of prognostic factors in current clinical practice, randomised trials, and developing new interventions, and explain why and how prognostic factor research should be improved.
We investigated the reporting and methods of prediction studies, focusing on aims, designs, participant selection, outcomes, predictors, statistical power, statistical methods, and predictive performance measures.
We used a full hand search to identify all prediction studies published in 2008 in six high impact general medical journals. We developed a comprehensive item list to systematically score conduct and reporting of the studies, based on recent recommendations for prediction research. Two reviewers independently scored the studies. We retrieved 71 papers for full text review: 51 were predictor finding studies, 14 were prediction model development studies, three addressed an external validation of a previously developed model, and three reported on a model's impact on participant outcome. Study design was unclear in 15% of studies, and a prospective cohort was used in most studies (60%). Descriptions of the participants and definitions of predictor and outcome were generally good. Despite many recommendations against doing so, continuous predictors were often dichotomized (32% of studies). The number of events per predictor as a measure of statistical power could not be determined in 67% of the studies; of the remainder, 53% had fewer than the commonly recommended value of ten events per predictor. Methods for a priori selection of candidate predictors were described in most studies (68%). A substantial number of studies (29%) relied on a cut-off of p<0.05 to select predictors in the multivariable analyses. Predictive model performance measures, i.e., calibration and discrimination, were reported in 12% and 27% of studies, respectively.
The majority of prediction studies in high impact journals do not follow current methodological recommendations, limiting their reliability and applicability.
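To make the events-per-variable (EPV) criterion mentioned above concrete, here is a minimal Python sketch; the counts are hypothetical and the helper function is illustrative, not taken from the reviewed studies.

```python
# A minimal sketch (hypothetical counts) of the events-per-variable (EPV)
# power check described above: EPV = number of outcome events divided by the
# number of candidate predictors, with EPV < 10 commonly flagged as low power.

def events_per_variable(n_events: int, n_candidate_predictors: int) -> float:
    """Return the events-per-variable ratio for a prediction model."""
    return n_events / n_candidate_predictors

# Hypothetical example: 120 outcome events and 15 candidate predictors.
epv = events_per_variable(n_events=120, n_candidate_predictors=15)
print(f"EPV = {epv:.1f}")  # EPV = 8.0
print("Below the commonly recommended minimum of 10:", epv < 10)
```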
The incremental value of polygenic risk scores in addition to well-established risk prediction models for coronary artery disease (CAD) is uncertain.
To examine whether a polygenic risk score for CAD improves risk prediction beyond pooled cohort equations.
Observational study of UK Biobank participants enrolled from 2006 to 2010. A case-control sample of 15 947 prevalent CAD cases and an equal number of age and sex frequency-matched controls was used to optimize the predictive performance of a polygenic risk score for CAD based on summary statistics from published genome-wide association studies. A separate cohort of 352 660 individuals (with follow-up to 2017) was used to evaluate the predictive accuracy of the polygenic risk score, pooled cohort equations, and both combined for incident CAD.
Polygenic risk score for CAD, pooled cohort equations, and both combined.
CAD (myocardial infarction and its related sequelae). Discrimination, calibration, and reclassification using a risk threshold of 7.5% were assessed.
In the cohort of 352 660 participants (mean age, 55.9 years; 205 297 [58.2%] women) used to evaluate the predictive accuracy of the examined models, there were 6272 incident CAD events over a median of 8 years of follow-up. CAD discrimination for polygenic risk score, pooled cohort equations, and both combined resulted in C statistics of 0.61 (95% CI, 0.60 to 0.62), 0.76 (95% CI, 0.75 to 0.77), and 0.78 (95% CI, 0.77 to 0.79), respectively. The change in C statistic between the latter 2 models was 0.02 (95% CI, 0.01 to 0.03). Calibration of the models showed overestimation of risk by pooled cohort equations, which was corrected after recalibration. Using a risk threshold of 7.5%, addition of the polygenic risk score to pooled cohort equations resulted in a net reclassification improvement of 4.4% (95% CI, 3.5% to 5.3%) for cases and -0.4% (95% CI, -0.5% to -0.4%) for noncases (overall net reclassification improvement, 4.0% [95% CI, 3.1% to 4.9%]).
The addition of a polygenic risk score for CAD to pooled cohort equations was associated with a statistically significant, yet modest, improvement in the predictive accuracy for incident CAD and improved risk stratification for only a small proportion of individuals. The use of genetic information over the pooled cohort equations model warrants further investigation before clinical implementation.
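As an illustration of the discrimination and reclassification measures reported above, here is a minimal Python sketch on simulated data; the event rate, risk distributions, and effect of adding a score are all hypothetical assumptions and are not derived from the UK Biobank analysis.

```python
# A minimal sketch (simulated, hypothetical data) of the two headline metrics
# above: the C statistic and the categorical net reclassification improvement
# (NRI) at a 7.5% risk threshold.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y = rng.binomial(1, 0.05, size=10_000)                             # events (~5%)
risk_base = np.clip(rng.beta(1, 15, size=10_000) + 0.03 * y, 0, 1)  # base model
risk_combined = np.clip(risk_base + 0.02 * y - 0.002, 0, 1)         # base + score

print("C statistic, base model:    ", round(roc_auc_score(y, risk_base), 3))
print("C statistic, combined model:", round(roc_auc_score(y, risk_combined), 3))

def categorical_nri(y, old_risk, new_risk, threshold=0.075):
    """Two-category NRI: net proportion of events reclassified upward plus
    net proportion of non-events reclassified downward."""
    old_high, new_high = old_risk >= threshold, new_risk >= threshold
    up, down = new_high & ~old_high, old_high & ~new_high
    ev, nonev = y == 1, y == 0
    nri_events = up[ev].mean() - down[ev].mean()
    nri_nonevents = down[nonev].mean() - up[nonev].mean()
    return nri_events, nri_nonevents, nri_events + nri_nonevents

print("NRI (events, non-events, overall):",
      categorical_nri(y, risk_base, risk_combined))
```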
TRIPOD provides guidance on the key items to report when describing studies developing, evaluating (or validating), or updating clinical prediction models.10,11 Although TRIPOD aims primarily to improve reporting, it also leads to more comprehensive understanding, conduct, and analysis of prediction model studies, ensuring that prediction models can be picked up by subsequent researchers and users to be studied further and used to guide health care, thus encouraging reproducible research and reducing research waste. Concerns have been raised that artificial intelligence in clinical medicine is overhyped and, if not used with proper guidance, knowledge, or expertise, suffers from methodological shortcomings, poor transparency, and poor reproducibility.12 Methodological concerns include an often incorrect focus on classification over prediction, overfitting (whereby too many predictors or features are included for the sample size), lack of robust assessment of predictive accuracy on data other than those from which the models were developed (validation), weak or absent comparison with simpler modelling approaches, and lack of transparency of the artificial intelligence and machine learning algorithms, which limits independent evaluation. Clearly, the consequences of making a wrong or inaccurate prediction are substantial for the clinical application of a machine learning prediction model, such as the deep learning models for detection of stroke or wrist fractures approved by the US Food and Drug Administration.13 Therefore, the clinical community must not be mesmerised by the artificial intelligence and machine learning revolution, and artificial intelligence and machine learning prediction models must be appropriately developed, evaluated, and, if needed, tailored to different situations before they are used in daily medical practice.
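As a sketch of one safeguard named above, a fair comparison of a machine learning model against a simpler modelling approach on held-out data, consider the following simulated example; the data-generating process, feature counts, and model choices are assumptions made purely for illustration.

```python
# A minimal sketch (simulated data, hypothetical setup) of comparing a complex
# machine learning model against simple logistic regression on held-out data,
# which also exposes overfitting via the apparent-vs-held-out AUC gap.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(2_000, 30))            # 30 candidate features
lin_pred = 0.8 * X[:, 0] - 0.5 * X[:, 1]    # only 2 are truly predictive
y = rng.binomial(1, 1 / (1 + np.exp(-lin_pred)))

X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.5,
                                              random_state=1)

ml = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_dev, y_dev)
lr = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)

for name, model in [("random forest", ml), ("logistic regression", lr)]:
    auc_dev = roc_auc_score(y_dev, model.predict_proba(X_dev)[:, 1])
    auc_val = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    print(f"{name}: AUC development = {auc_dev:.3f}, AUC held-out = {auc_val:.3f}")
```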
Like any other medical technology or intervention, diagnostic tests should be thoroughly evaluated before their introduction into daily practice. Increasingly, decision makers, physicians, and other users of diagnostic tests request more than simple measures of a test's analytical or technical performance and diagnostic accuracy; they would also like to see testing lead to health benefits. In this last article of our series, we introduce the notion of clinical utility, which expresses, preferably in a quantitative form, to what extent diagnostic testing improves health outcomes relative to the current best alternative, which could be some other form of testing or no testing at all. In most cases, diagnostic tests improve patient outcomes by providing information that can be used to identify patients who will benefit from helpful downstream management actions, such as effective treatment in individuals with positive test results and no treatment for those with negative results. We describe how comparative randomized clinical trials can be used to estimate clinical utility. We contrast the definition of clinical utility with that of the personal utility of tests and markers. We show how diagnostic accuracy can be linked to clinical utility through an appropriate definition of the target condition in diagnostic accuracy studies.
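The quantitative notion of clinical utility described above can be sketched with a stylized calculation; every number below (prevalence, test accuracy, treatment benefit, and harm) is hypothetical, and in practice such estimates would come from comparative randomized trials as the article recommends.

```python
# A stylized numeric sketch (all numbers hypothetical) of clinical utility:
# expected health outcome under test-guided management compared with the best
# alternative strategies, treat-all and treat-none.
prevalence, sensitivity, specificity = 0.10, 0.90, 0.85
benefit_treated_diseased = 1.0   # health gain from treating true disease
harm_treated_healthy = 0.10      # harm from unnecessary treatment

def expected_utility(p_treat_if_diseased, p_treat_if_healthy):
    """Expected net health outcome per patient under a treatment strategy."""
    gain = prevalence * p_treat_if_diseased * benefit_treated_diseased
    loss = (1 - prevalence) * p_treat_if_healthy * harm_treated_healthy
    return gain - loss

print("treat none :", expected_utility(0.0, 0.0))                  # 0.0
print("treat all  :", expected_utility(1.0, 1.0))                  # 0.01
print("test-guided:", expected_utility(sensitivity, 1 - specificity))  # 0.0765
# Clinical utility of testing = test-guided outcome minus the best alternative.
```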
Prediction models are developed to aid health care providers in estimating the probability or risk that a specific disease or condition is present (diagnostic models) or that a specific event will ...occur in the future (prognostic models), to inform their decision making. However, the overwhelming evidence shows that the quality of reporting of prediction model studies is poor. Only with full and clear reporting of information on all aspects of a prediction model can risk of bias and potential usefulness of prediction models be adequately assessed. The Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) Initiative developed a set of recommendations for the reporting of studies developing, validating, or updating a prediction model, whether for diagnostic or prognostic purposes. This article describes how the TRIPOD Statement was developed. An extensive list of items based on a review of the literature was created, which was reduced after a Web-based survey and revised during a 3-day meeting in June 2011 with methodologists, health care professionals, and journal editors. The list was refined during several meetings of the steering group and in e-mail discussions with the wider group of TRIPOD contributors. The resulting TRIPOD Statement is a checklist of 22 items, deemed essential for transparent reporting of a prediction model study. The TRIPOD Statement aims to improve the transparency of the reporting of a prediction model study regardless of the study methods used. The TRIPOD Statement is best used in conjunction with the TRIPOD explanation and elaboration document. To aid the editorial process and readers of prediction model studies, it is recommended that authors include a completed checklist in their submission (also available at www.tripod-statement.org).
Carl Moons and colleagues provide a checklist and background explanation for critically appraising and extracting data from systematic reviews of prognostic and diagnostic prediction modelling studies.
Objectives: It is widely acknowledged that the performance of diagnostic and prognostic prediction models should be assessed in external validation studies with independent data from “different but related” samples, as compared with the development sample. We developed a framework of methodological steps and statistical methods for analyzing and enhancing the interpretation of results from external validation studies of prediction models. Study Design and Setting: We propose to quantify the degree of relatedness between development and validation samples on a scale ranging from reproducibility to transportability by evaluating their corresponding case-mix differences. We subsequently assess the model's performance in the validation sample and interpret the performance in view of the case-mix differences. Finally, we may adjust the model to the validation setting. Results: We illustrate this three-step framework with a prediction model for diagnosing deep venous thrombosis using three validation samples with varying case mix. While one external validation sample merely assessed the model's reproducibility, the two other samples rather assessed model transportability. The performance in all validation samples was adequate, and the model did not require extensive updating to correct for miscalibration or poor fit to the validation settings. Conclusion: The proposed framework enhances the interpretation of findings at external validation of prediction models.
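A minimal sketch of the three-step framework on simulated data follows; the use of a "membership model" to summarize case-mix differences, and all simulation settings, are assumptions made for illustration rather than a reproduction of the authors' exact procedure.

```python
# A minimal sketch (simulated data) of the three-step framework above:
# (1) quantify case-mix relatedness, (2) assess validation performance,
# (3) check whether the model needs updating for the new setting.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
X_dev = rng.normal(0.0, 1.0, size=(3_000, 4))   # development case mix
X_val = rng.normal(0.4, 1.2, size=(1_000, 4))   # shifted validation case mix
beta = np.array([0.9, -0.6, 0.4, 0.0])
y_dev = rng.binomial(1, 1 / (1 + np.exp(-(X_dev @ beta - 1.0))))
y_val = rng.binomial(1, 1 / (1 + np.exp(-(X_val @ beta - 0.6))))

# Step 1: case-mix relatedness via a membership model. A C statistic near 0.5
# suggests reproducibility is tested; well above 0.5 suggests transportability.
X_all = np.vstack([X_dev, X_val])
member = np.r_[np.zeros(len(X_dev)), np.ones(len(X_val))]
m = LogisticRegression(max_iter=1000).fit(X_all, member)
print("membership C statistic:",
      round(roc_auc_score(member, m.predict_proba(X_all)[:, 1]), 3))

# Step 2: performance of the development model in the validation sample.
model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)
p_val = model.predict_proba(X_val)[:, 1]
print("validation C statistic:", round(roc_auc_score(y_val, p_val), 3))
print("observed vs mean predicted risk:",
      round(y_val.mean(), 3), "vs", round(p_val.mean(), 3))

# Step 3: estimate the calibration slope (ideal = 1); values far from 1
# suggest the model should be adjusted to the validation setting.
lp_val = model.decision_function(X_val)          # original linear predictor
recal = LogisticRegression(max_iter=1000).fit(lp_val.reshape(-1, 1), y_val)
print("calibration slope:", round(recal.coef_[0, 0], 3))
```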
Clinical prediction models are increasingly used to complement clinical reasoning and decision-making in modern medicine, in general, and in the cardiovascular domain, in particular. To these ends, developed models first and foremost need to provide accurate and (internally and externally) validated estimates of probabilities of specific health conditions or outcomes in the targeted individuals. Subsequently, the adoption of such models by professionals must guide their decision-making, and improve patient outcomes and the cost-effectiveness of care. In the first paper of this series of two companion papers, issues relating to prediction model development, their internal validation, and estimating the added value of a new (bio)marker to existing predictors were discussed. In this second paper, an overview is provided of the consecutive steps for the assessment of the model's predictive performance in new individuals (external validation studies), how to adjust or update existing models to local circumstances or with new predictors, and how to investigate the impact of the uptake of prediction models on clinical decision-making and patient outcomes (impact studies). Each step is illustrated with empirical examples from the cardiovascular field.
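To illustrate the model-updating step described above, here is a minimal sketch of two standard recalibration methods, intercept-only updating and logistic recalibration; the simulated miscalibration and all settings are hypothetical, not drawn from the paper's cardiovascular examples.

```python
# A minimal sketch (simulated data, illustrative only) of updating an existing
# model to a new setting: recalibration-in-the-large (new intercept only) and
# logistic recalibration (new intercept and calibration slope).
import numpy as np
import statsmodels.api as sm
from scipy.special import expit, logit

rng = np.random.default_rng(3)
p_old = np.clip(rng.beta(2, 8, size=2_000), 1e-4, 1 - 1e-4)  # existing model's risks
y_new = rng.binomial(1, 0.6 * p_old)                         # local risks are lower

lp = logit(p_old)                                            # original linear predictor

# Method 1: shift the intercept only; the linear predictor enters as an offset.
fit1 = sm.GLM(y_new, np.ones((len(y_new), 1)),
              family=sm.families.Binomial(), offset=lp).fit()
p_new1 = expit(lp + fit1.params[0])

# Method 2: re-estimate both the intercept and the calibration slope.
fit2 = sm.GLM(y_new, sm.add_constant(lp), family=sm.families.Binomial()).fit()
p_new2 = fit2.predict(sm.add_constant(lp))

print("observed risk in new setting:", round(y_new.mean(), 3))
print("mean predicted risk: original =", round(p_old.mean(), 3),
      "| intercept update =", round(p_new1.mean(), 3),
      "| intercept + slope =", round(p_new2.mean(), 3))
```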