Although complex machine learning models commonly outperform traditional, simple interpretable models, clinicians find it hard to understand and trust these complex models because their predictions lack intuition and explanation. The aim of this study is to demonstrate the utility of various model-agnostic explanation techniques for machine learning models, using a case study that analyzes the outcomes of a random forest model for predicting individuals at risk of developing hypertension based on cardiorespiratory fitness data.
The dataset used in this study contains information on 23,095 patients who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 10-year follow-up. Five global interpretability techniques (Feature Importance, Partial Dependence Plot, Individual Conditional Expectation, Feature Interaction, Global Surrogate Models) and two local interpretability techniques (Local Surrogate Models, Shapley Value) were applied to show how interpretability techniques can help clinical staff better understand, and place more trust in, the outcomes of machine learning-based predictions.
Several experiments were conducted and reported. The results show that different interpretability techniques provide different insights into model behavior: global interpretations enable clinicians to understand the entire conditional distribution modeled by the trained response function, whereas local interpretations promote the understanding of small parts of the conditional distribution for specific instances.
Interpretability techniques can vary in their explanations of the behavior of a machine learning model. Global interpretability techniques have the advantage that they generalize over the entire population, while local interpretability techniques give explanations at the level of individual instances. Both approaches can be equally valid depending on the application need, and both are effective in assisting clinicians in the medical decision process. However, clinicians will always hold the final say on accepting or rejecting the outcome of the machine learning models and their explanations, based on their domain expertise.
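As an illustration of the first global technique listed above, Feature Importance is often computed by permutation: shuffle one feature column and measure how much the model's error grows. The following is a minimal Python sketch under stated assumptions, not the study's implementation. The function `toy_predict`, the two synthetic features, and the error-rate metric are hypothetical stand-ins for the trained random forest and the cardiorespiratory fitness data.

```python
import random


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in error rate when each feature column is shuffled.

    `predict` is any function mapping one row (a list) to a label, which is
    what makes the procedure model-agnostic.
    """
    rng = random.Random(seed)

    def error_rate(rows):
        wrong = sum(1 for row, label in zip(rows, y) if predict(row) != label)
        return wrong / len(y)

    baseline = error_rate(X)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link with the outcome.
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(error_rate(shuffled) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy model: flags risk when feature 0 is low, ignores feature 1.
def toy_predict(row):
    return 1 if row[0] < 8 else 0


# Synthetic data (assumption): labels are generated by the toy model itself.
X = [[float(i % 15), float((i * 7) % 11)] for i in range(200)]
y = [toy_predict(row) for row in X]
importance_scores = permutation_importance(toy_predict, X, y)
```

In this toy setting the ignored feature receives an importance of zero, while shuffling the feature the model relies on raises the error rate. This is the kind of global summary a clinician could read across the whole population.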
This study evaluates and compares the performance of different machine learning techniques in predicting the individuals at risk of developing hypertension, and who are likely to benefit most from interventions, using cardiorespiratory fitness data. The dataset of this study contains information on 23,095 patients who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 10-year follow-up. The variables of the dataset include information on vital signs, diagnosis and clinical laboratory measurements. Six machine learning techniques were investigated: LogitBoost (LB), Bayesian Network classifier (BN), Locally Weighted Naive Bayes (LWB), Artificial Neural Network (ANN), Support Vector Machine (SVM) and Random Tree Forest (RTF). Using different validation methods, the RTF model showed the best performance (AUC = 0.93) and outperformed all other machine learning techniques examined in this study. The results also show that it is critical to carefully explore and evaluate the performance of machine learning models using various model evaluation methods, as the prediction accuracy can differ significantly.
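The AUC reported above can be read as the probability that a randomly chosen patient who developed the outcome receives a higher predicted risk than a randomly chosen patient who did not. The sketch below computes it by direct pairwise comparison. It is an illustrative Python example with hypothetical labels and scores, not the evaluation code or data of the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for seven patients (1 = developed hypertension).
example_labels = [1, 1, 1, 0, 0, 0, 0]
example_risk_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
example_auc = auc(example_labels, example_risk_scores)
```

The pairwise form is quadratic in the number of patients, so it suits small illustrations only. A rank-based formulation gives the same value more efficiently on a dataset of this study's size.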
There is increasing interest in making primary data from published research publicly available. We aimed to assess the current status of making research data available in highly-cited journals across the scientific literature.
We reviewed the first 10 original research papers of 2009 published in the 50 original research journals with the highest impact factor. For each journal we documented the policies related to public availability and sharing of data. Of the 50 journals, 44 (88%) had a statement in their instructions to authors related to public availability and sharing of data. However, there was wide variation in journal requirements, ranging from requiring the sharing of all primary data related to the research to merely including a statement in the published manuscript that data are available on request. Of the 500 assessed papers, 149 (30%) were not subject to any data availability policy. Of the remaining 351 papers that were covered by some data availability policy, 208 papers (59%) did not fully adhere to the data availability instructions of the journals they were published in, most commonly (73%) by not publicly depositing microarray data. The other 143 papers that adhered to the data availability instructions did so by publicly depositing only the specific data type as required, making a statement of willingness to share, or actually sharing all the primary data. Overall, only 47 papers (9%) deposited full primary raw data online. None of the 149 papers not subject to data availability policies made their full primary data publicly available.
A substantial proportion of original research papers published in high-impact journals are either not subject to any data availability policies, or do not adhere to the data availability instructions in their respective journals. This empiric evaluation highlights opportunities for improvement.
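The percentages reported above follow directly from the stated counts. The short Python tally below re-derives them as a consistency check. The variable names are illustrative, and only figures given in the text are used.

```python
# Counts taken from the text above.
journals_total, journals_with_policy = 50, 44
papers_total = 500
not_covered = 149       # papers not subject to any data availability policy
non_adherent = 208      # covered papers that did not fully adhere
full_deposit = 47       # papers that deposited full primary raw data online

covered = papers_total - not_covered    # papers under some policy
adherent = covered - non_adherent       # papers that adhered to the policy


def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


summary = {
    "journals_with_policy_pct": pct(journals_with_policy, journals_total),
    "not_covered_pct": pct(not_covered, papers_total),
    "non_adherent_pct_of_covered": pct(non_adherent, covered),
    "full_deposit_pct": pct(full_deposit, papers_total),
}
```

Note that the 59% non-adherence figure uses the 351 covered papers as its denominator, whereas the 9% full-deposit figure uses all 500 papers.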
Atrial fibrillation (AF) is a significant health care problem for patients with obstructive sleep apnea (OSA). Continuous positive airway pressure (CPAP) as a therapy for OSA is underused, and it is unknown if CPAP might reduce rates of AF. We systematically reviewed the published reports on CPAP use and risk of AF. MEDLINE, EMBASE, CINAHL, Web of Science, meeting abstracts, and Cochrane databases were searched from inception to June 2015. Studies needed to report the rates of AF in participants who were and were not on CPAP. Data were extracted by 2 authors. A total of 8 studies on OSA were identified (1 randomized controlled trial) with 698 CPAP users and 549 non-CPAP users. In a random effects model, patients treated with CPAP had a 42% decreased risk of AF (pooled risk ratio, 0.58; 95% confidence interval, 0.47 to 0.70; p < 0.001). There was low heterogeneity in the results (I² = 30%). In metaregression analysis, benefits of CPAP were stronger for younger, obese, and male patients (p < 0.05). An inverse relationship between CPAP therapy and AF recurrence was observed. Results suggest that more patients with AF should also be tested for OSA.
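To make the pooled risk ratio and I² statistic above more concrete, the following Python sketch shows one common way a random effects model combines study-level risk ratios. It assumes the DerSimonian-Laird estimator, which the text does not name, and the four study inputs are hypothetical values chosen for illustration. They are not the eight studies of the review and will not reproduce its results.

```python
import math


def pool_random_effects(log_rr, se):
    """Pool log risk ratios with DerSimonian-Laird random-effects weights.

    Returns (pooled RR, lower 95% limit, upper 95% limit, I-squared in %).
    """
    k = len(log_rr)
    w = [1.0 / s ** 2 for s in se]                     # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, log_rr)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_rr))  # Cochran's Q
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                      # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1.0 / (s ** 2 + tau2) for s in se]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_rr)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    lower = pooled - 1.96 * se_pooled
    upper = pooled + 1.96 * se_pooled
    return math.exp(pooled), math.exp(lower), math.exp(upper), i2


# Hypothetical study-level risk ratios and standard errors of log(RR).
example_log_rr = [math.log(r) for r in (0.50, 0.60, 0.70, 0.55)]
example_se = [0.20, 0.15, 0.25, 0.30]
pooled_rr, ci_low, ci_high, i_squared = pool_random_effects(example_log_rr, example_se)

# A risk ratio converts to a percent risk reduction as (1 - RR) * 100;
# applied to the pooled estimate reported in the text, 0.58 gives 42%.
reported_reduction_pct = round((1 - 0.58) * 100)
```

Pooling is done on the log scale because log risk ratios are approximately normally distributed. The result is exponentiated only at the end.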
Purpose of Review
Cardiovascular diseases account for nearly one third of all deaths globally. Improving exercise capacity and cardiorespiratory fitness (CRF) has been an important target for reducing cardiovascular events. In addition, the American Heart Association defined decreased physical activity as the fourth risk factor for coronary artery disease. Multiple large cohort studies have evaluated the impact of CRF on outcomes. In this review, we will discuss the role of CRF in reducing cardiovascular morbidity and mortality.
Recent Findings
Recent data suggest that CRF has an important role in reducing not only cardiovascular and all-cause mortality, but also incident myocardial infarction, hypertension, diabetes, atrial fibrillation, heart failure, and stroke. Most recently, its role in cancer prevention has started to emerge. The protective effects of CRF have also been seen in patients with comorbidities such as prior coronary artery disease, heart failure, depression, end-stage renal disease, and stroke.
Summary
The prognostic value of CRF has been demonstrated in various patient populations and cardiovascular conditions. Higher CRF is associated with improved survival and decreased incidence of cardiovascular diseases (CVD) and other comorbidities including hypertension, diabetes, heart failure, and atrial fibrillation.
Background
The relation between cardiorespiratory fitness (CRF) and prostate cancer is not well established. The objective of this study was to determine whether CRF is associated with prostate cancer screening, incidence, or mortality.
Methods
The Henry Ford Exercise Testing Project is a retrospective cohort study of men aged 40 to 70 years without cancer who underwent physician‐referred exercise stress testing from 1995 to 2009. CRF was quantified in metabolic equivalents of task (METs) (<6 reference, 6‐9, 10‐11, and ≥12 METs), estimated from the peak workload achieved during a symptom‐limited, maximal exercise stress test. Prostate‐specific antigen (PSA) testing, incident prostate cancer, and all‐cause mortality were analyzed with multivariable adjusted Poisson regression and Cox proportional hazard models.
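The four fitness categories described above can be expressed as a simple binning rule. The Python sketch below is illustrative only: the text does not say how fractional MET values between the listed bands (for example 9.5) were handled, so assigning them to the lower band is an assumption, as are the function name and the category labels.

```python
def fitness_category(mets):
    """Map peak METs to the four fitness bands named in the text.

    Assumption: values falling between listed bands (e.g. 9.5 or 11.5)
    are assigned to the lower band.
    """
    if mets < 0:
        raise ValueError("METs cannot be negative")
    if mets < 6:
        return "<6 METs (reference)"
    if mets < 10:
        return "6-9 METs"
    if mets < 12:
        return "10-11 METs"
    return ">=12 METs"


# Hypothetical peak workloads for five patients.
example_categories = [fitness_category(m) for m in (4.6, 7.0, 9.5, 10.1, 13.4)]
```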
Results
In total, 22,827 men were included, of whom 739 developed prostate cancer, with a median follow‐up of 7.5 years. Men who had high fitness (≥12 METs) had a 28% higher risk of PSA screening (95% CI, 1.2‐1.3) compared with those who had low fitness (<6 METs). After adjusting for PSA screening, fitness was associated with higher prostate cancer incidence (men aged <55 years, P = .02; men aged >55 years, P ≤ .01), but not with advanced prostate cancer. Among the men who were diagnosed with prostate cancer, high fitness was associated with a 60% lower risk of all‐cause mortality (95% CI, 0.2‐0.9).
Conclusions
Although men with high fitness are more likely to undergo PSA screening, this does not fully account for the increased incidence of prostate cancer seen among these individuals. However, men with high fitness have a lower risk of death after a prostate cancer diagnosis, suggesting that the cancers identified may be low‐risk with little impact on long‐term outcomes.
Fitness is associated with a greater likelihood of prostate‐specific antigen screening and prostate cancer diagnosis. It is also associated with a lower risk of death after prostate cancer diagnosis.