Most screening tests for T2DM in use today were developed using multivariate regression methods that are often further simplified to allow transformation into a scoring formula. The increasing volume of electronically collected data has opened the opportunity to develop more complex, accurate prediction models that can be continuously updated using machine learning approaches. This study compares machine learning-based prediction models (i.e. Glmnet, RF, XGBoost, LightGBM) to commonly used regression models for prediction of undiagnosed T2DM. The performance in prediction of fasting plasma glucose level was measured using 100 bootstrap iterations in different subsets of data simulating new incoming data in 6-month batches. With 6 months of data available, the simple regression model achieved the lowest average RMSE of 0.838, followed by RF (0.842), LightGBM (0.846), Glmnet (0.859) and XGBoost (0.881). When more data were added, Glmnet improved at the highest rate (+3.4%). The highest level of variable selection stability over time was observed with LightGBM models. Our results show no clinically relevant improvement when more sophisticated prediction models were used. Since higher stability of selected variables over time contributes to simpler interpretation of the models, interpretability and model calibration should also be considered in the development of clinical prediction models.
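The RMSE comparison over 100 bootstrap iterations described above can be illustrated with a minimal sketch. It is a simplification and not the study's pipeline: it resamples pairs of observed and predicted values rather than refitting each model, and the function names, the default seed and the example values are assumptions made for illustration.

```python
import math
import random


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def bootstrap_rmse(observed, predicted, iterations=100, seed=0):
    """Average RMSE over bootstrap resamples of (observed, predicted) pairs.

    Assumption: predictions are already available; no model is refitted here.
    """
    rng = random.Random(seed)
    n = len(observed)
    estimates = []
    for _ in range(iterations):
        # Draw n indices with replacement to form one bootstrap sample.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(
            rmse([observed[i] for i in idx], [predicted[i] for i in idx])
        )
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    # Hypothetical fasting plasma glucose values (mmol/L) and predictions.
    fpg = [5.1, 5.6, 6.2, 4.9, 7.0]
    pred = [5.3, 5.4, 6.0, 5.2, 6.5]
    print(round(bootstrap_rmse(fpg, pred), 3))
```

Fixing the seed keeps the resampling reproducible, which matters when several models are compared on the same bootstrap samples.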
There is a need to ensure that machine learning (ML) models are interpretable. Higher interpretability of the model means easier comprehension and explanation of future predictions for end‐users. Further, interpretable ML models allow healthcare experts to make reasonable and data‐driven decisions to provide personalized decisions that can ultimately lead to higher quality of service in healthcare. Generally, we can classify interpretability approaches into two groups, where the first focuses on personalized interpretation (local interpretability) while the second summarizes prediction models on a population level (global interpretability). Alternatively, we can group interpretability methods into model‐specific techniques, which are designed to interpret predictions generated by a specific model, such as a neural network, and model‐agnostic approaches, which provide easy‐to‐understand explanations of predictions made by any ML model. Here, we give an overview of interpretability approaches using structured data and provide examples of practical interpretability of ML in different areas of healthcare, including prediction of health‐related outcomes, optimizing treatments, or improving the efficiency of screening for specific conditions. Further, we outline future directions for interpretable ML and highlight the importance of developing algorithmic solutions that can enable ML‐driven decision making in high‐stakes healthcare problems.
This article is categorized under:
Application Areas > Health Care
Four groups of machine learning models for prediction in healthcare based on their interpretability characteristics
Abstract
Prior to further processing, completed questionnaires must be screened for the presence of careless respondents. Different people will respond to surveys in different ways. Some take the easy path and fill out the survey carelessly. The proportion of careless respondents determines the survey’s quality. As a result, identifying careless respondents is critical for the quality of obtained results. This study aims to explore the characteristics of careless respondents in survey data and evaluate the predictive power and interpretability of different types of data and indices of careless responding. The research question focuses on understanding the behavior of careless respondents and determining the effectiveness of various data sources in predicting their responses. Data from a three-month web-based survey on participants’ personality traits such as honesty-humility, emotionality, extraversion, agreeableness, conscientiousness and openness to experience were used in this study. The data were taken from Schroeders et al. The gradient boosting machine-based prediction model uses data from the answers, time spent answering, demographic information on the respondents, as well as some indices of careless responding from all three types of data. Prediction models were evaluated with tenfold cross-validation repeated a hundred times. Prediction models were compared based on balanced accuracy. Models’ explanations were provided with Shapley values. Compared with existing work, data fusion from multiple types of information had no noticeable effect on the performance of the gradient boosting machine model. Variables such as “I would never take a bribe, even if it was a lot”, average longstring, and total intra-individual response variability were found to be useful in distinguishing careless respondents.
However, variables like “I would be tempted to use counterfeit money if I could get away with it” and intra-individual response variability of the first section of a survey showed limited effectiveness. Additionally, this study indicated that, whereas the psychometric synonym score has an immediate effect and is designed with the goal of identifying careless respondents when combined with other variables, it is not necessarily the optimal choice for fitting a gradient boosting machine model.
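Two of the indices named above, longstring and intra-individual response variability (IRV), can be sketched in a few lines. The definitions used here are the common ones (longest and mean run of identical consecutive answers, and the standard deviation of one respondent's answers) and are assumed rather than taken from the study; the function names and the example responses are hypothetical.

```python
from itertools import groupby
from statistics import mean, stdev


def run_lengths(responses):
    """Lengths of runs of identical consecutive responses."""
    return [len(list(group)) for _, group in groupby(responses)]


def longstring(responses):
    """Longest run of identical consecutive responses."""
    return max(run_lengths(responses))


def average_longstring(responses):
    """Mean length of runs of identical consecutive responses."""
    return mean(run_lengths(responses))


def irv(responses):
    """Intra-individual response variability: sample SD of one person's answers."""
    return stdev(responses)


if __name__ == "__main__":
    # Hypothetical answers of one respondent on a 5-point scale.
    answers = [3, 3, 3, 3, 4, 2, 2, 5]
    print(longstring(answers), average_longstring(answers), round(irv(answers), 3))
```

A long run of identical answers combined with a low IRV is the pattern these indices are meant to flag; whether a given value indicates carelessness depends on the questionnaire.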
Abstract
This study presents the results of a network-based analysis of health related quality of life (HRQoL) among Slovenian adolescents. The study aimed to examine the relationship between HRQoL and mental well-being among adolescents of different age and gender groups. A cross-sectional study was conducted from November 2019 to January 2020 in 16 primary and 9 secondary schools in Slovenia. The KIDSCREEN-27 scale was used to collect the data on HRQoL, and the Warwick–Edinburgh Mental Well-being Scale to collect data on mental well-being. We used network model trees to demonstrate differences in psychometric network structure measuring correlations between different concepts in adolescent HRQoL. A total of 2972 students aged 10–19 years participated in the study. The significant split in the network tree (p < 0.001) indicated differences in relations between HRQoL subscale scores and mental well-being score among adolescents younger than 12 years old. In comparison to older adolescents, the correlation between mental well-being and mood scores was significantly weaker in this group of the youngest participants (p < 0.001). A network model tree analysis also uncovered an interesting pattern based on gender and age (p < 0.013), where the correlation between mood and family support became weaker for females at the age of 12 and for males at the age of 16. Data mining techniques have recently been used by healthcare researchers and professionals. Network-based analysis is an innovative alternative to classical approaches in HRQoL research. In this study we demonstrate the significant differences in the perceptions of HRQoL and mental well-being among adolescents in different age and gender groups that were discovered using tree-based network analysis.
Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible.
This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by the so-called one-button data mining approach, where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model, which is constrained exclusively by the dimensions of the produced decision tree.
The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree.
The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes that are very common in bioinformatics.
Classical paper-and-pencil based risk assessment questionnaires are often accompanied by online versions of the questionnaire to reach a wider population. This study focuses on the loss, especially in risk estimation performance, that can be inflicted by direct transformation from the paper to online versions of risk estimation calculators by ignoring the possibilities of more complex and accurate calculations that can be performed using the online calculators. We empirically compare the risk estimation performance between four major diabetes risk calculators and two more advanced predictive models. National Health and Nutrition Examination Survey (NHANES) data from 1999-2012 were used to evaluate the performance of detecting diabetes and pre-diabetes. The American Diabetes Association risk test achieved the best predictive performance in the category of classical paper-and-pencil based tests, with an Area Under the ROC Curve (AUC) of 0.699 for undiagnosed diabetes (0.662 for pre-diabetes) and 47% (47% for pre-diabetes) of persons selected for screening. Our results demonstrate a significant difference in performance, with additional benefits for a lower number of persons selected for screening, when statistical methods are used. The best AUC overall was obtained in diabetes risk prediction using logistic regression, with an AUC of 0.775 (0.734) and an average of 34% (48%) of persons selected for screening. However, generalized boosted regression models might be a better option from the economic point of view, as the number of persons selected for screening, 30% (47%), is significantly lower for diabetes risk assessment in comparison to logistic regression (p < 0.001), with a significantly higher AUC (p < 0.001) of 0.774 (0.740) for the pre-diabetes group. Our results demonstrate a serious lack of predictive performance in four major online diabetes risk calculators.
Therefore, one should take great care and consider optimizing the online versions of questionnaires that were primarily developed as classical paper questionnaires.
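The two quantities reported above for each calculator, the AUC and the share of persons selected for screening, can be computed as in the following sketch. It uses the pairwise (Mann-Whitney) formulation of the AUC; the function names, the risk scores and the cut-off are hypothetical and not taken from the study.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive case scores higher
    than a random negative case; ties count as one half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def fraction_selected(scores, threshold):
    """Share of persons whose risk score reaches the screening cut-off."""
    return sum(s >= threshold for s in scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical questionnaire scores and diabetes status (1 = undiagnosed diabetes).
    risk = [2, 5, 7, 3, 6, 1]
    status = [0, 1, 1, 0, 0, 0]
    print(auc(risk, status), fraction_selected(risk, threshold=5))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a small illustration; a rank-based computation would be preferable for survey-scale data.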
Type 2 diabetes mellitus (T2DM) affects a patient's physical, social, and mental well-being. Perceptions of the illness are linked to quality of life. The aim of this study was to assess illness perception in patients diagnosed with T2DM and to validate the Brief Illness Perception Questionnaire in the Slovenian language. A cross-sectional study involved 141 patients diagnosed with T2DM. We performed a content analysis of the questionnaire and estimated the S-CVI, I-CVI, and kappa coefficient. We also used Cronbach's alpha to assess reliability. Participants did not have a very threatening perception of T2DM, but being overweight and having cardiovascular disease were significant contributors to a more threatening perception. The most frequently indicated factors influencing the onset and development of T2DM were heredity and genetics, stress and other psychological distress, and poor and inadequate nutrition. I-CVI ranged from 0.833 to 1.00, while the kappa was greater than 0.74, confirming the excellent validity of the questions. The content validity assessment of the questionnaire further confirms that the questionnaire is suitable for use with the target population in Slovenia. The questionnaire proved to be a valid and reliable tool that can be used to assess the relationship between illness perception and self-management of T2DM.
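The content validity indices mentioned above can be sketched as follows. The abstract does not state how they were computed, so this assumes the widely used definitions: I-CVI as the share of experts rating an item 3 or 4 on a 4-point relevance scale, S-CVI as the average of the item values, and the modified kappa that adjusts I-CVI for chance agreement. The panel size and the ratings are hypothetical.

```python
from math import comb


def i_cvi(ratings, relevant_from=3):
    """Item-level CVI: share of experts rating the item as relevant."""
    agreeing = sum(r >= relevant_from for r in ratings)
    return agreeing / len(ratings)


def s_cvi_ave(item_cvis):
    """Scale-level CVI, averaging method: mean of the item-level values."""
    return sum(item_cvis) / len(item_cvis)


def modified_kappa(agreeing, experts):
    """I-CVI adjusted for the probability of chance agreement."""
    p_chance = comb(experts, agreeing) * 0.5 ** experts
    cvi = agreeing / experts
    return (cvi - p_chance) / (1 - p_chance)


if __name__ == "__main__":
    # Hypothetical panel of six experts rating one item on a 1-4 scale.
    ratings = [4, 4, 3, 4, 3, 2]
    print(round(i_cvi(ratings), 3), round(modified_kappa(5, 6), 3))
```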
There are many methods available for measuring social support and quality of life (QoL) of adolescents; of these, the KIDSCREEN tools are the most widely used. Thus, we aimed to translate and validate the KIDSCREEN-27 scale for use among adolescents aged between 10 and 19 years in Slovenia.
A cross-sectional study was conducted among 2852 adolescents in primary and secondary schools in Slovenia from November 2019 to January 2020. A six-step validation method was used to test the psychometric properties of the KIDSCREEN-27 scale. We checked descriptive statistics and performed a Mokken scale analysis, parametric item response theory, factor analysis, classical test theory and total (sub)scale scores.
All five subscales of the KIDSCREEN-27 formed a unidimensional scale with good homogeneity and reliability. The confirmatory factor analysis showed poor fit in user model versus baseline model metrics (CFI = 0.847; TLI = 0.862) and good fit in root mean square error (RMSEA = 0.072; p(χ²) < 0.001). Scale reliability was calculated using Cronbach's α (0.93), beta (0.86), G6 (0.95) and omega (0.93).
The questionnaire showed average psychometric properties and can be used among adolescents in Slovenia to find out about their quality of life. Further research is needed to explore why fit in user model metrics is poor.
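Of the reliability coefficients reported above, Cronbach's α has the simplest closed form and can be sketched directly from item scores. This is a minimal illustration using the standard formula with sample variances; the function name and the toy data are assumptions, and the other coefficients (beta, G6, omega) are not covered here.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical answers of four respondents to three items.
    data = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 2]]
    print(round(cronbach_alpha(data), 3))
```

The function needs at least two respondents and two items, and it is undefined when all total scores are equal.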
Diabetic foot is a prevalent chronic complication of diabetes and increases the risk of lower limb amputation, creating both an economic and a major societal problem. By detecting the risk of developing diabetic foot sufficiently early, it can be prevented or at least postponed. Using artificial intelligence, delayed diagnosis can be prevented, leading to more intensive preventive treatment of patients. Based on a systematic literature review, we analyzed 14 articles that included the use of artificial intelligence to predict the risk of developing diabetic foot. The articles were highly heterogeneous in terms of data use and showed varying degrees of sensitivity, specificity, and accuracy. The most frequently used machine learning techniques were support vector machine (SVM) (n = 6) and K-Nearest Neighbor (KNN) (n = 5). Future research is recommended on larger samples of participants using different techniques to determine the most effective one.
The increasing availability of data stored in electronic health records brings substantial opportunities for advancing patient care and population health. This is, however, fundamentally dependent on the completeness and quality of data in these electronic health records. We sought to use electronic health record data to populate a risk prediction model for identifying patients with undiagnosed type 2 diabetes mellitus. We, however, found substantial (up to 90%) amounts of missing data in some healthcare centres. Attempts at imputing these missing data or using a reduced dataset by removing incomplete records resulted in a major deterioration in the performance of the prediction model. This case study illustrates the substantial wasted opportunities resulting from incomplete records by simulating missing and incomplete records in the predictive modelling process. Government and professional bodies need to prioritise efforts to address these data shortcomings in order to ensure that electronic health record data are maximally exploited for patient and population benefit.
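The effect of removing incomplete records can be illustrated with a small simulation. This sketch assumes values go missing completely at random, which is a simplification and not necessarily how the case study simulated missingness; the function names, field names and record values are hypothetical.

```python
import random


def simulate_missing(records, rate, seed=0):
    """Replace each value with None with probability `rate`.

    Assumption: missingness is completely at random and independent per field.
    """
    rng = random.Random(seed)
    return [
        {key: (None if rng.random() < rate else value) for key, value in rec.items()}
        for rec in records
    ]


def complete_cases(records):
    """Keep only records in which no field is missing."""
    return [rec for rec in records if all(v is not None for v in rec.values())]


if __name__ == "__main__":
    # Hypothetical electronic health record extracts.
    ehr = [{"age": 50 + i, "bmi": 25.0 + i, "waist_cm": 90 + i} for i in range(1000)]
    for rate in (0.1, 0.5, 0.9):
        kept = complete_cases(simulate_missing(ehr, rate))
        print(rate, len(kept))
```

With three fields per record, the expected share of complete records is (1 - rate)^3, so complete-case analysis discards most of the data well before the per-field missingness reaches 90%.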