BACKGROUND: There is an increasing demand for electronic health record (EHR)–based risk stratification and predictive modeling tools at the population level. This trend is partly due to increased value-based payment policies and the increasing availability of EHRs at the provider level. Risk stratification models, however, have been traditionally derived from claims or encounter systems. This study evaluates the challenges and opportunities of using EHR data instead of or in addition to administrative claims for risk stratification.
METHODS: This study used the structured EHR records and administrative claims of 85,581 patients receiving outpatient care at a large integrated provider system. Common data elements for risk stratification (ie, age, sex, diagnosis, and medication) were extracted from outpatient EHR records and administrative claims. The performance of a validated risk-stratification model was assessed using data extracted from claims alone, EHR alone, and claims and EHR combined.
RESULTS: EHR-derived metrics overlapped considerably with administrative claims (eg, number of chronic conditions). The accuracy of the model when using EHR data alone was acceptable, with an area under the curve of ∼0.81 for hospitalization and ∼0.85 for identifying the top 1% of utilizers with the concurrent model. However, when using EHR data alone, the predictive model explained less of the variation in utilization-based outcomes than when using administrative claims.
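A minimal sketch of the data-source comparison described in the methods and results above: fit the same model on claims-only, EHR-only, and combined feature sets and compare the resulting AUCs. The study used a validated risk-stratification tool, so the logistic regression and the column names below are stand-ins, not the actual model.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

def auc_for_features(df, feature_cols, outcome_col="hospitalized"):
    # Same stand-in model, different feature sets (claims, EHR, or both);
    # df is a pandas DataFrame with one row per patient (hypothetical layout).
    X_train, X_test, y_train, y_test = train_test_split(
        df[feature_cols], df[outcome_col], test_size=0.3, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# Hypothetical columns derived from each source:
# claims_cols = ["age", "sex", "claims_dx_count", "claims_rx_count"]
# ehr_cols    = ["age", "sex", "ehr_dx_count", "ehr_rx_count"]
# combined    = claims_cols + ["ehr_dx_count", "ehr_rx_count"]
# for name, cols in [("claims", claims_cols), ("EHR", ehr_cols), ("combined", combined)]:
#     print(name, auc_for_features(df, cols))
```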
DISCUSSION: The results show promising performance for models predicting cost and hospitalization using diagnosis and medication data from outpatient EHRs. More research is needed to evaluate the benefits of other EHR data types (eg, lab values and vital signs) for risk stratification.
Dependent phenomena, such as relational, spatial and temporal phenomena, tend to be characterized by local dependence in the sense that units which are close in a well-defined sense are dependent. In contrast with spatial and temporal phenomena, though, relational phenomena tend to lack a natural neighbourhood structure in the sense that it is unknown which units are close and thus dependent. Owing to the challenge of characterizing local dependence and constructing random graph models with local dependence, many conventional exponential family random graph models induce strong dependence and are not amenable to statistical inference. We take first steps to characterize local dependence in random graph models, inspired by the notion of finite neighbourhoods in spatial statistics and M-dependence in time series, and we show that local dependence endows random graph models with desirable properties which make them amenable to statistical inference. We show that random graph models with local dependence satisfy a natural domain consistency condition which every model should satisfy but which conventional exponential family random graph models do not. In addition, we establish a central limit theorem for random graph models with local dependence, which suggests that such models are amenable to statistical inference. We discuss how random graph models with local dependence can be constructed by exploiting either observed or unobserved neighbourhood structure. In the absence of observed neighbourhood structure, we take a Bayesian view and express the uncertainty about the neighbourhood structure by specifying a prior on a set of suitable neighbourhood structures. We present simulation results and applications to two real-world networks with ‘ground truth’.
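One way to make the notion of local dependence above concrete is the following hedged sketch; the neighbourhood partition A_1, ..., A_K and the within/between factorization are assumed notation rather than quoted from the paper.

```latex
% Sketch of local dependence via a neighbourhood partition (assumed notation).
% Dependence is confined to within-neighbourhood subgraphs; between-neighbourhood
% subgraphs have independent edges.
\[
  \Pr(X = x) \;=\;
  \prod_{k=1}^{K} \Pr\big(X_{kk} = x_{kk}\big)
  \prod_{k < l} \Pr\big(X_{kl} = x_{kl}\big),
\]
% where X_{kk} collects the edge variables within neighbourhood A_k (which may
% follow a dependent exponential-family model) and X_{kl} collects the edge
% variables between A_k and A_l (modelled with independent edges), so that
% dependence remains local.
```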
This study aimed to determine whether surface-based morphometry of preoperative whole-brain three-dimensional T1-weighted magnetic resonance imaging (MRI) images can predict the clinical outcomes of cochlear implantation.
This was an observational, multicenter study using preoperative MRI data.
The study was conducted at tertiary care referral centers.
Sixty-four patients with severe to profound hearing loss (≥70 dB bilaterally), who were scheduled for cochlear implant (CI) surgery, were enrolled. The patients included 19 with congenital hearing loss and 45 with acquired hearing loss.
Participants underwent CI surgery. Before surgery, high-resolution three-dimensional T1-weighted brain MRI was performed, and the images were analyzed using FreeSurfer.
The primary outcome was monosyllable audibility under quiet conditions 6 months after surgery. Cortical thickness residuals within 34 regions of interest (ROIs) defined by the Desikan-Killiany cortical atlas were calculated relative to regression lines of thickness on age fitted in healthy-hearing controls.
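A minimal sketch of the residual computation described above, assuming per-ROI arrays of ages and cortical thicknesses; the variable names are illustrative, not the study's code.

```python
import numpy as np

def roi_residuals(control_age, control_thickness, patient_age, patient_thickness):
    """Residual of patient thickness relative to the control-group age regression line."""
    slope, intercept = np.polyfit(control_age, control_thickness, deg=1)
    predicted = slope * np.asarray(patient_age) + intercept
    return np.asarray(patient_thickness) - predicted

# Example with made-up numbers for one ROI:
# res = roi_residuals(control_age=[30, 45, 60, 70],
#                     control_thickness=[3.1, 3.0, 2.8, 2.7],
#                     patient_age=[55], patient_thickness=[2.6])
```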
Rank logistic regression analysis detected significant associations between CI effectiveness and five right hemisphere ROIs and five left hemisphere ROIs. Predictive modeling using the cortical thickness of the right entorhinal cortex and left medial orbitofrontal cortex revealed a significant correlation with speech discrimination ability. This correlation was higher in patients with acquired hearing loss than in those with congenital hearing loss.
Preoperative surface-based morphometry could potentially predict CI outcomes and assist in patient selection and clinical decision making. However, further research with larger, more diverse samples is necessary to confirm these findings and determine their generalizability.
Unexpected ICU readmission is associated with longer length of stay and increased mortality. To prevent ICU readmission and death after ICU discharge, our team of intensivists and data scientists aimed to use AmsterdamUMCdb to develop an explainable machine learning-based real-time bedside decision support tool.
The derivation data came from patients admitted to a mixed surgical-medical academic medical center ICU from 2004 to 2016.
The validation data came from the same center from 2016 to 2019.
Patient characteristics, clinical observations, physiologic measurements, laboratory studies, and treatment data were considered as model features. Different supervised learning algorithms were trained to predict ICU readmission and/or death, both within 7 days from ICU discharge, using 10-fold cross-validation. Feature importance was determined using SHapley Additive exPlanations, and readmission probability-time curves were constructed to identify subgroups. Explainability was established by presenting individualized risk trends and feature importance.
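A minimal sketch of this pipeline, assuming a prepared feature matrix X and a binary label y (readmission or death within 7 days of discharge); gradient boosting is shown because it was the final model, but the hyperparameters and variable names are placeholders.

```python
import shap
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

model = GradientBoostingClassifier(random_state=0)
# 10-fold cross-validated discrimination:
# auc_scores = cross_val_score(model, X, y, cv=10, scoring="roc_auc")
# Fit on the full derivation set and compute SHAP feature importance:
# model.fit(X, y)
# explainer = shap.TreeExplainer(model)
# shap_values = explainer.shap_values(X)
# shap.summary_plot(shap_values, X)  # global feature importance
```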
Our final derivation dataset included 14,105 admissions. The combined readmission/mortality rate within 7 days of ICU discharge was 5.3%. Using Gradient Boosting, the model achieved an area under the receiver operating characteristic curve of 0.78 (95% CI, 0.75-0.81) and an area under the precision-recall curve of 0.19 on the validation cohort (n = 3,929). The most predictive features included common physiologic parameters but also less apparent variables such as nutritional support. At a 6% risk threshold, the model showed a sensitivity (recall) of 0.72, a specificity of 0.70, and a positive predictive value (precision) of 0.15. Impact analysis using probability-time curves and the 6% risk threshold identified specific patient groups at risk and showed that a change in discharge management could reduce relative risk by 14%.
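The operating point reported above can be illustrated with a small sketch; y_true and y_prob are assumed arrays of observed outcomes and predicted risks from the model.

```python
import numpy as np

def metrics_at_threshold(y_true, y_prob, threshold=0.06):
    # Sensitivity, specificity, and positive predictive value at a fixed risk cutoff.
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_prob) >= threshold
    tp = np.sum(y_pred & (y_true == 1))
    fp = np.sum(y_pred & (y_true == 0))
    fn = np.sum(~y_pred & (y_true == 1))
    tn = np.sum(~y_pred & (y_true == 0))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    return sensitivity, specificity, ppv
```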
We developed an explainable machine learning model that may aid in identifying patients at high risk for readmission and mortality after ICU discharge using the first freely available European critical care database, AmsterdamUMCdb. Impact analysis showed that a relative risk reduction of 14% could be achievable, which might have significant impact on patients and society. ICU data sharing facilitates collaboration between intensivists and data scientists to accelerate model development.
•This research investigates customer comeback without a win-back offer.
•First-lifetime behavior has a concave relationship with customer comeback.
•However, first-lifetime behavior does not help in predicting customer comeback.
•Social media data are important determinants and predictors of customer comeback.
•Low-profile customers spend less upon their return to the company.
Customer comeback, or the return of previous customers to the company without receiving a win-back offer, has received little academic attention. Tapping into a rich transactional database enhanced with social media data, we argue that a multitude of touch points after defection (such as social media) can accurately inform managers about customers for whom win-back offers may not be relevant. Econometric analysis reveals positive associations of Facebook likes and event attendance after defection with customer comeback, alongside a significant concave relationship with first-lifetime behavior. From a predictive point of view, touch points after defection are more informative than first-lifetime behavior. Finally, comeback customers spend, on average, more than newly acquired customers, and lower-profile comeback customers reduce their spending with the firm upon return. Based on our multimethod analysis, we demonstrate the value of comeback analyses and derive several actionable insights and recommendations for both theory and practice.
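A hedged sketch of the kind of specification suggested above, with a squared first-lifetime term to allow the concave effect; the logit form and the column names are illustrative assumptions, not the paper's exact model.

```python
import statsmodels.api as sm

def fit_comeback_model(df):
    # df: one row per defected customer (hypothetical columns).
    # The squared term allows a concave first-lifetime effect on comeback.
    df = df.assign(first_lifetime_spend_sq=df["first_lifetime_spend"] ** 2)
    X = sm.add_constant(df[["first_lifetime_spend", "first_lifetime_spend_sq",
                            "fb_likes_after_defection",
                            "events_after_defection"]])
    return sm.Logit(df["comeback"], X).fit()

# result = fit_comeback_model(df)
# print(result.summary())
```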
Spatial modelling techniques are increasingly used in species distribution modelling. However, the implemented techniques differ in their modelling performance, and consensus methods are needed to reduce the uncertainty of predictions. In this study, we tested the predictive accuracies of five consensus methods, namely Weighted Average (WA), Mean(All), Median(All), Median(PCA), and Best, for 28 threatened plant species in north-eastern Finland, Europe. The spatial distributions of the plant species were forecasted using eight state-of-the-art single-modelling techniques, providing an ensemble of predictions. The probability values of occurrence were then combined using the five consensus algorithms. The predictive accuracies of the single-model and consensus methods were assessed by computing the area under the curve (AUC) of the receiver-operating characteristic plot. The mean AUC values varied between 0.697 (classification tree analysis) and 0.813 (random forest) for the single models, and from 0.757 to 0.850 for the consensus methods. The WA and Mean(All) consensus methods provided significantly more robust predictions than all the single models and the other consensus methods. Consensus methods based on averaging algorithms may significantly increase the accuracy of species distribution forecasts, and thus show considerable promise for conservation biology and biogeographical applications.
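A minimal sketch of the consensus step, assuming `preds` is an array of occurrence probabilities (sites × models) and `y` the observed presences/absences; the AUC-weighted average below is a stand-in for the paper's WA method, and Median(PCA) is omitted for brevity.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def consensus_aucs(preds, y):
    # AUC of each single model, used both for reporting and as WA weights.
    aucs = np.array([roc_auc_score(y, preds[:, j]) for j in range(preds.shape[1])])
    weighted_avg = preds @ (aucs / aucs.sum())  # WA: weight each model by its AUC
    return {
        "WA": roc_auc_score(y, weighted_avg),
        "Mean(All)": roc_auc_score(y, preds.mean(axis=1)),
        "Median(All)": roc_auc_score(y, np.median(preds, axis=1)),
        "Best": aucs.max(),  # accuracy of the single best model
    }
```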
Asthma, which cannot be cured, is one of the most prevalent and costly chronic conditions in the United States. However, accurate and timely surveillance data could allow for timely and targeted interventions at the community or individual level. Current national asthma disease surveillance systems can have data availability lags of up to two weeks. Rapid progress has been made in gathering nontraditional, digital information to perform disease surveillance. We introduce a novel method of using multiple data sources for predicting the number of asthma-related emergency department (ED) visits in a specific area. Twitter data, Google search interest, and environmental sensor data were collected for this purpose. Our preliminary findings show that our model can predict the number of asthma ED visits based on near-real-time environmental and social media data with approximately 70% precision. The results can be helpful for public health surveillance, ED preparedness, and targeted patient interventions.
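A hedged sketch of how such multi-source features might be assembled and modeled: align Twitter, Google search, and environmental sensor features by date and fit a model to the daily count of asthma ED visits. The abstract does not name the estimator, so the random forest and all column names below are placeholders.

```python
from sklearn.ensemble import RandomForestRegressor

def fit_ed_visit_model(twitter_df, search_df, sensor_df, ed_visits_df):
    # Each input is a pandas DataFrame keyed by "date" (hypothetical layout).
    features = (twitter_df.merge(search_df, on="date")
                          .merge(sensor_df, on="date")
                          .merge(ed_visits_df, on="date"))
    X = features.drop(columns=["date", "ed_visits"])
    y = features["ed_visits"]
    return RandomForestRegressor(random_state=0).fit(X, y)
```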
Advances in observational, laboratory, and modeling techniques open the way to the development of physical models of the seismic cycle with potentially predictive power. To explore that possibility, we developed an integrative and fully dynamic model of the Parkfield segment of the San Andreas Fault. The model succeeds in reproducing a realistic earthquake sequence of irregular moment magnitude (Mw) 6.0 main shocks, including events similar to the ones in 1966 and 2004, and provides an excellent match for the detailed interseismic, coseismic, and postseismic observations collected along this fault during the most recent earthquake cycle. Such calibrated physical models provide new ways to assess seismic hazards and forecast seismicity response to perturbations of natural or anthropogenic origins.
Automated prediction of students' retention and graduation using advanced analytical methods such as artificial intelligence (AI) has recently attracted the attention of educators, both in theory and in practice. Although valuable insights and theories for measuring and testing the topic have been proposed, most existing methods do not highlight the non-trivial factors behind the well-known challenges of retention and attrition. Using two categories of data about students collected in a higher education setting, (i) retention (n = 52,262) and (ii) graduation (n = 53,639), this study proposes a machine learning model, RG-DMML (retention and graduation data mining and machine learning), and an ensemble algorithm for predicting students' retention and graduation status. This was done by training and testing key features deemed suitable for measuring the two constructs, such as (i) the average grade from the previous high school and (ii) the entry/admission score. The proposed model (RG-DMML) is designed following the cross-industry standard process for data mining (CRISP-DM) methodology, implemented using a supervised machine learning technique, K-Nearest Neighbor (KNN), and validated using k-fold cross-validation. The results show that the model and algorithm, based on the Bagging method and 10-fold cross-validation, are efficient and effective for predicting students' retention and graduation status, with Precision (retention = 0.909, graduation = 0.822), Recall (retention = 1.000, graduation = 0.957), Accuracy (retention = 0.909, graduation = 0.817), and F1-score (retention = 0.952, graduation = 0.885) indicating high performance, and low error rates (retention = 0.090, graduation = 0.182). In addition, when considering the individual features selected through the Wrapper method, the proposed model proved more effective for predicting retention status than graduation status. The implications of the model's output and of the factors that affect the identification of at-risk students, e.g., for timely intervention, counselling, decision-making, and sustainable educational practice, are discussed in the study.
•A machine learning model (RG-DMML) was proposed to predict students' retention and graduation status in education.
•Key features shown to be suitable for predicting students' retention and graduation status are described.
•An ensemble algorithm for implementing the ML model with high performance and classification accuracy is defined.
•Results of the ML model's output are empirically discussed, considering their implications for sustainable educational practice.
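A minimal sketch of the classifier configuration described in the abstract above (a bagging ensemble of K-Nearest Neighbor base learners evaluated with 10-fold cross-validation); X, y, and the hyperparameters are assumptions, not the study's settings.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_validate

# Bagged KNN stand-in; X could hold features such as prior high-school average
# grade and entry/admission score, y the retention or graduation label.
model = BaggingClassifier(KNeighborsClassifier(n_neighbors=5),
                          n_estimators=10, random_state=0)
# scores = cross_validate(model, X, y, cv=10,
#                         scoring=["accuracy", "precision", "recall", "f1"])
```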
Machine learning (ML) prediction models in healthcare and pharmacy-related research face challenges with encoding high-dimensional Healthcare Coding Systems (HCSs) such as ICD, ATC, and DRG codes, given the trade-off between reducing model dimensionality and minimizing information loss.
To investigate network analysis modularity as a method for grouping HCS codes to improve their encoding in ML models.
The MIMIC-III dataset was used to create a multimorbidity network in which ICD-9 codes are the nodes and the edge weights are the number of patients sharing each ICD-9 code pair. A modularity-based community detection algorithm was applied at different resolution thresholds to generate six sets of modules. The impact of four grouping strategies on the performance of predicting 90-day Intensive Care Unit readmissions was assessed. The grouping strategies compared were: 1) binary encoding of ungrouped codes, 2) encoding of codes grouped by network modules, 3) grouping of codes to the highest level of the ICD-9 hierarchy, and 4) grouping using the single-level Clinical Classification Software (CCS). The same methodology was also applied to encode DRG codes, limiting the comparison to a single modularity threshold versus binary encoding.
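A hedged sketch of the network construction and module-based encoding described above; `patient_codes` (patient id → set of ICD-9 codes) is an assumed input, and the Louvain algorithm stands in for the unnamed modularity detection method.

```python
from itertools import combinations
import networkx as nx

def code_modules(patient_codes, resolution=1.0):
    # Nodes are ICD-9 codes; edge weights count patients sharing each code pair.
    G = nx.Graph()
    for codes in patient_codes.values():
        for a, b in combinations(sorted(codes), 2):
            if G.has_edge(a, b):
                G[a][b]["weight"] += 1
            else:
                G.add_edge(a, b, weight=1)
    # Modularity-based community detection at the chosen resolution threshold.
    return nx.community.louvain_communities(G, weight="weight",
                                            resolution=resolution, seed=0)

def encode_by_modules(codes, communities):
    # One binary feature per module: does the patient carry any code in it?
    return [int(bool(set(codes) & module)) for module in communities]
```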
The performance was assessed using Logistic Regression, Support Vector Machine with a non-linear kernel, and Gradient Boosting Machines algorithms. Accuracy, Precision, Recall, AUC, and F1-score with 95% confidence intervals were reported.
Models utilizing modularity encoding outperformed models using binary encoding of ungrouped codes. Accuracy improved across all algorithms, ranging from 0.736 to 0.780 for modularity encoding versus 0.727 to 0.779 for binary encoding. AUC, recall, and precision also improved across almost all algorithms. In comparison with the other grouping approaches, modularity encoding generally showed slightly higher performance in AUC, ranging from 0.813 to 0.837, and precision, ranging from 0.752 to 0.782.
Modularity encoding enhances the performance of ML models in pharmacy research by effectively reducing dimensionality while retaining the necessary information. Across the three algorithms used, models utilizing modularity encoding showed superior or comparable performance to other encoding approaches. Modularity encoding offers additional advantages: it can be used for both hierarchical and non-hierarchical HCSs, it is clinically relevant, and it can enhance the clinical interpretability of ML models. A Python package has been developed to facilitate the use of the approach in future research.
•The paper introduces modularity encoding for encoding categorical Healthcare Coding Systems in machine learning models.
•The approach enhances the clinical interpretation of models by representing how codes co-occur in individuals.
•Modularity encoding showed better or similar performance to other popular encoding approaches.
•The approach can be used for hierarchical and non-hierarchical systems.
•The study includes a Python package developed to simplify applying modularity encoding in future studies.