Peer reviewed, open access
  • Evaluating Machine-Learning...
    Lopez, Derrick; Lu, Juan; Sanfilippo, Frank; Briffa, Tom; Hung, Joe; Nedkoff, Lee

    International journal of population data science, 12/2020, Volume: 5, Issue: 5
    Journal Article

    Introduction
    Hospital administrative data are a valuable source for measuring myocardial infarction (MI) rates. However, admission counts are susceptible to over-inflation if the patient is transferred multiple times during a single episode of care, and variables denoting transfers may not be reliable. To obtain an accurate number of events, hospital transfers need to be correctly identified.

    Objectives and Approach
    We assessed multivariable logistic regression and various machine-learning models to predict transfers in hospital administrative data. Using Western Australian linked hospital data, we identified records from 2000-2016 with a principal discharge diagnosis of MI. The standard method we compared against was a 24-hour look-back, which identifies a transfer using only the admission and separation dates from the current and previous records for the same patient. Multivariable logistic regression and decision trees with various boosting algorithms were used to predict whether a single record was a transfer, using variables recorded in the admission (e.g. age, sex, type of hospital, admitted from, emergency/elective admission). The performance of each model was calculated using metrics including area under the curve (AUC).

    Results
    Records in the training, validation and testing samples had similar characteristics: mean age was 68.9 years, 66% were male and 58% were admitted to tertiary hospitals. Gradient Boosting Decision Tree (AUC=0.887; 95% CI: 0.886-0.887) outperformed multivariable logistic regression (AUC=0.875; 95% CI: 0.869-0.881) and random forest models (AUC=0.859; 95% CI: 0.853-0.865).

    Conclusion / Implications
    Multivariable logistic regression and machine-learning models are able to identify transfers in a single record from existing variables. They can be used in unlinked hospital administrative data where records belonging to the same patient cannot be identified.
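
The 24-hour look-back described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the field names (`patient_id`, `admitted`, `separated`), the function `flag_transfers`, the sample records, and the choice to treat any gap of at most one day between the previous separation and the current admission as a transfer are all assumptions made for illustration.

```python
from datetime import date, timedelta


def flag_transfers(records, window=timedelta(hours=24)):
    """Flag each record as a transfer (True) or a new episode (False).

    Assumed rule: a record is a transfer if the same patient's previous
    record (ordered by admission date) separated no more than `window`
    before this record's admission date.
    """
    # Visit records grouped by patient and ordered by admission date,
    # while keeping the flags aligned with the original input order.
    order = sorted(
        range(len(records)),
        key=lambda i: (records[i]["patient_id"], records[i]["admitted"]),
    )
    flags = [False] * len(records)
    previous = None
    for i in order:
        current = records[i]
        if previous is not None and previous["patient_id"] == current["patient_id"]:
            gap = current["admitted"] - previous["separated"]
            flags[i] = gap <= window
        previous = current
    return flags


# Hypothetical records: patient A is transferred once, then readmitted months later.
records = [
    {"patient_id": "A", "admitted": date(2015, 3, 1), "separated": date(2015, 3, 2)},
    {"patient_id": "A", "admitted": date(2015, 3, 2), "separated": date(2015, 3, 6)},
    {"patient_id": "A", "admitted": date(2015, 9, 10), "separated": date(2015, 9, 14)},
    {"patient_id": "B", "admitted": date(2015, 3, 2), "separated": date(2015, 3, 4)},
]

flags = flag_transfers(records)
# Counting only records not flagged as transfers avoids inflating the admission count.
episodes = sum(1 for is_transfer in flags if not is_transfer)
```

This rule needs a patient identifier to find the previous record, which is why it only works in linked data; the single-record models in the abstract are aimed at the unlinked case.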