Abstract
In this study, we aimed to identify the factors associated with mortality among continuing care residents in Alberta during the coronavirus disease 2019 (COVID-19) pandemic. We achieved this by linking various administrative datasets. Then, we examined pre-processing methods in terms of prediction performance. Finally, we developed several machine learning models and compared their performance. We conducted a retrospective cohort study of all continuing care residents in Alberta, Canada, from March 1, 2020, to March 31, 2021. We used univariable and multivariable logistic regression (LR) models to identify predictive factors of 60-day all-cause mortality by estimating odds ratios (ORs) with 95% confidence intervals. To determine the best sensitivity–specificity cut-off point, the Youden index was employed. We developed several machine learning models to determine the best model regarding performance. In this cohort study, increased age, male sex, symptoms, previous admissions, and some specific comorbidities were associated with increased mortality. Machine learning and pre-processing approaches offer a potentially valuable method for improving risk prediction for mortality, but more work is needed to show improvement beyond standard risk factors.
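As a minimal sketch of the cut-off selection mentioned above, the Youden index J = sensitivity + specificity − 1 can be evaluated at every candidate threshold and the threshold with the largest J retained. The function name `youden_cutoff`, the "score >= threshold means predicted death" convention, and the example scores are illustrative assumptions and are not taken from the study.

```python
def youden_cutoff(scores, labels):
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1.

    scores: predicted risks; labels: 1 for the outcome, 0 otherwise.
    A case is classified positive when score >= threshold (assumed convention).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")

    best_threshold, best_j = None, float("-inf")
    for threshold in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        true_neg = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
        j = true_pos / positives + true_neg / negatives - 1
        if j > best_j:  # strict '>' keeps the lowest threshold on ties
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Hypothetical predicted risks and outcomes, for illustration only.
example_scores = [0.1, 0.2, 0.3, 0.6, 0.5, 0.55, 0.8, 0.9]
example_labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(youden_cutoff(example_scores, example_labels))
```

In this toy example every outcome case scores at least 0.5 and only one non-outcome case scores above it, so the selected threshold trades a single false positive for full sensitivity.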
Understanding the epidemiology of Coronavirus Disease of 2019 (COVID-19) in a local context is valuable for both future pandemic preparedness and potential increases in COVID-19 case volume, particularly due to variant strains.
Our work allowed us to complete a population-based study on patients who tested positive for COVID-19 in Alberta from March 1, 2020 to December 15, 2021. We completed a multi-centre, retrospective population-based descriptive study using secondary data sources in Alberta, Canada. We identified all adult patients (≥ 18 years of age) who tested positive for COVID-19 on a laboratory test, including only the first incident case of COVID-19. We determined positive COVID-19 tests, gender, age, comorbidities, residency in a long-term care (LTC) facility, time to hospitalization, length of stay (LOS) in hospital, and mortality. Patients were followed for 60 days from a COVID-19 positive test.
Between March 1, 2020 and December 15, 2021, 255,037 adults were identified with COVID-19 in Alberta. Most confirmed cases occurred among those younger than 60 years of age (84.3%); however, most deaths (89.3%) occurred among those older than 60 years. The overall hospitalization rate among those who tested positive was 5.9%. Being a resident of LTC was associated with substantial mortality of 24.6% within 60 days of a positive COVID-19 test. The most common comorbidity among those with COVID-19 was depression. Across all patients, 17.3% of males and 18.6% of females had an unplanned ambulatory visit subsequent to their positive COVID-19 test.
COVID-19 is associated with extensive healthcare utilization. Residents of LTC were substantially impacted during the COVID-19 pandemic with high associated mortality. Further work should be done to better understand the economic burden associated with related healthcare utilization following a COVID-19 infection to inform healthcare system resource allocation, planning, and forecasting.
Objective: To evaluate the validity of COVID-19 International Classification of Diseases, 10th Revision (ICD-10) codes and their combinations.
Design: Retrospective cohort study.
Setting: Acute care hospitals and emergency departments (EDs) in Alberta, Canada.
Participants: Patients who were admitted to hospital or presented to an ED in Alberta, as captured by local administrative databases between 1 March 2020 and 28 February 2021, who had a positive COVID-19 test and/or a COVID-19-related ICD-10 code.
Main outcome measures: The sensitivity, positive predictive value (PPV) and 95% CIs for ICD-10 codes were computed. Stratified analyses by age group, sex, symptomatic status, mechanical ventilation, hospital type, patient intensive care unit (ICU) admission, discharge status and season of pandemic were conducted.
Results: Two overlapping subsets of the study population were considered: those who had a positive COVID-19 test (cohort A, for estimating sensitivity) and those who had a COVID-19-related ICD-10 code (cohort B, for estimating PPV). Cohort A included 17 979 ED patients and 6477 inpatients while cohort B included 33 675 ED patients and 18 746 inpatients. Of inpatients, 9.5% in cohort A and 8.1% in cohort B received mechanical ventilation. Over 13% of inpatients were admitted to ICU. The length of hospital stay was 6 days (IQR: 3–14) for cohort A and 8 days (IQR: 3–19) for cohort B. In-hospital mortality was 15.9% and 38.8% for cohort A and B, respectively. The sensitivity for ICD-10 code U07.1 (COVID-19, virus identified) was 82.5% (81.8%–83.2%) with a PPV of 93.1% (92.6%–93.6%). The combination of U07.1 and U07.3 (multisystem inflammatory syndrome associated with COVID-19) had a sensitivity of 82.5% (81.9%–83.2%) and PPV of 92.9% (92.4%–93.4%).
Conclusions: In Alberta, ICD-10 COVID-19 codes (U07.1 and U07.3) were coded well with high validity. This indicates administrative data can be used for COVID-19 research and pandemic management purposes.
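The two validity measures reported above can be sketched as follows: sensitivity is the share of test-positive patients who carry the code, and PPV is the share of coded patients who tested positive. The abstract does not state which interval method was used, so the Wilson score interval below is an assumption, and the counts are hypothetical rather than the study's data.

```python
import math


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half_width, centre + half_width


def sensitivity(true_pos, false_neg):
    """Proportion of test-positive patients who also have the ICD-10 code."""
    n = true_pos + false_neg
    return true_pos / n, wilson_ci(true_pos, n)


def ppv(true_pos, false_pos):
    """Proportion of patients with the ICD-10 code who tested positive."""
    n = true_pos + false_pos
    return true_pos / n, wilson_ci(true_pos, n)


# Hypothetical counts, chosen only to illustrate the calculation.
sens, sens_ci = sensitivity(true_pos=825, false_neg=175)
ppv_value, ppv_ci = ppv(true_pos=825, false_pos=61)
print(f"sensitivity {sens:.3f} ({sens_ci[0]:.3f}-{sens_ci[1]:.3f})")
print(f"PPV {ppv_value:.3f} ({ppv_ci[0]:.3f}-{ppv_ci[1]:.3f})")
```

Because the two measures use different denominators, they are estimated on different subsets, which mirrors the use of cohort A for sensitivity and cohort B for PPV.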
The COVID-19 pandemic affected access to care, and the associated public health measures influenced the transmission of other infectious diseases. The pandemic has dramatically changed antibiotic prescribing in the community. We aimed to determine the impact of the COVID-19 pandemic and the resulting control measures on oral antibiotic prescribing in long-term care facilities (LTCFs) in Alberta and Ontario, Canada using linked administrative data. Antibiotic prescription data were collected for LTCF residents 65 years and older in Alberta and Ontario from 1 January 2017 until 31 December 2020. Weekly prescription rates per 1000 residents, stratified by age, sex, antibiotic class, and selected individual agents, were calculated. Interrupted time series analyses using SARIMA models were performed to test for changes in antibiotic prescription rates after the start of the pandemic (1 March 2020). The average annual cohort size was 18,489 for Alberta and 96,614 for Ontario. A significant decrease in overall weekly prescription rates after the start of the pandemic compared to pre-pandemic was found in Alberta, but not in Ontario. Furthermore, a significant decrease in prescription rates was observed for antibiotics mainly used to treat respiratory tract infections: amoxicillin in both provinces (Alberta: −0.6 per 1000 LTCF residents decrease in weekly prescription rate, p = 0.006; Ontario: −0.8, p < 0.001); and doxycycline (−0.2, p = 0.005) and penicillin (−0.04, p = 0.014) in Ontario. In Ontario, azithromycin was prescribed at a significantly higher rate after the start of the pandemic (0.7 per 1000 LTCF residents increase in weekly prescription rate, p = 0.011). A decrease in prescription rates for antibiotics that are largely used to treat respiratory tract infections is in keeping with the lower observed rates for respiratory infections resulting from pandemic control measures. The results should be considered in the contexts of different LTCF systems and provincial public health responses to the pandemic.
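The rate definition used above can be illustrated with a short sketch: a weekly rate per 1000 residents is the number of prescriptions in a week divided by the number of residents, multiplied by 1000. The second helper is only a naive before/after comparison of mean rates; it is not the SARIMA interrupted time series model used in the study and ignores trend, seasonality, and autocorrelation. All function names and numbers are hypothetical.

```python
def weekly_rate_per_1000(prescriptions, residents):
    """Prescriptions in one week per 1000 residents."""
    if residents <= 0:
        raise ValueError("residents must be positive")
    return 1000 * prescriptions / residents


def naive_level_change(weekly_rates, interruption_index):
    """Mean rate after the interruption minus mean rate before it.

    A descriptive comparison only; it does not replace a SARIMA-based
    interrupted time series analysis.
    """
    pre = weekly_rates[:interruption_index]
    post = weekly_rates[interruption_index:]
    if not pre or not post:
        raise ValueError("need observations on both sides of the interruption")
    return sum(post) / len(post) - sum(pre) / len(pre)


# Hypothetical weekly counts for a cohort of 18,500 residents.
counts = [111, 111, 74, 74]
rates = [weekly_rate_per_1000(c, 18500) for c in counts]
print(rates)
print(naive_level_change(rates, interruption_index=2))
```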
Data quality assessment presents a challenge for research using coded administrative health data. The objective of this study is to develop and validate a set of coding association rules for coded diagnostic data.
We used the Canadian re-abstracted hospital discharge abstract data coded in International Classification of Diseases, 10th revision (ICD-10) codes. Association rule mining was conducted on the re-abstracted data in four age groups (0-4, 20-44, 45-64, and ≥ 65) to extract ICD-10 coding association rules at the three-digit (category of diagnosis) and four-digit levels (category of diagnosis with etiology, anatomy, or severity). The rules were reviewed by a panel of 5 physicians and 2 classification specialists using a modified Delphi rating process. We proposed and defined the variance and bias to assess data quality using the rules.
After the rule mining process and the panel review, 388 rules at the three-digit level and 275 rules at the four-digit level were developed. Half of the rules were from the age group of ≥65. Rules captured meaningful age-specific clinical associations, with rules in the age group of ≥65 being more complex and comprehensive than those in other age groups. The variance and bias measures can identify rules with high bias and variance in Alberta data and provide directions for quality improvement.
A set of ICD-10 data quality rules was developed and validated by a clinical and classification expert panel. The rules can be used as a tool to assess ICD-coded data, enabling the monitoring and comparison of data quality across institutions, provinces, and countries.
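To make the notion of a coding association rule concrete, the sketch below computes the two standard association rule measures, support and confidence, for a rule of the form "antecedent codes imply consequent codes" after truncating codes to the three-character category level. The abstract does not specify the mining algorithm or thresholds, so this only illustrates the measures; the helper names and the example records are hypothetical.

```python
def to_category(code):
    """Truncate an ICD-10 code such as 'E11.9' to its three-character category."""
    return code.replace(".", "")[:3]


def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    records: iterable of sets of category codes, one set per discharge abstract.
    support: share of records containing both sides of the rule.
    confidence: share of records with the antecedent that also have the consequent.
    """
    records = [set(r) for r in records]
    if not records:
        raise ValueError("records must not be empty")
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = sum(1 for r in records if antecedent <= r)
    with_both = sum(1 for r in records if (antecedent | consequent) <= r)
    support = with_both / len(records)
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence


# Hypothetical discharge abstracts, each a list of ICD-10 codes.
raw_records = [
    ["E11.9", "I10"],
    ["E11.2", "I10", "N18.3"],
    ["E11.9"],
    ["I10"],
    ["J44.1"],
]
categories = [{to_category(c) for c in record} for record in raw_records]
print(rule_metrics(categories, antecedent={"E11"}, consequent={"I10"}))
```

In this toy data the rule E11 -> I10 holds in 2 of 5 records (support 0.4) and in 2 of the 3 records that contain E11 (confidence about 0.67).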
Surveillance and outcome studies for heart failure (HF) require accurate identification of patients with HF. Algorithms based on International Classification of Diseases (ICD) codes to identify HF from administrative data are inadequate owing to their relatively low sensitivity. Detailed clinical information from electronic medical records (EMRs) is potentially useful for improving ICD algorithms. This study aimed to enhance the ICD algorithm for HF definition by incorporating comprehensive information from EMRs.
The study included 2106 inpatients in Calgary, Alberta, Canada. Medical chart review was used as the reference gold standard for evaluating developed algorithms. The commonly used ICD codes for defining HF were used (namely, the ICD algorithm). The performance of different algorithms using the free text discharge summaries from a population-based EMR was compared with that of the ICD algorithm. These algorithms included a keyword search algorithm looking for HF-specific terms, a machine learning–based HF concept (HFC) algorithm, an EMR structured data-based algorithm, and combined algorithms (the ICD and HFC combined algorithm).
Of 2106 patients, 296 (14.1%) were patients with HF as determined by chart review. The ICD algorithm had 92.4% positive predictive value (PPV) but low sensitivity (57.4%). The EMR keyword search algorithm achieved a higher sensitivity (65.5%) than the ICD algorithm, but with a lower PPV (77.6%). The HFC algorithm achieved a better sensitivity (80.0%) and maintained a reasonable PPV (88.9%) compared with the ICD algorithm and the keyword algorithm. An even higher sensitivity (83.3%) was reached by combining the HFC and ICD algorithms, with a lower PPV (83.3%). The structured EMR data algorithm reached a sensitivity of 78% and a PPV of 54.2%. The combined EMR structured data and ICD algorithm had a higher sensitivity (82.4%), but the PPV remained low at 54.8%. All algorithms had a specificity ranging from 87.5% to 99.2%.
Applying natural language processing and machine learning on the discharge summaries of inpatient EMR data can improve the capture of cases of HF compared with the widely used ICD algorithm. The utility of the HFC algorithm is straightforward, making it easily applied for HF case identification.
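The comparison above can be sketched by scoring each algorithm's case flags against chart review and by forming a combined algorithm. The abstract does not state how the ICD and HFC algorithms were combined; the sketch assumes a patient is flagged when either algorithm flags them, which is consistent with the reported gain in sensitivity at the cost of PPV. The flag vectors are hypothetical.

```python
def evaluate(predicted, reference):
    """Return sensitivity, PPV, and specificity of predicted flags vs. chart review."""
    pairs = list(zip(predicted, reference))
    true_pos = sum(1 for p, r in pairs if p and r)
    false_pos = sum(1 for p, r in pairs if p and not r)
    false_neg = sum(1 for p, r in pairs if not p and r)
    true_neg = sum(1 for p, r in pairs if not p and not r)
    return {
        "sensitivity": true_pos / (true_pos + false_neg),
        "ppv": true_pos / (true_pos + false_pos),
        "specificity": true_neg / (true_neg + false_pos),
    }


def combine_either(flags_a, flags_b):
    """Flag a patient when either algorithm flags them (assumed combination rule)."""
    return [bool(a or b) for a, b in zip(flags_a, flags_b)]


# Hypothetical flags for ten patients; 1 = heart failure.
chart_review = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
icd_flags = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
hfc_flags = [1, 0, 1, 0, 1, 0, 0, 0, 0, 0]

for name, flags in [
    ("ICD", icd_flags),
    ("HFC", hfc_flags),
    ("ICD or HFC", combine_either(icd_flags, hfc_flags)),
]:
    print(name, evaluate(flags, chart_review))
```

In the toy data the combined flags recover more chart-confirmed cases than either algorithm alone while inheriting the false positive from the HFC flags, the same trade-off direction reported for the combined algorithm.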
Case identification is important for health services research, measuring health system performance and risk adjustment, but existing methods based on manual chart review or diagnosis codes can be expensive, time consuming or of limited validity. We aimed to develop a hypertension case definition in electronic medical records (EMRs) for inpatient clinical notes using machine learning.
A cohort of patients 18 years of age or older who were discharged from 1 of 3 Calgary acute care facilities (1 academic hospital and 2 community hospitals) between Jan. 1 and June 30, 2015, were randomly selected, and we compared the performance of EMR phenotype algorithms developed using machine learning with an algorithm based on the Canadian version of the International Classification of Diseases (ICD), in identifying patients with hypertension. Hypertension status was determined by chart review; the machine-learning algorithms used EMR notes and the ICD algorithm used the Discharge Abstract Database (Canadian Institute for Health Information).
Of our study sample (n = 3040), 1475 (48.5%) patients had hypertension. The group with hypertension was older (median age of 71.0 yr v. 52.5 yr for those patients without hypertension) and had fewer females (710 [48.2%] v. 764 [52.3%]). Our final EMR-based models had higher sensitivity than the ICD algorithm (> 90% v. 47%), while maintaining high positive predictive values (> 90% v. 97%).
We found that hypertension tends to have clear documentation in EMRs and is well classified by concept search on free text. Machine learning can provide insights into how and where conditions are documented in EMRs and suggest nonmachine-learning phenotypes to implement.
Objectives: Patient feedback is critical to identify and resolve patient safety and experience issues in healthcare systems. However, large volumes of unstructured text data can pose problems for manual (human) analysis. This study reports the results of using a semiautomated, computational topic-modelling approach to analyse a corpus of patient feedback.
Methods: Patient concerns were received by Alberta Health Services between 2011 and 2018 (n=76 163), regarding 806 care facilities in 163 municipalities, including hospitals, clinics, community care centres and retirement homes, in a province of 4.4 million. Their existing framework requires manual labelling of pre-defined categories. We applied an automated latent Dirichlet allocation (LDA)-based topic modelling algorithm to identify the topics present in these concerns, and thereby produce a framework-free categorisation.
Results: The LDA model produced 40 topics which, following manual interpretation by researchers, were reduced to 28 coherent topics. The most frequent topics identified were communication issues causing delays (frequency: 10.58%), community care for elderly patients (8.82%), interactions with nurses (8.80%) and emergency department care (7.52%). Many patient concerns were categorised into multiple topics. Some were more specific versions of categories from the existing framework (eg, communication issues causing delays), while others were novel (eg, smoking in inappropriate settings).
Discussion: LDA-generated topics were more nuanced than the manually labelled categories. For example, LDA found that concerns with community care were related to concerns about nursing for seniors, providing opportunities for insight and action.
Conclusion: Our findings outline the range of concerns patients share in a large health system and demonstrate the usefulness of using LDA to identify categories of patient concerns.
Electronic health records (EHRs), originally designed to facilitate health care delivery, are becoming a valuable data source for health research. EHR systems have two components: the front end, where the data is entered by healthcare workers including physicians and nurses, and the back-end electronic data warehouse where the data is stored in a relational database. EHR data elements can be of many types, which can be categorized as structured, unstructured free-text, and imaging data. The Sunrise Clinical Manager (SCM) EHR is one example of an inpatient EHR system, which covers the city of Calgary (Alberta, Canada). This system, under the management of Alberta Health Services, is now being explored for research use. The purpose of the present paper is to describe the SCM EHR for research purposes and to show how this description generalizes to other EHRs. We further discuss advantages, challenges (e.g. potential bias and data quality issues), and analytical capacities and requirements associated with using EHRs.
A nonstabilized azomethine ylide reacts with a wide range of substituted isatoic anhydrides to afford novel 1,3-benzodiazepin-5-one derivatives, which are generally isolated in high yield. The transformations involve 1,3-dipolar cycloaddition reactions of the ylide with the anhydrides to give transient, and in a representative case spectroscopically observable, oxazolidine intermediates that undergo ring-opening−decarboxylation−ring-closing reaction cascades to yield the 1,3-benzodiazepin-5-one products.