e13567
Background: Patients (pts) with cancer in the outpatient setting are at high risk for adverse events, such as unplanned hospitalizations and ER visits. A recent study found that up to 30% of hospital admissions were preventable. Identifying pts at risk of avoidable clinical deterioration remains a challenge, as clinicians may not be aware of pts’ experiences at home. The growing use of health IT presents an opportunity to identify and respond to clinical deterioration in pts before an adverse event occurs. In this study, we describe a human-centered design (HCD) process used to develop a clinical deterioration risk prediction system to improve the detection of and response to deterioration in cancer outpatients. Methods: Predictive model: We enrolled eligible cancer pts and collected data from each, including Fitbit, geolocation, EHR, and weekly patient-reported outcome measures (PROMs). Pts and their family caregivers could also report non-routine events (NREs), defined as any deviation from expected optimal care. We also captured unplanned treatment events (UTEs), defined as clinically meaningful changes in the pt’s treatment course or care pathway. We developed a predictive model that generates a pt’s 7-day risk of clinical deterioration. Response system: We are developing a risk communication system (RCS) to communicate predicted risk scores to clinical teams. Using an HCD process, we first conducted 36 observations across 100 patient encounters to understand the environment of use. Next, we conducted 18 clinician interviews to define user needs. We have conducted 7 multi-disciplinary design sessions to iteratively develop prototypes of the RCS. We are currently conducting formative usability testing to assess the prototype and gather clinician feedback. Results: Predictive model: We have enrolled 36 cancer outpatients (24 head & neck, 9 gastrointestinal, and 3 lung). Pts completed a total of 219 weekly PROM surveys, reported 107 NREs, and experienced 18 UTEs (e.g., infection). So far, models using EHR and PROM data are the most sensitive and precise (AUC: 0.983; 0.999). More patient data are required to develop higher-quality, stable models. Response system: We identified key design elements to include in the RCS, such as the caregiver’s phone number and the pt’s weight over time. Preliminary findings demonstrate high usability of the prototype RCS. Oncologists identified opportunities for the system to better support team communication and coordination, and to improve the identification of and response to clinical deterioration in cancer outpatients. Conclusions: We have developed and tested a clinical deterioration risk prediction system for cancer outpatients. Future studies will implement the response system and evaluate its impact on clinical care.
Abstract
Purpose
To analyze the clinical completeness, correctness, usefulness, and safety of chatbot and medication database responses to everyday inpatient medication-use questions.
Methods
We evaluated the responses from an artificial intelligence chatbot, a medication database, and clinical pharmacists to 200 real-world medication-use questions. Answer quality was rated by a blinded group of pharmacists, providers, and nurses. Chatbot and medication database responses were deemed “acceptable” if the mean reviewer rating was within 3 points of the mean rating for pharmacists’ answers. We used descriptive statistics for reviewer ratings and Kendall’s coefficient to evaluate interrater agreement.
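The acceptability rule and the agreement statistic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's analysis code: the function names and all rating values are hypothetical, the rating scale is not given in the abstract, "within 3 points" is read here as an absolute difference, and "Kendall's coefficient" is assumed to mean Kendall's coefficient of concordance (W), computed without a correction for tied ratings.

```python
from statistics import mean


def is_acceptable(candidate_ratings, pharmacist_ratings, margin=3.0):
    """Assumed rule: the mean reviewer rating of a candidate answer lies
    within `margin` points of the mean rating of the pharmacist answer."""
    gap = abs(mean(candidate_ratings) - mean(pharmacist_ratings))
    return gap <= margin


def kendalls_w(ratings):
    """Kendall's W for m raters scoring the same n items.

    `ratings` is a list of m lists, each holding n scores. This simple
    version assumes no tied scores within a single rater's list.
    """
    m, n = len(ratings), len(ratings[0])
    rank_sums = [0] * n
    for row in ratings:
        # Rank this rater's items from 1 (lowest score) to n (highest).
        order = sorted(range(n), key=lambda i: row[i])
        for rank, item in enumerate(order, start=1):
            rank_sums[item] += rank
    mean_rank_sum = m * (n + 1) / 2
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical ratings for one question (values are illustrative only).
chatbot = [6, 7, 8]
pharmacist = [9, 9, 9]
acceptable = is_acceptable(chatbot, pharmacist)

# Three hypothetical raters who order three answers identically.
agreement = kendalls_w([[1, 2, 3], [2, 3, 4], [1, 5, 9]])
```

W ranges from 0 (no agreement in how raters order the items) to 1 (identical orderings), which is why it suits a panel of more than two reviewers.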
Results
The medication database generated responses to 194 (97%) questions, with 88% considered acceptable for clinical correctness, 76% considered acceptable for completeness, 83% considered acceptable for safety, and 81% considered acceptable for usefulness compared to pharmacists’ answers. The chatbot responded to only 160 (80%) questions, with 85% considered acceptable for clinical correctness, 65% considered acceptable for completeness, 71% considered acceptable for safety, and 68% considered acceptable for usefulness.
Conclusion
Traditional search methods using a drug database provide more clinically correct, complete, safe, and useful answers than a chatbot. When the chatbot generated a response, the clinical correctness was similar to that of a drug database; however, it was not rated as favorably for clinical completeness, safety, or usefulness. Our results highlight the need for ongoing training and continued improvements to artificial intelligence chatbots for them to be incorporated reliably into the clinical workflow. With continued improvement in chatbot functionality, chatbots could be a useful pharmacist adjunct, providing healthcare providers with quick and reliable answers to medication-use questions.
Objectives
All residency programs in the United States are required to report their residents' progress on the milestones to the Accreditation Council for Graduate Medical Education (ACGME) biannually. Since the development and institution of this competency‐based assessment framework, residency programs have been attempting to ascertain the best ways to assess resident performance on these metrics. Simulation was recommended by the ACGME as one method of assessment for many of the milestone subcompetencies. We developed three simulation scenarios with scenario‐specific milestone‐based assessment tools. We aimed to gather validity evidence for these tools.
Methods
We conducted a prospective observational study to investigate the validity evidence for three mannequin‐based simulation scenarios for assessing individual residents on emergency medicine (EM) milestones. The subcompetencies included (i.e., patient care PC1, PC2, PC3) were identified via a modified Delphi technique using a group of experienced EM simulationists. The scenario‐specific checklist (CL) items were designed based on the individual milestone items within each EM subcompetency chosen for assessment and were reviewed by experienced EM simulationists. Two independent live raters who were EM faculty at the respective study sites scored each scenario following brief rater training. The inter‐rater reliability (IRR) of the assessment tool was determined by measuring the intraclass correlation coefficient (ICC) for the sum of the CL items as well as the global rating scales (GRSs) for each scenario. GRS and CL scores were compared across postgraduate year (PGY) levels using analysis of variance.
Results
Eight subcompetencies were chosen for assessment with three simulation cases, using 118 subjects. Evidence of test content, internal structure, response process, and relations with other variables was found. The ICCs for the sum of the CL items and the GRSs were >0.8 for all cases, with one exception (clinical management GRS = 0.74 in the sepsis case). The sum of CL items and the GRSs discriminated between PGY levels on all cases (p < 0.05). However, when the specific CL items were mapped back to milestones in various proficiency levels, the milestones in the higher proficiency levels (level 3 [L3] and level 4 [L4]) did not often discriminate between PGY levels. L3 milestone items discriminated between PGY levels on five of the 12 occasions on which they were assessed, and L4 items on only two of 12.
Conclusion
Three simulation cases with scenario‐specific assessment tools allowed evaluation of EM residents on proficiency L1 to L4 within eight of the EM milestone subcompetencies. Evidence of test content, internal structure, response process, and relations with other variables was found. Good to excellent IRR and the ability to discriminate between PGY levels were found for both the sum of CL items and the GRSs. However, there was a lack of a positive relationship between advancing PGY level and the completion of higher‐level milestone items (L3 and L4).
The prevalence of artificial intelligence (AI) is rapidly growing across industries, including health care. AI has the potential to improve patient safety (e.g., by reducing diagnostic error) and to reduce clinician workload (e.g., documentation burden) and health care costs. Yet many questions remain about how clinicians will interact with and use AI to support their work and how these technologies will affect clinician workflow, decision-making, and teamwork. It is also uncertain how patients will interact with AI, with a recent report suggesting that 60 percent of US adults are uncomfortable with their health care providers using AI. In this panel, we will discuss AI applications across differing health care contexts and describe how AI influences clinician (and patient) workflows. We will outline considerations for the design and implementation of AI-based technologies in health care and identify areas where future research is needed.
Abstract
Objective
The Vanderbilt Children’s Hospital launched an innovative Technology-Based Patient and Family Engagement Consult Service in 2014. This paper describes our initial experience with this service, characterizes health-related needs of families of hospitalized children, and details the technologies recommended to promote engagement and meet needs.
Materials and Methods
We retrospectively reviewed consult service documentation for patient characteristics, health-related needs, and consultation team recommendations. Needs were categorized using a consumer health needs taxonomy. Recommendations were classified by technology type.
Results
Twenty-two consultations were conducted with families of patients ranging in age from newborn to 15 years, most with new diagnoses or chronic illnesses. The consultation team identified 99 health-related needs (4.5 per consultation) and made 166 recommendations (7.5 per consultation, 1.7 per need). Need categories included 38 informational needs, 26 medical needs, 23 logistical needs, and 12 social needs. The most common recommendations were websites (50, 30%) and mobile applications (30, 18%). The most frequent recommendations by need category were websites for informational needs (39, 50%), mobile applications for medical needs (15, 40%), patient portals for logistical needs (12, 44%), and disease-specific support groups for social needs (19, 56%).
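The per-consultation and per-need rates reported above follow directly from the counts. As a quick arithmetic check, the short sketch below recomputes them from the figures in the text; the variable names are illustrative and nothing beyond the reported counts is assumed.

```python
# Counts as reported in the Results paragraph.
consultations = 22
needs_by_category = {
    "informational": 38,
    "medical": 26,
    "logistical": 23,
    "social": 12,
}
recommendations = 166
website_recs = 50
mobile_app_recs = 30

# The four need categories should add up to the reported total of 99.
total_needs = sum(needs_by_category.values())

needs_per_consult = round(total_needs / consultations, 1)      # 4.5
recs_per_consult = round(recommendations / consultations, 1)   # 7.5
recs_per_need = round(recommendations / total_needs, 1)        # 1.7

# Shares of all recommendations, as whole percentages.
website_pct = round(100 * website_recs / recommendations)      # 30
mobile_app_pct = round(100 * mobile_app_recs / recommendations)  # 18
```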
Discussion
Families of hospitalized pediatric patients have a variety of health-related needs, many of which could be addressed by technology recommendations from an engagement consult service.
Conclusion
This service is the first of its kind, offering a potentially generalizable and scalable approach to assessing health-related needs, meeting them with technologies, and promoting patient and family engagement in the inpatient setting.
We conducted a pilot trial of a new mobile and web-based intervention to improve diabetes adherence. The text messaging system was designed to motivate and remind adolescents about diabetes self-care tasks. Text messages were tailored according to individually reported barriers to diabetes self-care. A total of 23 adolescents with type 1 diabetes used the system for a period of three months. On average, they received 10 text messages per week (range 8-12). A matched historical control group from the same clinic was used for comparison. After three months, system users rated the content, usability, and their experiences with the system; the ratings were very favourable. Comparison of the intervention and control groups indicated a significant interaction between group and time. Both groups had similar HbA1c levels at baseline. After three months, the mean HbA1c level in the intervention group was unchanged (8.8%), but the mean level in the control group was significantly higher (9.9%), P = 0.006. The results demonstrate the feasibility of the messaging system, user acceptance, and a promising effect on glycaemic control. Integrating this type of messaging system with online educational programming could prove to be beneficial.
e13560
Background: A common cause of preventable harm is the failure to detect and appropriately respond to clinical deterioration. Timely intervention is needed, particularly in cancer patients, to mitigate the effects of adverse events, disease progression, and medical error. This problem requires effective clinical surveillance, early recognition, timely notification of the appropriate clinician, and effective intervention. Methods: Applying a user-centered systems engineering design approach, we designed and implemented a surveillance-and-response system to improve the detection of and response to clinical deterioration in cancer outpatients. The surveillance system predicts 7-day risk of UTEs, defined as clinically meaningful changes in the patient’s treatment course or cancer care pathway (e.g., any unplanned or unexpected clinic or ER visit, hospital admission, major treatment change or delay, or death). Data inputs consist of: 1) patient activity and health data collected by a Fitbit monitor; 2) geolocation data to measure activity outside the home (i.e., locations preselected at study onset); 3) clinical data from the hospital’s electronic health record; and 4) patient-reported outcome measures (PROMs; i.e., the NCCN Distress Thermometer, the Comprehensive OpeN-Ended Survey or CONES, Global Health Score, and items from the Consumer Assessment of Healthcare Providers and Systems (CAHPS)). Herein, we measured the effectiveness of Fitbit data alone in predicting UTEs in a pilot sample of patients. Dimension reduction of Fitbit variables was first carried out by using Pearson correlation analysis to eliminate redundant variables. As UTEs are rare events, they were oversampled using the Synthetic Minority Oversampling Technique (SMOTE) to balance the dataset. A random forest classification model was trained to predict 7-day UTE risk. Model accuracy was calculated as the mean across stratified 5-fold cross-validation with 10 repeats. Results: Fitbit data were collected over a 6-8-week period from 14 head and neck cancer patients receiving surgical resection, outpatient chemotherapy, and/or radiotherapy. We identified 6 UTEs in 5 patients. A random forest classification model was developed from 10 variables derived from 7 Fitbit measures. The following variables were averaged or summed daily: average heart rate (HR), resting HR, below 50% or zone 1 of maximum HR, zone 2 and zone 3 HR combined (i.e., 70-100% of max HR), total daily calories, steps, and sleep in minutes. We achieved a model accuracy of 94% (ROC AUC: 0.984, Precision-Recall AUC: 0.985). Conclusions: Activity and health data collected by a commercial activity monitor demonstrated effectiveness in predicting patient UTEs when an oversampling procedure was used to adjust for class imbalance (i.e., low UTE rate). Future studies are recommended to verify and validate this result in a larger patient sample.
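The oversampling step named in the Methods can be illustrated with a minimal, standard-library sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a real minority sample and one of its nearest minority neighbours. This is not the study's code. The function name `smote_sketch`, the two features (resting HR, daily steps), and all values are hypothetical, and the sketch omits the feature scaling that would normally precede a distance computation.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate `n_new` synthetic points from a list of minority-class
    feature tuples by interpolating toward nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest other minority samples to the chosen base point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (o - b) for b, o in zip(base, other))
        )
    return synthetic


# Hypothetical UTE-positive days: (resting HR, daily steps).
ute_days = [(60.0, 4000.0), (62.0, 3500.0), (58.0, 5200.0), (65.0, 3000.0)]
new_points = smote_sketch(ute_days, n_new=6, k=2, seed=42)
```

In a cross-validated workflow, a common precaution is to generate synthetic points from the training fold only, so that each held-out fold contains real observations alone.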
Introduction
Optimal teaching and assessment methods and models for emergency airway, breathing, and hemorrhage interventions are not currently known. The University of Minnesota Combat Casualty Training consortium (UMN CCTC) was formed to explore the strengths and weaknesses of synthetic training models (STMs) versus live tissue (LT) models. In this study, we compare the effectiveness of best‐in‐class STMs versus an anesthetized caprine (goat) model for training and assessing seven procedures: junctional hemorrhage control, tourniquet (TQ) placement, chest seal, needle thoracostomy (NCD), nasopharyngeal airway (NPA), tube thoracostomy, and cricothyrotomy (Cric).
Methods
Army combat medics were randomized to one of four groups: 1) LT trained–LT tested (LT‐LT), 2) LT trained–STM tested (LT‐STM), 3) STM trained–LT tested (STM‐LT), and 4) STM trained–STM tested (STM‐STM). Participants trained in small groups for 3 to 4 hours and were evaluated individually. LT‐LT was the “control” to which other groups were compared, as this is the current military predeployment standard. The mean procedural scores (PSs) were compared using a pairwise t‐test with a Dunnett's correction. Logistic regression was used to compare critical fails (CFs) and skipped tasks.
Results
There were 559 subjects included. Junctional hemorrhage control revealed no difference in CFs, but LT‐tested subjects (LT‐LT and STM‐LT) skipped this task more than STM‐tested subjects (LT‐STM and STM‐STM; p < 0.05), and STM‐STM had higher PSs than LT‐LT (p < 0.001). For TQ, both STM‐tested groups (LT‐STM and STM‐STM) had more CFs than LT‐LT (p < 0.001) and LT‐STM had lower PSs than LT‐LT (p < 0.05). No differences were seen for chest seal. For NCD, LT‐STM had more CFs than LT‐LT (p = 0.001) and lower PSs (p = 0.001). There was no difference in CFs for NPA, but all groups had worse PSs versus LT‐LT (p < 0.05). For Cric, we were underpowered; STM‐LT trended toward more CFs (p = 0.08), and STM‐STM had higher PSs than LT‐LT (p < 0.01). Tube thoracostomy revealed that STM‐LT had higher CFs than LT‐LT (p < 0.05), but LT‐STM had lower PSs (p < 0.05). An interaction effect (making the subjects who trained and tested on different models more likely to CF) was only found for TQ, chest seal, and Cric; however, of these three procedures, only TQ demonstrated any significant difference in CF rates.
Conclusion
Training on STM or LT did not demonstrate a difference in subsequent performance for five of seven procedures (junctional hemorrhage, TQ, chest seal, NPA, and NCD). Until STMs are developed with improved anthropomorphic and tissue fidelity, there may still be a role for LT for training tube thoracostomy and potentially Cric. For assessment, our STM appears more challenging for TQ and potentially for NCD than LT. For junctional hemorrhage, the increased “skips” with LT may be explained by the differences in anatomic fidelity. While these results begin to uncover the effects of training and assessing these procedures on various models, further study is needed to ascertain how well performance on an STM or LT model translates to the human model.
We reviewed the available literature on measuring human performance to evaluate human-system interfaces (HSIs), focused on high-fidelity simulations of industrial process control systems, to identify best practices and future directions for research and operations. We searched the available literature and then conducted in-depth review, structured coding, and analysis of 49 articles, which described 42 studies. Human performance measures were classified across six dimensions: task performance, workload, situation awareness, teamwork/collaboration, plant performance, and other cognitive performance indicators. Many studies measured performance in more than one dimension, but few studies addressed more than three dimensions. Only a few measures demonstrated acceptable levels of reliability, validity, and sensitivity in the reviewed studies in this research domain. More research is required to assess the measurement qualities of the commonly used measures. The results can provide guidance to direct future research and practice for human performance measurement in process control HSI design and deployment.
• Human performance measures in process control HSI evaluation were reviewed.
• The reviewed measures were categorized into six human performance dimensions.
• Many of the measures were not adequately evaluated in the reviewed domain.
• We provided recommendations on current practices and future directions.