While collective intelligence (CI) is a powerful approach to increase decision accuracy, few attempts have been made to unlock its potential in medical decision-making. Here we investigated the performance of three well-known collective intelligence rules ("majority", "quorum", and "weighted quorum") when applied to mammography screening. For any particular mammogram, these rules aggregate the independent assessments of multiple radiologists into a single decision (recall the patient for additional workup or not). We found that, compared to single radiologists, any of these CI-rules both increases true positives (i.e., recalls of patients with cancer) and decreases false positives (i.e., recalls of patients without cancer), thereby overcoming one of the fundamental limitations to decision accuracy that individual radiologists face. Importantly, we find that all CI-rules systematically outperform even the best-performing individual radiologist in the respective group. Our findings demonstrate that CI can be employed to improve mammography screening; similarly, CI may have the potential to improve medical decision-making in a much wider range of contexts, including many areas of diagnostic imaging and, more generally, diagnostic decisions that are based on the subjective interpretation of evidence.
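The abstract above names three aggregation rules but does not define them, so the following is only a minimal sketch of how such rules are commonly formulated. The function names, the threshold parameter, and the idea of weighting each radiologist by a number such as past accuracy are assumptions for illustration, not the authors' implementation.

```python
from typing import Sequence


def majority_recall(votes: Sequence[bool]) -> bool:
    """Recall the patient if more than half of the radiologists vote to recall."""
    return sum(votes) > len(votes) / 2


def quorum_recall(votes: Sequence[bool], threshold: float) -> bool:
    """Recall if the fraction of recall votes reaches a chosen threshold.

    The threshold is a free parameter here (an assumption); a threshold
    below 0.5 favors sensitivity, one above 0.5 favors specificity.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    return sum(votes) / len(votes) >= threshold


def weighted_quorum_recall(
    votes: Sequence[bool], weights: Sequence[float], threshold: float
) -> bool:
    """Like quorum_recall, but each vote counts in proportion to its weight.

    Weights are hypothetical (for example, a radiologist's past accuracy).
    """
    if len(votes) != len(weights) or not votes:
        raise ValueError("votes and weights must be non-empty and equal in length")
    recall_weight = sum(w for v, w in zip(votes, weights) if v)
    return recall_weight / sum(weights) >= threshold


# Three independent assessments of one mammogram (made-up values).
votes = [True, False, False]
print(majority_recall(votes))                                # False
print(quorum_recall(votes, threshold=0.3))                   # True
print(weighted_quorum_recall(votes, [0.9, 0.5, 0.5], 0.4))   # True
```

With equal weights the weighted rule reduces to the plain quorum rule, which is one reason the two are often presented together.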
Objective: To quantify the accuracy and reproducibility of pathologists’ diagnoses of melanocytic skin lesions. Design: Observer accuracy and reproducibility study. Setting: 10 US states. Participants: Skin biopsy cases (n=240), grouped into sets of 36 or 48. Pathologists from 10 US states were randomized to independently interpret the same set on two occasions (phases 1 and 2), at least eight months apart. Main outcome measures: Pathologists’ interpretations were condensed into five classes: I (eg, nevus or mild atypia); II (eg, moderate atypia); III (eg, severe atypia or melanoma in situ); IV (eg, pathologic stage T1a (pT1a) early invasive melanoma); and V (eg, ≥pT1b invasive melanoma). Reproducibility was assessed by intraobserver and interobserver concordance rates, and accuracy by concordance with three reference diagnoses. Results: In phase 1, 187 pathologists completed 8976 independent case interpretations resulting in an average of 10 (SD 4) different diagnostic terms applied to each case. Among pathologists interpreting the same cases in both phases, when pathologists diagnosed a case as class I or class V during phase 1, they gave the same diagnosis in phase 2 for the majority of cases (class I 76.7%; class V 82.6%). However, the intraobserver reproducibility was lower for cases interpreted as class II (35.2%), class III (59.5%), and class IV (63.2%). Average interobserver concordance rates were lower, but with similar trends. Accuracy using a consensus diagnosis of experienced pathologists as reference varied by class: I, 92% (95% confidence interval 90% to 94%); II, 25% (22% to 28%); III, 40% (37% to 44%); IV, 43% (39% to 46%); and V, 72% (69% to 75%).
It is estimated that at a population level, 82.8% (81.0% to 84.5%) of melanocytic skin biopsy diagnoses would have their diagnosis verified if reviewed by a consensus reference panel of experienced pathologists, with 8.0% (6.2% to 9.9%) of cases overinterpreted by the initial pathologist and 9.2% (8.8% to 9.6%) underinterpreted. Conclusion: Diagnoses spanning moderately dysplastic nevi to early stage invasive melanoma were neither reproducible nor accurate in this large study of pathologists in the USA. Efforts to improve clinical practice should include using a standardized classification system, acknowledging uncertainty in pathology reports, and developing tools such as molecular markers to support pathologists’ visual assessments.
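The intraobserver figures above are reported per phase 1 class: of the cases a pathologist placed in a given class in phase 1, the share placed in the same class in phase 2. A minimal sketch of that calculation follows; the function name and the toy data are hypothetical, and the study's actual analysis may have differed (for example, in how results were pooled across pathologists).

```python
from collections import defaultdict
from typing import Dict, Sequence


def intraobserver_concordance(
    phase1: Sequence[str], phase2: Sequence[str]
) -> Dict[str, float]:
    """Per-class share of phase 1 diagnoses repeated unchanged in phase 2.

    phase1[i] and phase2[i] are the same reader's class for the same case.
    """
    if len(phase1) != len(phase2):
        raise ValueError("both phases must cover the same cases")
    totals: Dict[str, int] = defaultdict(int)
    repeated: Dict[str, int] = defaultdict(int)
    for first, second in zip(phase1, phase2):
        totals[first] += 1
        if first == second:
            repeated[first] += 1
    return {cls: repeated[cls] / totals[cls] for cls in totals}


# Made-up readings for one pathologist, for illustration only.
phase1 = ["I", "I", "II", "V"]
phase2 = ["I", "II", "II", "V"]
print(intraobserver_concordance(phase1, phase2))
```

Conditioning on the phase 1 class matches the abstract's phrasing ("when pathologists diagnosed a case as class I ... during phase 1").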
IMPORTANCE: A breast pathology diagnosis provides the basis for clinical treatment and management decisions; however, its accuracy is inadequately understood. OBJECTIVES: To quantify the magnitude of diagnostic disagreement among pathologists compared with a consensus panel reference diagnosis and to evaluate associated patient and pathologist characteristics. DESIGN, SETTING, AND PARTICIPANTS: Study of pathologists who interpret breast biopsies in clinical practices in 8 US states. EXPOSURES: Participants independently interpreted slides between November 2011 and May 2014 from test sets of 60 breast biopsies (240 total cases, 1 slide per case), including 23 cases of invasive breast cancer, 73 ductal carcinoma in situ (DCIS), 72 with atypical hyperplasia (atypia), and 72 benign cases without atypia. Participants were blinded to the interpretations of other study pathologists and consensus panel members. Among the 3 consensus panel members, unanimous agreement of their independent diagnoses was 75%, and concordance with the consensus-derived reference diagnoses was 90.3%. MAIN OUTCOMES AND MEASURES: The proportions of diagnoses overinterpreted and underinterpreted relative to the consensus-derived reference diagnoses were assessed. RESULTS: Sixty-five percent of invited, responding pathologists were eligible and consented to participate. Of these, 91% (N = 115) completed the study, providing 6900 individual case diagnoses. Compared with the consensus-derived reference diagnosis, the overall concordance rate of diagnostic interpretations of participating pathologists was 75.3% (95% CI, 73.4%-77.0%; 5194 of 6900 interpretations).
Among invasive carcinoma cases (663 interpretations), 96% (95% CI, 94%-97%) were concordant, and 4% (95% CI, 3%-6%) were underinterpreted; among DCIS cases (2097 interpretations), 84% (95% CI, 82%-86%) were concordant, 3% (95% CI, 2%-4%) were overinterpreted, and 13% (95% CI, 12%-15%) were underinterpreted; among atypia cases (2070 interpretations), 48% (95% CI, 44%-52%) were concordant, 17% (95% CI, 15%-21%) were overinterpreted, and 35% (95% CI, 31%-39%) were underinterpreted; and among benign cases without atypia (2070 interpretations), 87% (95% CI, 85%-89%) were concordant and 13% (95% CI, 11%-15%) were overinterpreted. Disagreement with the reference diagnosis was statistically significantly higher among biopsies from women with higher (n = 122) vs lower (n = 118) breast density on prior mammograms (overall concordance rate, 73% [95% CI, 71%-75%] for higher vs 77% [95% CI, 75%-80%] for lower, P < .001), and among pathologists who interpreted lower weekly case volumes (P < .001) or worked in smaller practices (P = .034) or nonacademic settings (P = .007). CONCLUSIONS AND RELEVANCE: In this study of pathologists, in which diagnostic interpretation was based on a single breast biopsy slide, overall agreement between the individual pathologists’ interpretations and the expert consensus–derived reference diagnoses was 75.3%, with the highest level of concordance for invasive carcinoma and lower levels of concordance for DCIS and atypia. Further research is needed to understand the relationship of these findings with patient management.
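The counts reported in this abstract can be cross-checked with simple arithmetic. The sketch below uses only figures stated above; the variable names are illustrative, and it does not attempt to reproduce the confidence intervals, whose method the abstract does not describe.

```python
# Interpretation counts per reference category, as reported in the abstract.
interpretations = {
    "invasive carcinoma": 663,
    "DCIS": 2097,
    "atypia": 2070,
    "benign without atypia": 2070,
}

pathologists = 115          # participants who completed the study
cases_per_test_set = 60     # each participant interpreted one test set
concordant = 5194           # interpretations matching the reference diagnosis

total = sum(interpretations.values())
overall_rate = round(100 * concordant / total, 1)

print(total)                              # 6900
print(pathologists * cases_per_test_set)  # 6900
print(overall_rate)                       # 75.3
```

The per-category counts and the participant count both give 6900 interpretations, and 5194 of 6900 gives the reported overall concordance of 75.3%.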
A pilot study examined the extent to which eye movements occurring during interpretation of digitized breast biopsy whole slide images (WSI) can distinguish novice interpreters from experts, informing assessments of competency progression during training and across the physician-learning continuum. A pathologist with fellowship training in breast pathology interpreted digital WSI of breast tissue and marked the region of highest diagnostic relevance (dROI). These same images were then evaluated using computer vision techniques to identify visually salient regions of interest (vROI) without diagnostic relevance. A non-invasive eye tracking system recorded pathologists' (N = 7) visual behavior during image interpretation, and we measured differential viewing of vROIs versus dROIs according to their level of expertise. Pathologists with relatively low expertise in interpreting breast pathology were more likely to fixate on, and subsequently return to, diagnostically irrelevant vROIs relative to experts. Repeatedly fixating on the distracting vROI showed limited value in predicting diagnostic failure. These preliminary results suggest that eye movements occurring during digital slide interpretation can characterize expertise development by demonstrating differential attraction to diagnostically relevant versus visually distracting image regions. These results carry both theoretical implications and potential for monitoring and evaluating student progress and providing automated feedback and scanning guidance in educational settings.
Collective intelligence refers to the ability of groups to outperform individual decision makers when solving complex cognitive problems. Despite its potential to revolutionize decision making in a wide range of domains, including medical, economic, and political decision making, at present, little is known about the conditions underlying collective intelligence in real-world contexts. We here focus on two key areas of medical diagnostics, breast and skin cancer detection. Using a simulation study that draws on large real-world datasets, involving more than 140 doctors making more than 20,000 diagnoses, we investigate when combining the independent judgments of multiple doctors outperforms the best doctor in a group. We find that similarity in diagnostic accuracy is a key condition for collective intelligence: Aggregating the independent judgments of doctors outperforms the best doctor in a group whenever the diagnostic accuracy of doctors is relatively similar, but not when doctors’ diagnostic accuracy differs too much. This intriguingly simple result is highly robust and holds across different group sizes, performance levels of the best doctor, and collective intelligence rules. The enabling role of similarity, in turn, is explained by its systematic effects on the number of correct and incorrect decisions of the best doctor that are overruled by the collective. By identifying a key factor underlying collective intelligence in two important real-world contexts, our findings pave the way for innovative and more effective approaches to complex real-world decision making, and to the scientific analyses of those approaches.
In Reply to Khatri and Samsonov. Kraakevik, Jeff A; Carney, Patricia A. Academic Medicine, 2024-May-01, Volume 99, Issue 5. Journal Article.
BACKGROUND: Team-based learning (TBL) is an active learning strategy gaining traction in medical education. However, studies demonstrating successful incorporation into Graduate Medical Education (GME) curricula are limited.
OBJECTIVE: To assess the feasibility, acceptability and efficacy of Infectious Disease (ID) TBL sessions within an Internal Medicine (IM) residency curriculum as part of a traditional 60-minute conference.
DESIGN: We conducted a prospective cohort study of TBL implementation assessing acceptability and feasibility (Phase 1), and efficacy (Phase 2).
PARTICIPANTS: Phase 1 included 101 IM residents and eight TBL-naïve faculty. Phase 2 included aggregate cohort IM In-Training Exam (ITE) data before (2008-2013) and after (2014-2019) TBL implementation.
INTERVENTIONS: Eight TBL sessions were delivered once or twice weekly during 60-minute noon conferences.
MAIN MEASURES: We assessed feasibility by measuring individual Readiness Assurance Test (iRAT) completion rates and inclusion of TBL elements in each session; acceptability through attendance, perceived effectiveness rating and attitudes about TBL; efficacy by comparing ITE data for the overall ID content and specific TBL-associated learning objectives.
KEY RESULTS: Seventy-five of 93 (80%) residents attended at least one session. All TBL elements were successfully incorporated each session. Of those surveyed, 86% rated the TBL sessions as facilitating their learning "very (4)" or "extremely (5)" well on a 5-point Likert scale (p<0.001). ITE mean percent correct scores of total ID content as well as TBL-associated learning objective performance were both significantly higher for the post-TBL cohort among PGY-2 (76.2 vs 62.3; 76.2 vs 62.6) and PGY-3 (73 vs 64.5; 76.2 vs 64.5) IM residents (p<0.05; p<0.001 respectively).
CONCLUSIONS: Implementing a complete TBL pedagogy within the traditional noontime conference hour in GME is feasible, acceptable to residents and faculty, and associated with improved learning efficacy demonstrated through improved ITE scores.
Medical trainees experience significant exam-related stress, such as preparing for the United States Medical Licensing Examination (USMLE) Step 1, which often negatively affects emotional health. Nourish, a novel Step 1 support program, was designed to foster improved self-efficacy and well-being during the process of studying for and taking the exam. Nourish was piloted at Oregon Health & Science University between December 2018 and February 2019.
Program elements were guided by Self-Efficacy Theory and included community building, wellness support, peer tutoring and social persuasion. Program evaluation included pre- and post-program surveys. Participation was optional and included 46 of 154 students (30%) with 40 of the 46 students (87%) completing pre and post evaluations. The pre-survey was given during the Nourish orientation in December prior to the Step 1 study period, and the post-survey was given in early February when most students had taken their exam but none had received their scores.
While summary self-efficacy scores increased between baseline and post program (24.9 vs 27.7, p < 0.001), summary emotional health scores worsened (8.15 vs 8.75, p = 0.03). Summary scores for physical health also dropped but this difference was not statistically significant. Summary perceived stress scores increased from 15.5 at baseline to 23.7 post program (p < 0.001). All students who routinely participated in Nourish passed their USMLE Step 1 exam. One student who participated only in the orientation session did not pass.
Nourish appeared to improve self-efficacy, even though students reported being stressed with low emotional health. The program appeared to help students align task demands with their own personal resources and set reasonable expectations and strategies to pass the exam. Medical schools should consider similar peer- and faculty mentor-based wellness and tutoring programs to support medical students while they work to achieve academic success.
Background: Medical student wellness, including physical health, emotional health, and levels of perceived stress, appears to decline during training, with students reporting high levels of depression, anxiety, and burnout as early as the first year of medical school. The impact of curricular changes on health and stress remains unclear, and a modified curriculum that compresses training of the foundational sciences and its effect on wellness has not been studied. Oregon Health & Science University School of Medicine has recently instituted a unique competency-based model, which provides an important opportunity to assess the effects of curricular change on student wellness.
Objective: Assess the effects of curricular change on student wellness.
Design: Medical students at a single institution were administered the SF-8, an 8-item health-related quality of life survey, as well as the Perceived Stress Scale, a 10-item scale that measures the degree to which life situations are appraised as stressful, at baseline (matriculation) and at the end of Year 1, 2 and 3. Individual variables were assessed over time, as well as a trend analysis of summary domain scores over the 4 time periods.
Results: Physical, emotional, and overall health were highest at baseline and lowest at the end of Year 1, after which they improved but never again reached baseline levels. Physical health declined less than emotional health. Perceived stress levels did not change over time but remained moderately high. There were no differences in health or perceived stress based on demographic variables.
Conclusions: In a competency-based curriculum, physical, emotional and overall health significantly worsened during Year 1 but improved thereafter, while perceived stress remained unchanged. Early in training, stress and poor overall health may be related to concerns about self-efficacy and workload. Although advanced students show improved wellness, concerns remained about emotional difficulties, such as anxiety and irritability, and feeling a lack of control.