• Commonly studied scenario considers only binary cancer vs. no cancer classification.
• Our system classifies whole slide breast biopsies into five diagnostic categories.
• Pipeline of fully convolutional networks localizes diagnostically relevant regions.
• Convolutional neural network classifies detected regions of interest in whole slides.
• Experiments show that our method is compatible with predictions of 45 pathologists.
Generalizability of algorithms for binary cancer vs. no cancer classification is unknown for clinically more significant multi-class scenarios where intermediate categories have different risk factors and treatment strategies. We present a system that classifies whole slide images (WSI) of breast biopsies into five diagnostic categories. First, a saliency detector that uses a pipeline of four fully convolutional networks, trained with samples from records of pathologists’ screenings, performs multi-scale localization of diagnostically relevant regions of interest in WSI. Then, a convolutional network, trained from consensus-derived reference samples, classifies image patches as non-proliferative or proliferative changes, atypical ductal hyperplasia, ductal carcinoma in situ, and invasive carcinoma. Finally, the saliency and classification maps are fused for pixel-wise labeling and slide-level categorization. Experiments using 240 WSI showed that both saliency detector and classifier networks performed better than competing algorithms, and the five-class slide-level accuracy of 55% was not statistically different from the predictions of 45 pathologists. We also present example visualizations of the learned representations for breast cancer diagnosis.
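The final fusion step described above can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so everything here is an assumption made for illustration: the saliency threshold, the `min_fraction` parameter, the "most severe class with enough area" slide rule, and the function names `fuse_maps` and `slide_label` are all hypothetical and are not the authors' method.

```python
from collections import Counter

# Five categories from the abstract, ordered by assumed increasing severity.
CLASSES = ["non-proliferative", "proliferative", "ADH", "DCIS", "invasive"]


def fuse_maps(saliency, class_probs, threshold=0.5):
    """Label each pixel with its top-scoring class index.

    Pixels whose saliency is below `threshold` (an assumed cutoff) get None.
    `saliency` is a 2D list of floats; `class_probs` is a 2D list of
    per-class probability lists with the same height and width.
    """
    labels = []
    for sal_row, prob_row in zip(saliency, class_probs):
        row = []
        for s, probs in zip(sal_row, prob_row):
            if s < threshold:
                row.append(None)
            else:
                row.append(max(range(len(probs)), key=probs.__getitem__))
        labels.append(row)
    return labels


def slide_label(pixel_labels, min_fraction=0.05):
    """Assumed slide rule: most severe class covering >= min_fraction of salient pixels."""
    counts = Counter(l for row in pixel_labels for l in row if l is not None)
    total = sum(counts.values())
    if total == 0:
        return None  # no salient pixels at all
    eligible = [c for c, n in counts.items() if n / total >= min_fraction]
    # Fall back to the most frequent class if nothing reaches the area cutoff.
    chosen = max(eligible) if eligible else counts.most_common(1)[0][0]
    return CLASSES[chosen]
```

Raising `min_fraction` makes the slide-level decision less sensitive to small isolated regions, at the cost of possibly missing a small but severe lesion; the right trade-off would have to be tuned on data.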
Digital pathology has entered a new era with the availability of whole slide scanners that create high-resolution images of full biopsy slides. Consequently, the uncertainty regarding the correspondence between the image areas and the diagnostic labels assigned by pathologists at the slide level, and the need for identifying regions that belong to multiple classes with different clinical significances have emerged as two new challenges. However, generalizability of the state-of-the-art algorithms, whose accuracies were reported on carefully selected regions of interest (ROIs) for the binary benign versus cancer classification, to these multi-class learning and localization problems is currently unknown. This paper presents our potential solutions to these challenges by exploiting the viewing records of pathologists and their slide-level annotations in weakly supervised learning scenarios. First, we extract candidate ROIs from the logs of pathologists' image screenings based on different behaviors, such as zooming, panning, and fixation. Then, we model each slide with a bag of instances represented by the candidate ROIs and a set of class labels extracted from the pathology forms. Finally, we use four different multi-instance multi-label learning algorithms for both slide-level and ROI-level predictions of diagnostic categories in whole slide breast histopathology images. Slide-level evaluation using 5-class and 14-class settings showed average precision values up to 81% and 69%, respectively, under different weakly labeled learning scenarios. ROI-level predictions showed that the classifier could successfully perform multi-class localization and classification within whole slide images that were selected to include the full range of challenging diagnostic categories.
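The bag-of-instances representation described above can be pictured with a small data-structure sketch. The `SlideBag` class and the `slide_scores` helper are hypothetical names, and the max-pooling aggregation is a common multi-instance assumption used only for illustration; it is not claimed to be one of the four algorithms evaluated in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class SlideBag:
    """One whole slide: a bag of candidate-ROI feature vectors plus slide-level labels."""
    slide_id: str
    instances: list                      # one feature vector per candidate ROI
    labels: set = field(default_factory=set)  # labels taken from the pathology form


def slide_scores(instance_scores):
    """Aggregate per-ROI class scores into slide-level scores by max-pooling.

    `instance_scores` is a list of {class_name: score} dicts, one per ROI.
    Assumption: a slide carries a class if at least one ROI scores high for it.
    """
    classes = set()
    for scores in instance_scores:
        classes.update(scores)
    return {c: max(s.get(c, 0.0) for s in instance_scores) for c in classes}
```

For example, `slide_scores([{"DCIS": 0.2, "ADH": 0.7}, {"DCIS": 0.9}])` gives the slide a DCIS score of 0.9 and an ADH score of 0.7, so a slide can carry several labels at once while each ROI keeps its own scores for localization.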
Adaptive gain theory proposes that the dynamic shifts between exploration and exploitation control states are modulated by the locus coeruleus-norepinephrine system and reflected in tonic and phasic pupil diameter. This study tested predictions of this theory in the context of a societally important visual search task: the review and interpretation of digital whole slide images of breast biopsies by physicians (pathologists). As these medical images are searched, pathologists encounter difficult visual features and intermittently zoom in to examine features of interest. We propose that tonic and phasic pupil diameter changes during image review may correspond to perceived difficulty and dynamic shifts between exploration and exploitation control states. To examine this possibility, we monitored visual search behavior and tonic and phasic pupil diameter while pathologists (N = 89) interpreted 14 digital images of breast biopsy tissue (1,246 total images reviewed). After viewing the images, pathologists provided a diagnosis and rated the level of difficulty of the image. Analyses of tonic pupil diameter examined whether pupil dilation was associated with pathologists' difficulty ratings, diagnostic accuracy, and experience level. To examine phasic pupil diameter, we parsed continuous visual search data into discrete zoom-in and zoom-out events, including shifts from low to high magnification (e.g., 1× to 10×) and the reverse. Analyses examined whether zoom-in and zoom-out events were associated with phasic pupil diameter change. Results demonstrated that tonic pupil diameter was associated with image difficulty ratings and zoom level, and phasic pupil diameter showed constriction upon zoom-in events, and dilation immediately preceding a zoom-out event. Results are interpreted in the context of adaptive gain theory, information gain theory, and the monitoring and assessment of physicians' diagnostic interpretive processes.
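The parsing of continuous viewing data into discrete zoom events can be sketched as follows. The input format, a time-ordered list of `(time_seconds, magnification)` samples, and the function name `zoom_events` are assumptions for illustration; the abstract does not describe the study's actual log format or event criteria.

```python
def zoom_events(samples):
    """Turn time-ordered (time_seconds, magnification) samples into zoom events.

    Returns a list of (time, kind, from_magnification, to_magnification),
    where kind is "zoom_in" or "zoom_out". Samples with unchanged
    magnification produce no event.
    """
    events = []
    for (_, m0), (t1, m1) in zip(samples, samples[1:]):
        if m1 > m0:
            events.append((t1, "zoom_in", m0, m1))
        elif m1 < m0:
            events.append((t1, "zoom_out", m0, m1))
    return events
```

Each event time could then be used to align a window of pupil-diameter samples before and after the change, which is the kind of event-locked comparison the phasic analysis implies.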
Objective: To quantify the accuracy and reproducibility of pathologists’ diagnoses of melanocytic skin lesions. Design: Observer accuracy and reproducibility study. Setting: 10 US states. Participants: Skin biopsy cases (n=240), grouped into sets of 36 or 48. Pathologists from 10 US states were randomized to independently interpret the same set on two occasions (phases 1 and 2), at least eight months apart. Main outcome measures: Pathologists’ interpretations were condensed into five classes: I (eg, nevus or mild atypia); II (eg, moderate atypia); III (eg, severe atypia or melanoma in situ); IV (eg, pathologic stage T1a (pT1a) early invasive melanoma); and V (eg, ≥pT1b invasive melanoma). Reproducibility was assessed by intraobserver and interobserver concordance rates, and accuracy by concordance with three reference diagnoses. Results: In phase 1, 187 pathologists completed 8976 independent case interpretations resulting in an average of 10 (SD 4) different diagnostic terms applied to each case. Among pathologists interpreting the same cases in both phases, when pathologists diagnosed a case as class I or class V during phase 1, they gave the same diagnosis in phase 2 for the majority of cases (class I 76.7%; class V 82.6%). However, the intraobserver reproducibility was lower for cases interpreted as class II (35.2%), class III (59.5%), and class IV (63.2%). Average interobserver concordance rates were lower, but with similar trends. Accuracy using a consensus diagnosis of experienced pathologists as reference varied by class: I, 92% (95% confidence interval 90% to 94%); II, 25% (22% to 28%); III, 40% (37% to 44%); IV, 43% (39% to 46%); and V, 72% (69% to 75%).
It is estimated that at a population level, 82.8% (81.0% to 84.5%) of melanocytic skin biopsy diagnoses would have their diagnosis verified if reviewed by a consensus reference panel of experienced pathologists, with 8.0% (6.2% to 9.9%) of cases overinterpreted by the initial pathologist and 9.2% (8.8% to 9.6%) underinterpreted. Conclusion: Diagnoses spanning moderately dysplastic nevi to early stage invasive melanoma were neither reproducible nor accurate in this large study of pathologists in the USA. Efforts to improve clinical practice should include using a standardized classification system, acknowledging uncertainty in pathology reports, and developing tools such as molecular markers to support pathologists’ visual assessments.
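Intraobserver reproducibility of the kind reported in this abstract (for example, 76.7% for class I) is a per-class fraction of repeat interpretations that match. A minimal sketch is given below; the paired-tuple input format and the function name `intraobserver_concordance` are assumptions, and the study's actual computation may differ.

```python
from collections import defaultdict


def intraobserver_concordance(pairs):
    """Fraction of phase 2 diagnoses matching phase 1, grouped by phase 1 class.

    `pairs` is an iterable of (phase1_class, phase2_class) tuples, each for the
    same pathologist interpreting the same case on two occasions.
    """
    same = defaultdict(int)
    total = defaultdict(int)
    for c1, c2 in pairs:
        total[c1] += 1
        if c1 == c2:
            same[c1] += 1
    return {c: same[c] / total[c] for c in total}
```

Grouping by the phase 1 class mirrors the abstract's wording ("when pathologists diagnosed a case as class I ... during phase 1, they gave the same diagnosis in phase 2").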
IMPORTANCE: A breast pathology diagnosis provides the basis for clinical treatment and management decisions; however, its accuracy is inadequately understood. OBJECTIVES: To quantify the magnitude of diagnostic disagreement among pathologists compared with a consensus panel reference diagnosis and to evaluate associated patient and pathologist characteristics. DESIGN, SETTING, AND PARTICIPANTS: Study of pathologists who interpret breast biopsies in clinical practices in 8 US states. EXPOSURES: Participants independently interpreted slides between November 2011 and May 2014 from test sets of 60 breast biopsies (240 total cases, 1 slide per case), including 23 cases of invasive breast cancer, 73 ductal carcinoma in situ (DCIS), 72 with atypical hyperplasia (atypia), and 72 benign cases without atypia. Participants were blinded to the interpretations of other study pathologists and consensus panel members. Among the 3 consensus panel members, unanimous agreement of their independent diagnoses was 75%, and concordance with the consensus-derived reference diagnoses was 90.3%. MAIN OUTCOMES AND MEASURES: The proportions of diagnoses overinterpreted and underinterpreted relative to the consensus-derived reference diagnoses were assessed. RESULTS: Sixty-five percent of invited, responding pathologists were eligible and consented to participate. Of these, 91% (N = 115) completed the study, providing 6900 individual case diagnoses. Compared with the consensus-derived reference diagnosis, the overall concordance rate of diagnostic interpretations of participating pathologists was 75.3% (95% CI, 73.4%-77.0%; 5194 of 6900 interpretations).
Among invasive carcinoma cases (663 interpretations), 96% (95% CI, 94%-97%) were concordant, and 4% (95% CI, 3%-6%) were underinterpreted; among DCIS cases (2097 interpretations), 84% (95% CI, 82%-86%) were concordant, 3% (95% CI, 2%-4%) were overinterpreted, and 13% (95% CI, 12%-15%) were underinterpreted; among atypia cases (2070 interpretations), 48% (95% CI, 44%-52%) were concordant, 17% (95% CI, 15%-21%) were overinterpreted, and 35% (95% CI, 31%-39%) were underinterpreted; and among benign cases without atypia (2070 interpretations), 87% (95% CI, 85%-89%) were concordant and 13% (95% CI, 11%-15%) were overinterpreted. Disagreement with the reference diagnosis was statistically significantly higher among biopsies from women with higher (n = 122) vs lower (n = 118) breast density on prior mammograms (overall concordance rate, 73% [95% CI, 71%-75%] for higher vs 77% [95% CI, 75%-80%] for lower, P < .001), and among pathologists who interpreted lower weekly case volumes (P < .001) or worked in smaller practices (P = .034) or nonacademic settings (P = .007). CONCLUSIONS AND RELEVANCE: In this study of pathologists, in which diagnostic interpretation was based on a single breast biopsy slide, overall agreement between the individual pathologists’ interpretations and the expert consensus-derived reference diagnoses was 75.3%, with the highest level of concordance for invasive carcinoma and lower levels of concordance for DCIS and atypia. Further research is needed to understand the relationship of these findings with patient management.
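The headline figures in this abstract can be checked with simple arithmetic: 115 pathologists each interpreting 60 cases gives 6900 interpretations, and 5194 concordant interpretations out of 6900 is 75.3%. The sketch below also computes a naive Wald interval as an illustration. That interval treats all interpretations as independent, which is an assumption; it comes out narrower than the reported 73.4%-77.0% and is not the method used in the study. The function name `proportion_with_wald_ci` is hypothetical.

```python
import math


def proportion_with_wald_ci(successes, total, z=1.96):
    """Return (proportion, lower, upper) using a simple Wald interval.

    Assumes independent observations, which repeated readings by the same
    pathologist do not satisfy; shown only as a back-of-envelope check.
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width


# Figures taken from the abstract.
total_interpretations = 115 * 60  # pathologists x cases per test set
rate, lower, upper = proportion_with_wald_ci(5194, total_interpretations)
print(f"{rate:.1%} (naive 95% CI {lower:.1%} to {upper:.1%})")
```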
A pilot study examined the extent to which eye movements occurring during interpretation of digitized breast biopsy whole slide images (WSI) can distinguish novice interpreters from experts, informing assessments of competency progression during training and across the physician-learning continuum. A pathologist with fellowship training in breast pathology interpreted digital WSI of breast tissue and marked the region of highest diagnostic relevance (dROI). These same images were then evaluated using computer vision techniques to identify visually salient regions of interest (vROI) without diagnostic relevance. A non-invasive eye tracking system recorded pathologists' (N = 7) visual behavior during image interpretation, and we measured differential viewing of vROIs versus dROIs according to their level of expertise. Pathologists with relatively low expertise in interpreting breast pathology were more likely to fixate on, and subsequently return to, diagnostically irrelevant vROIs relative to experts. Repeatedly fixating on the distracting vROI showed limited value in predicting diagnostic failure. These preliminary results suggest that eye movements occurring during digital slide interpretation can characterize expertise development by demonstrating differential attraction to diagnostically relevant versus visually distracting image regions. These results carry both theoretical implications and potential for monitoring and evaluating student progress and providing automated feedback and scanning guidance in educational settings.
Toward More Equitable Breast Cancer Outcomes
Elmore, Joann G; Lee, Christoph I
JAMA: the journal of the American Medical Association, 06/2024, Volume 331, Issue 22
Journal Article, Peer reviewed
In its revised Recommendation Statement,1 the US Preventive Services Task Force (USPSTF) now recommends that all women undergo routine breast cancer screening every other year beginning at age 40 years. This is an adjustment from the 2016 recommendation for all women to start at age 50 and for women aged 40 to 49 to engage in individualized decision-making and part of an overarching aim to increase earlier detection of breast cancer and address inequalities in breast cancer mortality, especially among Black women. Additionally, the task force, in acknowledgment of evolving technology, updated the recommended primary screening modalities to include digital breast tomosynthesis (3D mammography). They noted that digital breast tomosynthesis improves the benefit-to-risk ratio compared with digital mammography, primarily by decreasing false-positive results, a well-known screening-related harm.
Inspecting digital imaging for primary diagnosis introduces perceptual and cognitive demands for physicians tasked with interpreting visual medical information and arriving at appropriate diagnoses and treatment decisions. The process of medical interpretation and diagnosis involves a complex interplay between visual perception and multiple cognitive processes, including memory retrieval, problem-solving, and decision-making. Eye-tracking technologies are becoming increasingly available in the consumer and research markets and provide novel opportunities to learn more about the interpretive process, including differences between novices and experts, how heuristics and biases shape visual perception and decision-making, and the mechanisms underlying misinterpretation and misdiagnosis. The present review provides an overview of eye-tracking technology, the perceptual and cognitive processes involved in medical interpretation, how eye tracking has been employed to understand medical interpretation and promote medical education and training, and some of the promises and challenges for future applications of this technology.