• In retrospective studies, new AI algorithms approach the performance of single readers.
• Study design, data origin, and the definition of a ‘true positive’ will all alter performance.
• Prospective studies need to demonstrate that performance can be replicated in real life.
• To further reduce mortality, AI algorithms must find more high-risk cancers earlier.
Breast cancer screening with mammography reduces mortality in the women who attend by detecting high-risk cancers early. It is far from perfect, however: sensitivity for the detection of cancer varies, and specificity varies very widely, leading to unnecessary recalls and biopsies.
Over the last 12 months, several papers have reported on AI algorithms that perform as well as human readers on large, well-curated population datasets. The nature of the test sets, the way the gold standard has been calculated, the definition of a positive call, and the statistics used all influence the results. Historically, retrospective studies have not predicted the real-life performance of radiologist plus machine. It is therefore important to perform prospective studies before introducing artificial intelligence into real-world breast screening.
To compare the diagnostic accuracy of two-dimensional (2D) full-field digital mammography with that of two-view (mediolateral and craniocaudal) and single-view (mediolateral oblique) tomosynthesis in an observer study involving two institutions.
Ethical committee approval was obtained. All participating women gave informed consent. Two hundred twenty women (mean age, 56.3 years; range, 40–80 years) with breast density of 2–4 according to American College of Radiology criteria were recruited between November 2008 and September 2009 and underwent standard treatment plus tomosynthesis with a prototype photon-counting machine. After exclusion criteria were applied, the final test set comprised 130 women. Ten accredited readers classified the 130 cases (40 cancers, 24 benign lesions, and 66 normal images) using 2D mammography and two-view tomosynthesis. Another 10 readers reviewed the same cases using 2D mammography and single-view tomosynthesis. The multireader, multicase receiver operating characteristic (ROC) method was applied. The significance of the observed difference in accuracy between 2D mammography and tomosynthesis was calculated.
For diagnostic accuracy, 2D mammography performed significantly worse than two-view tomosynthesis (average area under the ROC curve [AUC] = 0.772 for 2D vs 0.851 for tomosynthesis; P = .021). Significant differences were found for both masses and microcalcification (P = .037 and P = .049, respectively). The difference in AUC between the two modalities (−0.110) was significant (P = .03) only for the five readers with the least experience (<10 years of reading); the difference was −0.047 for the five readers with 10 years or more of experience (P = .25). No significant difference (P = .79) in reader performance was seen when 2D mammography (average AUC = 0.774) was compared with single-view tomosynthesis (average AUC = 0.775).
Two-view tomosynthesis outperforms 2D mammography, but only for readers with the least experience. The benefits were seen for both masses and microcalcification. No difference in classification accuracy was seen between 2D mammography and single-view tomosynthesis.
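The MRMC ROC analysis above reports reader-averaged AUCs. The sketch below illustrates what a single reader's AUC measures, using the nonparametric (Mann–Whitney) estimate: the probability that a cancer case is rated above a non-cancer case, with ties counted as one half. It is an illustration only. The abstract does not state which AUC estimator or MRMC variance method was used, and the function names and ratings are hypothetical.

```python
from itertools import product

def empirical_auc(pos_scores, neg_scores):
    """Nonparametric (Mann-Whitney) AUC: fraction of (positive, negative)
    pairs in which the positive case is rated higher; ties count 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def reader_averaged_auc(readers):
    """Average of per-reader AUCs; `readers` is a list of (pos, neg) pairs."""
    aucs = [empirical_auc(pos, neg) for pos, neg in readers]
    return sum(aucs) / len(aucs)

# Hypothetical 1-5 confidence ratings for one reader (not study data).
cancers = [5, 4, 4, 3, 5]
non_cancers = [1, 2, 3, 1, 2, 4]
print(round(empirical_auc(cancers, non_cancers), 3))  # 0.917
```

A full MRMC analysis also models variability across readers and cases when testing the difference between modalities. Simple averaging of per-reader AUCs gives only the point estimate.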
Artificial intelligence (AI) systems performing at radiologist-like levels in the evaluation of digital mammography (DM) would improve breast cancer screening accuracy and efficiency. We aimed to compare the stand-alone performance of an AI system to that of radiologists in detecting breast cancer in DM.
Nine multi-reader, multi-case study datasets previously used for different research purposes in seven countries were collected. Each dataset consisted of DM exams acquired with systems from four different vendors, multiple radiologists' assessments per exam, and ground truth verified by histopathological analysis or follow-up. Together they comprised 2652 exams (653 malignant) and interpretations by 101 radiologists (28 296 independent interpretations). An AI system analyzed these exams, yielding a level of suspicion of cancer between 1 and 10. The detection performance of the radiologists and the AI system was compared using a noninferiority null hypothesis at a margin of 0.05.
The performance of the AI system was statistically noninferior to that of the average of the 101 radiologists. The AI system had an area under the ROC curve (AUC) of 0.840 (95% confidence interval [CI] = 0.820 to 0.860), and the average AUC of the radiologists was 0.814 (95% CI = 0.787 to 0.841); the 95% CI of the difference was −0.003 to 0.055. The AI system had an AUC higher than that of 61.4% of the radiologists.
The evaluated AI system achieved a cancer detection accuracy comparable to that of an average breast radiologist in this retrospective setting. Although promising, the performance and impact of such a system in a screening setting need further investigation.
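The noninferiority conclusion above follows from a simple decision rule applied to the confidence interval of the AUC difference. The sketch below applies that standard CI-based rule to the reported figures. It does not reproduce the study's own variance estimation, and the function names are hypothetical.

```python
def is_noninferior(diff_ci_lower: float, margin: float) -> bool:
    """CI-based noninferiority for a difference (new - reference):
    conclude noninferiority if the lower CI bound lies above -margin."""
    return diff_ci_lower > -margin

def is_superior(diff_ci_lower: float) -> bool:
    """Superiority additionally requires the lower CI bound to exceed zero."""
    return diff_ci_lower > 0.0

# Values reported in the abstract.
ai_auc, reader_auc = 0.840, 0.814
diff_ci = (-0.003, 0.055)  # 95% CI of the AUC difference (AI - radiologists)
margin = 0.05

print(f"point difference: {ai_auc - reader_auc:.3f}")    # point difference: 0.026
print("noninferior:", is_noninferior(diff_ci[0], margin))  # noninferior: True
print("superior:", is_superior(diff_ci[0]))                # superior: False
```

Because the interval includes zero, the reported figures support noninferiority at the 0.05 margin but not superiority.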
The natural history of ductal carcinoma in situ (DCIS) remains uncertain. The risk factors for the development of invasive cancer in unresected DCIS are unclear.
Women diagnosed with DCIS on needle biopsy after 1997 who did not undergo surgical resection for ≥1 year after diagnosis were identified by breast centres and the cancer registry, and their outcomes were reviewed.
Eighty-nine women with DCIS diagnosed 1998–2010 were identified. The median age at diagnosis was 75 (range 44–94) years with median follow-up (diagnosis to death, invasive disease or last review) of 59 (12–180) months. Twenty-nine women (33%) developed invasive breast cancer after a median interval of 45 (12–144) months. 14/29 (48%) with high grade, 10/31 (32%) with intermediate grade and 3/17 (18%) with low grade DCIS developed invasive cancer after median intervals of 38, 60 and 51 months. The cumulative incidence of invasion was significantly higher in high grade DCIS than other grades (p = .0016, log-rank test). Invasion was more frequent in lesions with calcification as the predominant feature (23/50 v. 5/25; p = .042) and in younger women (p = .0002). Endocrine therapy was associated with a lower rate of invasive breast cancer (p = .048).
High cytonuclear grade, mammographic microcalcification, young age and lack of endocrine therapy were risk factors for DCIS progression to invasive cancer. Surgical excision of high grade DCIS remains the treatment of choice. Given the uncertain long-term natural history of non-high grade DCIS, the option of active surveillance of women with this condition should be offered within a clinical trial.
Purpose
To study the feasibility of automatically identifying normal digital mammography (DM) exams with artificial intelligence (AI) to reduce the breast cancer screening reading workload.
Methods and materials
A total of 2652 DM exams (653 with cancer) and interpretations by 101 radiologists were gathered from nine previously performed multi-reader multi-case receiver operating characteristic (MRMC ROC) studies. An AI system was used to obtain a score between 1 and 10 for each exam, representing the likelihood that cancer is present. Using each AI score from 1 to 9 as a possible threshold, the exams were divided into low- and high-likelihood groups. It was assumed that, under the pre-selection scenario, only the high-likelihood group would be read by radiologists, while all low-likelihood exams would be reported as normal. The area under the reader-averaged ROC curve (AUC) was calculated for the original evaluations and for the pre-selection scenarios, and the two were compared using a non-inferiority hypothesis.
Results
Setting the low/high-likelihood threshold at an AI score of 5 (high likelihood > 5) results in a trade-off: the workload to be read by radiologists is approximately halved (−47%), while 7% of true-positive exams are excluded. Using an AI score of 2 as the threshold yields a workload reduction of 17% while excluding only 1% of true-positive exams. Pre-selection did not change the average AUC of radiologists (lower bound of the 95% CI > −0.05) for any threshold except the extreme AI score of 9.
Conclusion
It is possible to automatically pre-select exams using AI to significantly reduce the breast cancer screening reading workload.
Key Points
• There is potential to use artificial intelligence to automatically reduce the breast cancer screening reading workload by excluding exams with a low likelihood of cancer.
• The exclusion of exams with the lowest likelihood of cancer in screening might not change radiologists’ breast cancer detection performance.
• When excluding exams with the lowest likelihood of cancer, the decrease in true-positive recalls would be balanced by a simultaneous reduction in false-positive recalls.
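The pre-selection scenario described in the Methods above can be sketched as follows, under stated assumptions. Exams with an AI score at or below the threshold are auto-reported as normal by replacing the radiologist's rating with the lowest rating. Workload reduction is the fraction of exams removed from reading. The second returned value is the fraction of cancer exams falling in the low-likelihood group. That is a simplification, since the abstract's "true-positive exams" may instead refer to cancers that radiologists actually recalled. All names and data are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Exam:
    ai_score: int       # AI likelihood-of-cancer score, 1-10
    has_cancer: bool    # ground truth (histopathology or follow-up)
    reader_rating: int  # one radiologist's original rating (hypothetical scale)

def preselect(exams, threshold, normal_rating=1):
    """Pre-selection scenario: exams with ai_score <= threshold are
    auto-reported as normal; the rest keep the radiologist's rating."""
    return [normal_rating if e.ai_score <= threshold else e.reader_rating
            for e in exams]

def workload_and_exclusion(exams, threshold):
    """Return (fraction of exams removed from reading,
    fraction of cancer exams falling in the low-likelihood group)."""
    low = [e for e in exams if e.ai_score <= threshold]
    n_cancers = sum(e.has_cancer for e in exams)
    workload_reduction = len(low) / len(exams)
    cancers_excluded = (sum(e.has_cancer for e in low) / n_cancers
                        if n_cancers else 0.0)
    return workload_reduction, cancers_excluded

# Ten hypothetical exams (not study data).
exams = [
    Exam(1, False, 1), Exam(2, False, 2), Exam(3, False, 1), Exam(4, False, 3),
    Exam(5, True, 4), Exam(6, False, 2), Exam(7, True, 5), Exam(8, False, 3),
    Exam(9, True, 5), Exam(10, True, 4),
]

print(workload_and_exclusion(exams, threshold=5))  # (0.5, 0.25)
print(preselect(exams, threshold=5))               # [1, 1, 1, 1, 1, 2, 5, 3, 5, 4]
```

In the study's design, the modified ratings for each reader would then feed into the reader-averaged ROC analysis, repeated for every threshold from 1 to 9.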
Purpose To investigate the effect of double reading by a second radiologist on recall rates, cancer detection, and the characteristics of cancers detected in the National Health Service Breast Screening Program in England. Materials and Methods In this retrospective analysis, screening and diagnostic test results for 805 206 women were evaluated by extracting 1 year of routine data from 33 English breast screening centers. Centers used double reading of digital mammograms, with arbitration if there were discrepant interpretations. Information on reader decisions, together with the results of follow-up tests, was used to explore the effect of the second reader. The statistical tests used were the test for equality of proportions, the χ² test for independence, and the t test. Results The first reader recalled 4.76% of women (38 295 of 805 206 women; 95% confidence interval [CI]: 4.71%, 4.80%). Two readers recalled 6.19% of women in total (49 857 of 805 206 women; 95% CI: 6.14%, 6.24%), but arbitration of discordant readings reduced the recall rate to 4.08% (32 863 of 805 206 women; 95% CI: 4.04%, 4.12%; P < .001). A total of 7055 cancers were detected, of which 627 (8.89%; 95% CI: 8.22%, 9.55%; P < .001) were detected by the second reader only. These additional cancers were more likely to be ductal carcinoma in situ (30.5% [183 of 600] vs 22.0% [1344 of 6114]; P < .001), and additional invasive cancers were smaller (mean size, 14.2 vs 16.7 mm; P < .001), had fewer involved nodes, and were likely to be of lower grade. Conclusion Double reading with arbitration reduces recall and increases cancer detection compared with single reading. Cancers detected only by the second reader were smaller, of lower grade, and had less nodal involvement.
© RSNA, 2018.
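The 95% confidence intervals for proportions reported above can be checked with a normal-approximation (Wald) interval. The abstract does not say which interval method was used, so this choice is an assumption. However, applied to the abstract's counts it gives bounds that agree with the reported ones to two decimal places. The function name is hypothetical.

```python
import math

def wald_ci(successes: int, n: int, z: float = 1.96):
    """Normal-approximation (Wald) CI for a binomial proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width

# Counts taken from the abstract.
rows = [
    ("first-reader recall", 38295, 805206),
    ("recall by either reader", 49857, 805206),
    ("recall after arbitration", 32863, 805206),
    ("cancers found by second reader only", 627, 7055),
]
for label, k, n in rows:
    p, lo, hi = wald_ci(k, n)
    print(f"{label}: {100 * p:.2f}% (95% CI {100 * lo:.2f}%, {100 * hi:.2f}%)")
```

With samples this large, the Wald interval differs negligibly from alternatives such as the Wilson interval. For small counts, the choice of method matters more.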
Knowledge of x-ray attenuation is essential for developing and evaluating x-ray imaging technologies. In mammography, measurement of breast density, dose estimation, and differentiation between cysts and solid tumours are example applications requiring accurate data on tissue attenuation. Published attenuation data are, however, sparse and cover a relatively wide range. To supplement the available data, we have previously measured the attenuation of cyst fluid and solid lesions using photon-counting spectral mammography. The present study aims to measure the attenuation of normal adipose and glandular tissue, and to measure the effect of formalin fixation, a major uncertainty in published data. A total of 27 tumour specimens, seven fibro-glandular tissue specimens, and 15 adipose tissue specimens were included. Spectral (energy-resolved) images of the samples were acquired, and the image signal was mapped to equivalent thicknesses of two known reference materials, from which x-ray attenuation as a function of energy can be derived. The spread in attenuation between samples was relatively large, partly because of natural variation. The variation in malignant and glandular tissue was similar, whereas that in adipose tissue was lower. Formalin fixation slightly altered the attenuation of malignant and glandular tissue, whereas the attenuation of adipose tissue was not significantly affected. The difference in attenuation between fresh tumour tissue and cyst fluid was smaller than has previously been measured for fixed tissue, but the difference was still significant, and discrimination of these two tissue types is still possible. The difference between glandular and malignant tissue was close to significant; it is reasonable to expect a significant difference with a larger set of samples.
We believe that our studies have helped lower the overall uncertainty of breast tissue attenuation in the literature, owing to the relatively large sample sets, the novel measurement method, and the clarification of the difference between fresh and fixed tissue.
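The abstract summarizes the measurement as mapping the image signal to equivalent thicknesses of two reference materials. The usual two-basis-material assumption is not spelled out in the abstract. Under that assumption, consider a sample of thickness $t_s$ whose signal matches thicknesses $t_1$ and $t_2$ of reference materials with known linear attenuation coefficients $\mu_1(E)$ and $\mu_2(E)$. Its energy-dependent attenuation $\mu_s(E)$ can then be written as below. All symbols are introduced here for illustration only.

```latex
\mu_s(E)\,t_s = t_1\,\mu_1(E) + t_2\,\mu_2(E)
\quad\Longrightarrow\quad
\mu_s(E) = \frac{t_1}{t_s}\,\mu_1(E) + \frac{t_2}{t_s}\,\mu_2(E)
```

This is an approximation. It relies on the two basis materials spanning the tissue's attenuation over the energy range used.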
An abundance of laboratory-based experiments has described a vigilance decrement, a reduction in the accuracy of target detection with time on task, but there are few real-world studies, and none of them has previously controlled the environment to account for bias. We describe accuracy in clinical practice for 360 experts who examined >1 million women's mammograms for signs of cancer, while controlling for potential biases. The vigilance decrement pattern was not observed. Instead, test accuracy improved over time, through a reduction in false alarms and an increase in speed, with no significant change in sensitivity. The multiple-decision model explains why experts miss targets in low-prevalence settings through changes in the decision threshold and the search-quit threshold; we propose that it be adapted to explain these observed patterns of accuracy with time on task. Findings typically regarded as standard and robust in controlled laboratory settings may not apply directly to real-world environments; instead, large, controlled studies in relevant environments are needed.
Management of screen-detected ductal carcinoma in situ (DCIS) remains controversial.
A prospective cohort of patients with DCIS diagnosed through the UK National Health Service Breast Screening Programme (1st April 2003 to 31st March 2012) was linked to national databases and case note review to analyse patterns of care, recurrence and mortality.
Screen-detected DCIS in 9938 women (mean age 60 years; range 46–87) was treated by mastectomy (2931) or breast-conserving surgery (BCS) (7007; 70%). At a median follow-up of 64 months, 697 (6.8%) had further DCIS or invasive breast cancer, after BCS (7.8%) or mastectomy (4.5%) (p < 0.001). Breast radiotherapy (RT) after BCS (4363/7007; 62.3%) was associated with a 3.1% absolute reduction in ipsilateral recurrent DCIS or invasive breast cancer (no RT: 7.2% versus RT: 4.1%; p < 0.001) and a 1.9% absolute reduction in ipsilateral invasive breast recurrence (no RT: 3.8% versus RT: 1.9%; p < 0.001), independent of the excision margin width or size of DCIS. Women without RT after BCS had more ipsilateral breast recurrences (p < 0.001) when the radial excision margin was <2 mm. Adjuvant endocrine therapy (1208/9938; 12%) was associated with a reduction in any ipsilateral recurrence after BCS, whether RT was received (hazard ratio [HR] 0.57; 95% confidence interval [CI] 0.41–0.80) or not (HR 0.68; 95% CI 0.51–0.91). Women who developed invasive breast recurrence had worse survival than those with recurrent DCIS (p < 0.001). Of the 321 women (3.2%) who died, only 46 deaths were attributed to invasive breast cancer.
Recurrent DCIS or invasive cancer is uncommon after screen-detected DCIS. Both RT and endocrine therapy were associated with a reduction in further events but not with breast cancer mortality within 5 years of diagnosis. Further research to identify biomarkers of recurrence risk, particularly as invasive disease, is indicated.
• Adjuvant radiotherapy (RT) after wide excision is associated with a reduced risk of ipsilateral recurrence but not of mortality.
• Survival after treatment of DCIS is excellent, with few subsequent deaths from breast cancer.
• Further DCIS or invasive breast cancer is not uncommon (6.8% at 5 years).
• Five-year mortality was not affected by the use of RT or endocrine therapy.
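The absolute reductions quoted above for RT after BCS are differences between the reported percentages. The sketch below reproduces that arithmetic and also expresses each difference as a relative reduction. This is crude, unadjusted arithmetic on the published figures, not the study's time-to-event analysis. The function name is hypothetical.

```python
def risk_reductions(risk_control_pct: float, risk_treated_pct: float):
    """Return (absolute reduction in percentage points,
    relative reduction as a fraction of the control risk)."""
    arr = risk_control_pct - risk_treated_pct
    rrr = arr / risk_control_pct
    return arr, rrr

# Percentages reported in the abstract (no RT vs RT after BCS).
outcomes = {
    "ipsilateral DCIS or invasive recurrence": (7.2, 4.1),
    "ipsilateral invasive recurrence": (3.8, 1.9),
}
for name, (no_rt, rt) in outcomes.items():
    arr, rrr = risk_reductions(no_rt, rt)
    print(f"{name}: absolute {arr:.1f} points, relative {100 * rrr:.0f}%")
```

Absolute reductions depend on the baseline risk. A similar relative effect therefore translates into a smaller absolute benefit where recurrence is rarer.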
The Sloane audit compares screen-detected ductal carcinoma in situ (DCIS) pathology with subsequent management and outcomes.
This was a national, prospective cohort study of DCIS diagnosed during 2003–2012.
Among 11,337 patients, 7204 (64%) had high-grade DCIS. Over time, the proportion of high-grade disease increased (from 60 to 65%), low-grade DCIS decreased (from 10 to 6%), and mean size increased (from 21.4 to 24.1 mm). Mastectomy was more common for high-grade (36%) than for low-grade DCIS (15%). Few (6%) patients treated with breast-conserving surgery (BCS) had a surgical margin <1 mm. Of the 9191 women diagnosed in England (median follow-up 9.4 years), 7% developed DCIS or invasive malignancy in the ipsilateral breast and 5% in the contralateral breast. The commonest ipsilateral event was invasive carcinoma (n = 413), at a median of 62 months, followed by DCIS (n = 225), at a median of 37 months. Radiotherapy (RT) was most protective against recurrence for high-grade DCIS (3.2% with RT versus 6.9% without, compared with 2.3% and 3.0%, respectively, for low/intermediate-grade DCIS). Ipsilateral DCIS events decreased after 5 years, while the risk of ipsilateral invasive cancer remained constant to beyond 10 years.
DCIS pathology informs patient management and highlights the need for prolonged follow-up of screen-detected DCIS.