•We developed a model to predict whether a lesion will be missed by a trainee.
•The user model can be used to select the most challenging cases for each trainee.
•Our model improved the status quo of ...case presentation to trainee in tomosynthesis.
Digital breast tomosynthesis (DBT) can improve lesion visibility in comparison to mammography by eliminating breast tissue superimposition. While the benefits of DBT in breast cancer screening rely on well trained radiologists, the optimal training regimen in DBT is unknown. We propose a computer-aided educational system that individually selects the optimal training cases for each trainee. The first step towards this goal is to capture the individual weaknesses of each trainee. In this study, we present and evaluate a computer algorithm for this purpose with particular focus on false negative errors.
We developed an algorithm (a user model) that predicted the likelihood of a trainee missing an abnormal location. An individual model is applied for each trainee. The algorithm consists of three steps. First, the lesions on DBT images are segmented by a 3D active contour method with a level set algorithm. Then, 16 features are extracted automatically for the segmented lesions. Finally, a multivariate logistic regression classifier predicts the likelihood of error based on the extracted features. The classifier is trained using the previous interpretation data of the trainee. We evaluated the individual predictive algorithms experimentally using data from a reader study in which 29 trainees and 3 expert breast radiologists read 60 DBT cases. Receiver operating characteristic (ROC) analysis, along with a repeated holdout approach, was used to evaluate the predictive performance of our algorithm.
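The evaluation loop described above (a logistic regression classifier scored by ROC analysis under repeated holdout) can be sketched roughly as follows. This is only an illustration under stated assumptions: the two synthetic features, the learning rate, the 30% test split, and the function names `fit_logistic`, `predict`, `auc`, and `repeated_holdout_auc` are all hypothetical and are not taken from the study, which used 16 extracted lesion features per trainee.

```python
import math
import random

def predict(w, x):
    """Logistic probability; w holds one weight per feature plus a bias as its last entry."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=300):
    """Batch gradient descent on the logistic loss (hypothetical settings)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def auc(scores, labels):
    """AUC as the Mann-Whitney statistic; tied pairs count as 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def repeated_holdout_auc(X, y, repeats=20, test_fraction=0.3, seed=0):
    """Average test-set AUC over several random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    aucs = []
    for _ in range(repeats):
        rng.shuffle(idx)
        cut = int(len(idx) * test_fraction)
        test, train = idx[:cut], idx[cut:]
        y_test = [y[i] for i in test]
        if len(set(y_test)) < 2:
            continue  # AUC is undefined unless both classes are present
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        aucs.append(auc([predict(w, X[i]) for i in test], y_test))
    return sum(aucs) / len(aucs)

# Synthetic stand-in for one trainee: label 1 means "lesion missed" (made-up data).
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [1 if x[0] + rng.gauss(0, 1) > 0 else 0 for x in X]
mean_auc = repeated_holdout_auc(X, y)
print(f"mean holdout AUC: {mean_auc:.3f}")
```

In a setting like the one described, one such model would be fitted per trainee, using that trainee's own detected and missed lesions as labels.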
The algorithms predicted which lesions a specific trainee would detect and which would be missed with an average area under the ROC curve (AUC) of 0.627 (95% CI: 0.579–0.675). The average performance was statistically significantly better than chance (p<0.001). Under the status quo, training involves no specific strategy for case presentation, and this random behavior corresponds to an AUC of 0.5. Therefore, the proposed algorithm may provide a significant improvement in distinguishing abnormal locations that will be detected by a trainee from those that will be missed.
Our algorithm was able to distinguish abnormal locations that will be detected by a trainee from those that will be missed. This could be used to enrich the training set with cases that are likely to prompt error for the individual trainee while still maintaining a range of cases necessary for comprehensive education.
... Mary and I crossed paths continually on over 20 other committees, most dealing with the administration of, and access to, justice. "Years ago she told me how her parents' community service had ...inspired her to devote one-third of her work day to bar association and other community activities!" She was president of the Washington Chapter of the American Academy of Matrimonial Lawyers, and was Family Law Section Chair for the Washington State and King County Bar Associations.
Computer-aided detection (CAD) algorithms have successfully revealed breast masses and microcalcifications on screening mammography. The purpose of our study was to evaluate the sensitivity of ...commercially available CAD systems for revealing architectural distortion, the third most common appearance of breast cancer.
Two commercially available CAD systems were used to evaluate screening mammograms obtained in 43 patients with 45 mammographically detected regions of architectural distortion. For each CAD system, we determined the sensitivity for revealing architectural distortion on at least one image of the two-view mammographic examination (case sensitivity) and for each individual mammogram (image sensitivity). Surgical biopsy results were available for each case of architectural distortion.
Architectural distortion was deemed present and actionable by a panel of expert breast imagers in 80 views of the 45 cases. One CAD system detected distortion in 22 of 45 cases of distortion (case sensitivity, 49%) and in 30 of 80 mammograms (image sensitivity, 38%); it displayed 0.7 false-positive marks per image. Another CAD system identified distortion in 15 of 45 cases (case sensitivity, 33%) and 17 of 80 mammograms (image sensitivity, 21%); it displayed 1.27 false-positive marks per image. Sensitivity for malignancy-caused distortion was similar to or lower than sensitivity for all causes of distortion.
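The case and image sensitivities reported above follow directly from the detection counts. A minimal sketch of that arithmetic, using only the counts given in the text (the labels "CAD system 1" and "CAD system 2" and the function name are placeholders, since the systems are not named here):

```python
def sensitivity(detected, total):
    """Fraction of true findings that the system marked."""
    return detected / total

# (detected, total) pairs taken from the reported results.
systems = {
    "CAD system 1": {"cases": (22, 45), "images": (30, 80)},
    "CAD system 2": {"cases": (15, 45), "images": (17, 80)},
}

for name, counts in systems.items():
    case_sens = sensitivity(*counts["cases"])
    image_sens = sensitivity(*counts["images"])
    print(f"{name}: case sensitivity {case_sens:.1%}, image sensitivity {image_sens:.1%}")
```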
Fewer than one half of the cases of architectural distortion were detected by the two most widely available CAD systems used for interpretations of screening mammograms. Considerable improvement in the sensitivity of CAD systems is needed for detecting this type of lesion. Practicing breast imagers who use CAD systems should remain vigilant for architectural distortion.
Purpose:
Mammography is known to be one of the most difficult radiographic exams to interpret. Mammography has important limitations, including the superposition of normal tissue that can obscure a ...mass, chance alignment of normal tissue to mimic a true lesion, and the inability to derive volumetric information. It has been shown that stereomammography can overcome these deficiencies by showing that layers of normal tissue lie at different depths. If standard stereomammography (i.e., a single stereoscopic pair consisting of two projection images) can significantly improve lesion detection, how will multiview stereoscopy (MVS), where many projection images are used, compare to mammography? The aim of this study was to assess the relative performance of MVS compared to mammography for breast mass detection.
Methods:
The MVS image sets consisted of the 25 raw projection images acquired over an arc of approximately 45° using a Siemens prototype breast tomosynthesis system. The mammograms were acquired using a commercial Siemens FFDM system. The raw data were taken from both of these systems for 27 cases and realistic simulated mass lesions were added to duplicates of the 27 images at the same local contrast. The images with lesions (27 mammography and 27 MVS) and the images without lesions (27 mammography and 27 MVS) were then postprocessed to provide comparable and representative image appearance across the two modalities. All 108 image sets were shown to five full-time breast imaging radiologists in random order on a state-of-the-art stereoscopic display. The observers were asked to give a confidence rating for each image (0 for lesion definitely not present, 100 for lesion definitely present). The ratings were then compiled and processed using ROC and variance analysis.
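To illustrate how 0 to 100 confidence ratings of this kind are turned into an ROC curve, the sketch below sweeps a decision threshold over the ratings and integrates the resulting curve with the trapezoidal rule. The eight ratings and truth labels are invented for illustration, and the study's actual ROC and variance analysis software is not specified here, so this should be read as one simple empirical approach rather than the authors' method.

```python
def roc_points(ratings, truth):
    """Return (false-positive rate, true-positive rate) points, one per distinct threshold."""
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(ratings), reverse=True):
        tp = sum(1 for r, lab in zip(ratings, truth) if r >= t and lab == 1)
        fp = sum(1 for r, lab in zip(ratings, truth) if r >= t and lab == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def trapezoid_auc(points):
    """Area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical ratings from one observer; truth is 1 where a lesion was inserted.
ratings = [90, 80, 70, 60, 40, 30, 20, 10]
truth = [1, 1, 0, 1, 0, 1, 0, 0]

empirical_auc = trapezoid_auc(roc_points(ratings, truth))
print(f"empirical AUC: {empirical_auc:.4f}")
```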
Results:
The mean AUC for the five observers was 0.614 ± 0.055 for mammography and 0.778 ± 0.052 for multiview stereoscopy. The difference of 0.164 ± 0.065 was statistically significant with a p-value of 0.0148.
Conclusions:
The differences in the AUCs and the p-value suggest that multiview stereoscopy has a statistically significant advantage over mammography in the detection of simulated breast masses. This highlights the dominance of anatomical noise compared to quantum noise for breast mass detection. It also shows that significant lesion detection can be achieved with MVS without any of the artifacts associated with tomosynthesis.
The purpose of this article is to determine the potential reduction in screening recall rates by strictly following standardized BI-RADS lexicon for lesions seen on screening mammography.
Of 3084 ...consecutive mammograms performed at our screening facilities, 345 women with 437 lesions were recalled for additional imaging and constituted our study population. Three radiologists retrospectively classified lesions using the standard BI-RADS lexicon and assigned each to one of four groups: group A, the finding met criteria for recall by the BI-RADS lexicon; group B, the finding did not meet strict BI-RADS criteria for recall but was sufficiently indeterminate to warrant recall by the majority of the study panel; group C, the finding was classifiable by the BI-RADS lexicon but was not recalled because it was benign or stable; and group D, the questioned finding was not considered an abnormality by our study panel. Recall rates and the cancer detection rate were determined. The adjusted recall rate was calculated for lesions considered appropriate for recall (group A), and the reduction in the recall rate was determined.
Nineteen malignancies were detected in our recalled population, for a cancer detection rate of 0.65%. All 19 malignancies were lesions considered appropriate for recall (group A). If only group A lesions had been recalled, the recall rate would have decreased from 11.4% to 6.2%, representing a 46% reduction in recalls without affecting the cancer detection rate.
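The 46% figure is the relative reduction between the two reported recall rates. A short check of that arithmetic, using only the percentages stated above (the function name is illustrative):

```python
def relative_reduction(before, after):
    """Relative decrease from `before` to `after`, as a fraction of `before`."""
    return (before - after) / before

reduction = relative_reduction(11.4, 6.2)  # reported recall rates, in percent
print(f"relative reduction in recalls: {reduction:.0%}")
```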
Using the BI-RADS lexicon as a decision-making aid may help adjust thresholds for recalling indeterminate or suspicious lesions and reduce recall rates from screening mammography.
To retrospectively compare recall and cancer detection rates between immediate and subsequent batch methods for interpretation of screening mammograms.
Institutional review board approval was ...obtained, and informed consent was waived. Retrospective analysis was performed for 8698 screening mammograms obtained between January 1 and October 31, 2001, which were interpreted either immediately (n = 4113) or subsequently with the batch method (n = 4585). Data were collected from the data reporting system and patient billing records. Patients with high risk factors were excluded; 3441 patients were in the immediate group, and 3932 were in the batch group. The two groups were compared with respect to age, breast density, and availability of comparison films with the Wilcoxon rank sum test. Recall rates and cancer detection rates for each group were determined and compared with the Pearson χ² test; false-negative rates were compared with the Fisher exact test.
A significant difference (P < .001) was noted in recall rates between immediate (18%) and batch (14%) groups; however, no significant difference (P = .7) was noted in cancer detection rates (immediate, 0.5%; batch, 0.4%). Mean age of patients was 56.8 years (age range, 21-96 years) in the immediate group and 56.2 years (age range, 24-98 years) in the batch group (P = .02). Comparison of breast densities between groups indicated no statistically significant difference (P = .4). A significantly smaller proportion of the batch group had comparison mammograms available (3106, 79%) than of the immediate group (2856, 83%) (P < .001). There was no significant difference in false-negative rates between the immediate group (0.1%) and the batch group (0.1%) (P > .99).
Immediate interpretation of screening mammograms resulted in a statistically significant increase in recalls and additional clinical work-ups of perceived abnormalities; however, no significant difference in cancer detection rate was detected between groups.
The purpose of this study was to measure the level of inter- and intraobserver agreement and to evaluate the causes of variability in radiologists' descriptions and assessments of sonograms of solid ...breast masses.
Sixty sonograms of solid masses were evaluated independently by five radiologists. Observers used the lexicon of a recently published benchmark report on sonographic appearances of breast masses to determine mass shape, margin, echogenicity, echo texture, presence of echogenic pseudocapsule, and acoustic transmission. Final diagnostic assessments were determined by applying the rule-based model of the same benchmark report to the radiologists' descriptions. In addition, one observer interpreted each case twice to evaluate intraobserver variability. Inter- and intraobserver variability were measured using Cohen's kappa statistic. We also investigated causes of variability in radiologists' descriptions.
Interobserver agreement ranged from lowest for determining the presence of an echogenic pseudocapsule (kappa = .09) to highest for determining mass shape (kappa = .8). Intraobserver agreement was lowest for mass echo texture (kappa = .24) and greatest for mass shape (kappa = .79). Variability in descriptions of lesions contributed to interobserver (kappa = .51) and some intraobserver (kappa = .66) inconsistency in assessing the likelihood of malignancy.
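For readers unfamiliar with the statistic, Cohen's kappa compares the observed agreement between two readers with the agreement expected by chance from each reader's label frequencies. The sketch below computes it for two readers; the shape labels are invented for illustration and are not data from the study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions, summed over labels.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical mass-shape descriptions from two readers for eight sonograms.
reader_1 = ["oval", "oval", "round", "irregular", "oval", "round", "irregular", "oval"]
reader_2 = ["oval", "round", "round", "irregular", "oval", "round", "oval", "oval"]

kappa = cohens_kappa(reader_1, reader_2)
print(f"kappa = {kappa:.2f}")
```

A value of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance, which is how the reported range from .09 to .8 should be read.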
Lack of uniformity among observers' use of descriptive terms for solid breast masses resulted in inconsistent diagnoses. The need for improved definitions and additional illustrative examples could be addressed by developing a standardized lexicon similar to that of the Breast Imaging Reporting and Data System.
To determine effects of lesion type (calcification vs mass) and image processing on radiologists' performance for area under the receiver operating characteristic curve (AUC), sensitivity, and ...specificity for detection of masses and calcifications with digital mammography in women with mammographically dense breasts.
This study included 201 women who underwent digital mammography at seven U.S. and Canadian medical centers. Three image-processing algorithms were applied to the digital images, which were acquired with Fischer, General Electric, and Lorad digital mammography units. Eighteen readers participated in the reader study (six readers per algorithm). Baseline values for reader performance with screen-film mammograms were obtained through the additional interpretation of 179 screen-film mammograms. A repeated-measures analysis of covariance allowing unequal slopes was used in each of the nine analyses (AUC, sensitivity, and specificity for each of three machines). Bonferroni correction was used.
Although lesion type did not affect the AUC or sensitivity for Fischer digital images, it did affect specificity (P = .0004). For the General Electric digital images, AUC, sensitivity, and specificity were not affected by lesion type. For Lorad digital images, the results strongly suggested that lesion type affected AUC and sensitivity (P < .0001). None of the three image-processing methods tested affected the AUC, sensitivity, or specificity for the Fischer, General Electric, or Lorad digital images.
Findings in this study indicate that radiologists' accuracy in interpreting digital mammograms depends on lesion type. Interpretation accuracy was not influenced by the image-processing method.
To compare two display technologies, cathode ray tube (CRT) and liquid crystal display (LCD), in terms of diagnostic accuracy for several common clinical tasks in digital mammography.
Simulated ...masses and microcalcifications were inserted into normal digital mammograms to produce an image set of 400 images. Images were viewed on one CRT and one LCD medical-quality display device by five experienced breast-imaging radiologists who rated the images using a categorical rating paradigm. The observer data were analyzed to determine overall classification accuracy, overall lesion detection accuracy, and accuracy for four specific diagnostic tasks: detection of benign masses, malignant masses, and microcalcifications, and discrimination of benign and malignant masses.
Radiologists had similar overall classification accuracy (LCD: 0.83 +/- 0.01, CRT: 0.82 +/- 0.01) and lesion detection accuracy (LCD: 0.87 +/- 0.01, CRT: 0.85 +/- 0.01) on both displays. The difference in accuracy between LCD and CRT for the detection of benign masses, malignant masses, and microcalcifications, and discrimination of benign and malignant masses was -0.019 +/- 0.009, 0.020 +/- 0.008, 0.012 +/- 0.013, and 0.0094 +/- 0.011, respectively. Overall, the two displays did not exhibit any statistically significant difference (P > .05).
This study explored the suitability of two different soft-copy displays for the viewing of mammographic images. It found that LCD and CRT displays offer similar clinical utility for mammographic tasks.
To compare the diagnostic accuracy of the Fischer Senoscan Digital Mammography System with that of standard screen-film mammography in a population of women presenting for screening or diagnostic ...mammography.
Enrollment of patients took place at six different breast-imaging centers between 1997 and 1999. A total of 247 cases were selected for inclusion in the final reader study. All known cancer cases were included (111) from all six participating sites representing 45% of the total cases. The remaining 136 cases (55%) were randomly selected from all available benign or negative cases from three of the six sites. A complete case consisted of both a (unilateral or bilateral) digital and screen-film mammogram of the same patient. Eight radiologists interpreted the cases in laser-printed digital and screen-film hardcopy formats. The study was designed to detect differences of 0.05 in the ROC area under the curve (AUC) between digital and screen-film radiologist interpretation performance.
The average AUC for the Senoscan digital was 0.715 for the 8 readers. The average AUC for screen-film was 0.765. The AUC difference of -0.05 had a 95% confidence interval of (-0.101, 0.002), which includes zero. The average sensitivity was 66% and specificity 67% for SenoScan full-field digital mammography. The average screen-film mammography sensitivity and specificity were 74% and 60%, respectively.
No statistically significant difference in diagnostic accuracy between the Fischer Senoscan and screen-film mammography was detected in this study.