A survey on deep learning in medical image analysis Litjens, Geert; Kooi, Thijs; Bejnordi, Babak Ehteshami ...
Medical image analysis,
December 2017, 2017-Dec, 2017-12-00, 20171201, Volume:
42
Journal Article
Peer reviewed
Open access
•A summary of all deep learning algorithms used in medical image analysis is given.•The most successful algorithms for key image analysis tasks are identified.•300 papers applying deep learning to ...different applications have been summarized.
Display omitted
Deep learning algorithms, in particular convolutional networks, have rapidly become a methodology of choice for analyzing medical images. This paper reviews the major deep learning concepts pertinent to medical image analysis and summarizes over 300 contributions to the field, most of which appeared in the last year. We survey the use of deep learning for image classification, object detection, segmentation, registration, and other tasks. Concise overviews are provided of studies per application area: neuro, retinal, pulmonary, digital pathology, breast, cardiac, abdominal, musculoskeletal. We end with a summary of the current state-of-the-art, a critical discussion of open challenges and directions for future research.
Full text
Available for:
GEOZS, IJS, IMTLJ, KILJ, KISLJ, NUK, OILJ, PNG, SAZU, SBCE, SBJE, UL, UM, UPCLJ, UPUK, ZRSKP
Machine learning techniques have great potential to improve medical diagnostics, offering ways to improve accuracy, reproducibility and speed, and to ease workloads for clinicians. In the field of ...histopathology, deep learning algorithms have been developed that perform similarly to trained pathologists for tasks such as tumor detection and grading. However, despite these promising results, very few algorithms have reached clinical implementation, challenging the balance between hope and hype for these new techniques. This Review provides an overview of the current state of the field, as well as describing the challenges that still need to be addressed before artificial intelligence in histopathology can achieve clinical value.
Full text
Available for:
GEOZS, IJS, IMTLJ, KISLJ, NLZOH, NUK, OILJ, PNG, SAZU, SBCE, SBMB, UL, UM, UPUK, ZAGLJ
Variations in the color and intensity of hematoxylin and eosin (H&E) stained histological slides can potentially hamper the effectiveness of quantitative image analysis. This paper presents a fully ...automated algorithm for standardization of whole-slide histopathological images to reduce the effect of these variations. The proposed algorithm, called whole-slide image color standardizer (WSICS), utilizes color and spatial information to classify the image pixels into different stain components. The chromatic and density distributions for each of the stain components in the hue-saturation-density color model are aligned to match the corresponding distributions from a template whole-slide image (WSI). The performance of the WSICS algorithm was evaluated on two datasets. The first originated from 125 H&E stained WSIs of lymph nodes, sampled from 3 patients, and stained in 5 different laboratories on different days of the week. The second comprised 30 H&E stained WSIs of rat liver sections. The result of qualitative and quantitative evaluations using the first dataset demonstrate that the WSICS algorithm outperforms competing methods in terms of achieving color constancy. The WSICS algorithm consistently yields the smallest standard deviation and coefficient of variation of the normalized median intensity measure. Using the second dataset, we evaluated the impact of our algorithm on the performance of an already published necrosis quantification system. The performance of this system was significantly improved by utilizing the WSICS algorithm. The results of the empirical evaluations collectively demonstrate the potential contribution of the proposed standardization algorithm to improved diagnostic accuracy and consistency in computer-aided diagnosis for histopathology data.
The breast stromal microenvironment is a pivotal factor in breast cancer development, growth and metastases. Although pathologists often detect morphologic changes in stroma by light microscopy, ...visual classification of such changes is subjective and non-quantitative, limiting its diagnostic utility. To gain insights into stromal changes associated with breast cancer, we applied automated machine learning techniques to digital images of 2387 hematoxylin and eosin stained tissue sections of benign and malignant image-guided breast biopsies performed to investigate mammographic abnormalities among 882 patients, ages 40–65 years, that were enrolled in the Breast Radiology Evaluation and Study of Tissues (BREAST) Stamp Project. Using deep convolutional neural networks, we trained an algorithm to discriminate between stroma surrounding invasive cancer and stroma from benign biopsies. In test sets (928 whole-slide images from 330 patients), this algorithm could distinguish biopsies diagnosed as invasive cancer from benign biopsies solely based on the stromal characteristics (area under the receiver operator characteristics curve = 0.962). Furthermore, without being trained specifically using ductal carcinoma in situ as an outcome, the algorithm detected tumor-associated stroma in greater amounts and at larger distances from grade 3 versus grade 1 ductal carcinoma in situ. Collectively, these results suggest that algorithms based on deep convolutional neural networks that evaluate only stroma may prove useful to classify breast biopsies and aid in understanding and evaluating the biology of breast lesions.
Full text
Available for:
GEOZS, IJS, IMTLJ, KILJ, KISLJ, NLZOH, NUK, OILJ, PNG, SAZU, SBCE, SBJE, UILJ, UL, UM, UPCLJ, UPUK, ZAGLJ, ZRSKP
The development of deep neural networks is facilitating more advanced digital analysis of histopathologic images. We trained a convolutional neural network for multiclass segmentation of digitized ...kidney tissue sections stained with periodic acid-Schiff (PAS).
We trained the network using multiclass annotations from 40 whole-slide images of stained kidney transplant biopsies and applied it to four independent data sets. We assessed multiclass segmentation performance by calculating Dice coefficients for ten tissue classes on ten transplant biopsies from the Radboud University Medical Center in Nijmegen, The Netherlands, and on ten transplant biopsies from an external center for validation. We also fully segmented 15 nephrectomy samples and calculated the network's glomerular detection rates and compared network-based measures with visually scored histologic components (Banff classification) in 82 kidney transplant biopsies.
The weighted mean Dice coefficients of all classes were 0.80 and 0.84 in ten kidney transplant biopsies from the Radboud center and the external center, respectively. The best segmented class was "glomeruli" in both data sets (Dice coefficients, 0.95 and 0.94, respectively), followed by "tubuli combined" and "interstitium." The network detected 92.7% of all glomeruli in nephrectomy samples, with 10.4% false positives. In whole transplant biopsies, the mean intraclass correlation coefficient for glomerular counting performed by pathologists versus the network was 0.94. We found significant correlations between visually scored histologic components and network-based measures.
This study presents the first convolutional neural network for multiclass segmentation of PAS-stained nephrectomy samples and transplant biopsies. Our network may have utility for quantitative studies involving kidney histopathology across centers and provide opportunities for deep learning applications in routine diagnostics.
As part of routine histological grading, for every invasive breast cancer the mitotic count is assessed by counting mitoses in the (visually selected) region with the highest proliferative activity. ...Because this procedure is prone to subjectivity, the present study compares visual mitotic counting with deep learning based automated mitotic counting and fully automated hotspot selection. Two cohorts were used in this study. Cohort A comprised 90 prospectively included tumors which were selected based on the mitotic frequency scores given during routine glass slide diagnostics. This pathologist additionally assessed the mitotic count in these tumors in whole slide images (WSI) within a preselected hotspot. A second observer performed the same procedures on this cohort. The preselected hotspot was generated by a convolutional neural network (CNN) trained to detect all mitotic figures in digitized hematoxylin and eosin (H&E) sections. The second cohort comprised a multicenter, retrospective TNBC cohort (n = 298), of which the mitotic count was assessed by three independent observers on glass slides. The same CNN was applied on this cohort and the absolute number of mitotic figures in the hotspot was compared to the averaged mitotic count of the observers. Baseline interobserver agreement for glass slide assessment in cohort A was good (kappa 0.689; 95% CI 0.580–0.799). Using the CNN generated hotspot in WSI, the agreement score increased to 0.814 (95% CI 0.719–0.909). Automated counting by the CNN in comparison with observers counting in the predefined hotspot region yielded an average kappa of 0.724. We conclude that manual mitotic counting is not affected by assessment modality (glass slides, WSI) and that counting mitotic figures in WSI is feasible. Using a predefined hotspot area considerably improves reproducibility. Also, fully automated assessment of mitotic score appears to be feasible without introducing additional bias or variability.
Full text
Available for:
GEOZS, IJS, IMTLJ, KILJ, KISLJ, NLZOH, NUK, OILJ, PNG, SAZU, SBCE, SBJE, UILJ, UL, UM, UPCLJ, UPUK, ZAGLJ, ZRSKP
This paper presents and evaluates a fully automatic method for detection of ductal carcinoma in situ (DCIS) in digitized hematoxylin and eosin (H&E) stained histopathological slides of breast tissue. ...The proposed method applies multi-scale superpixel classification to detect epithelial regions in whole-slide images (WSIs). Subsequently, spatial clustering is utilized to delineate regions representing meaningful structures within the tissue such as ducts and lobules. A region-based classifier employing a large set of features including statistical and structural texture features and architectural features is then trained to discriminate between DCIS and benign/normal structures. The system is evaluated on two datasets containing a total of 205 WSIs of breast tissue. Evaluation was conducted both on the slide and the lesion level using FROC analysis. The results show that to detect at least one true positive in every DCIS containing slide, the system finds 2.6 false positives per WSI. The results of the per-lesion evaluation show that it is possible to detect 80% and 83% of the DCIS lesions in an abnormal slide, at an average of 2.0 and 3.0 false positives per WSI, respectively. Collectively, the result of the experiments demonstrate the efficacy and accuracy of the proposed method as well as its potential for application in routine pathological diagnostics. To the best of our knowledge, this is the first DCIS detection algorithm working fully automatically on WSIs.
Delayed graft function (DGF) is a strong risk factor for development of interstitial fibrosis and tubular atrophy (IFTA) in kidney transplants. Quantitative assessment of inflammatory infiltrates in ...kidney biopsies of DGF patients can reveal predictive markers for IFTA development. In this study, we combined multiplex tyramide signal amplification (mTSA) and convolutional neural networks (CNNs) to assess the inflammatory microenvironment in kidney biopsies of DGF patients (n = 22) taken at 6 weeks post-transplantation. Patients were stratified for IFTA development (<10% versus ≥10%) from 6 weeks to 6 months post-transplantation, based on histopathological assessment by three kidney pathologists. One mTSA panel was developed for visualization of capillaries, T- and B-lymphocytes and macrophages and a second mTSA panel for T-helper cell and macrophage subsets. The slides were multi spectrally imaged and custom-made python scripts enabled conversion to artificial brightfield whole-slide images (WSI). We used an existing CNN for the detection of lymphocytes with cytoplasmatic staining patterns in immunohistochemistry and developed two new CNNs for the detection of macrophages and nuclear-stained lymphocytes. F1-scores were 0.77 (nuclear-stained lymphocytes), 0.81 (cytoplasmatic-stained lymphocytes), and 0.82 (macrophages) on a test set of artificial brightfield WSI. The CNNs were used to detect inflammatory cells, after which we assessed the peritubular capillary extent, cell density, cell ratios, and cell distance in the two patient groups. In this cohort, distance of macrophages to other immune cells and peritubular capillary extent did not vary significantly at 6 weeks post-transplantation between patient groups. CD163+ cell density was higher in patients with ≥10% IFTA development 6 months post-transplantation (p < 0.05). CD3+CD8−/CD3+CD8+ ratios were higher in patients with <10% IFTA development (p < 0.05). We observed a high correlation between CD163+ and CD4+GATA3+ cell density (R = 0.74, p < 0.001). Our study demonstrates that CNNs can be used to leverage reliable, quantitative results from mTSA-stained, multi spectrally imaged slides of kidney transplant biopsies.
This study describes a methodology to assess the microenvironment in sparse tissue samples. Deep learning, multiplex immunohistochemistry, and mathematical image processing techniques were incorporated to quantify lymphocytes, macrophages, and capillaries in kidney transplant biopsies of delayed graft function patients. The quantitative results were used to assess correlations with development of interstitial fibrosis and tubular atrophy.
Full text
Available for:
GEOZS, IJS, IMTLJ, KILJ, KISLJ, NLZOH, NUK, OILJ, PNG, SAZU, SBCE, SBJE, UILJ, UL, UM, UPCLJ, UPUK, ZAGLJ, ZRSKP
Abstract
Computational pathology (CPath) algorithms detect, segment or classify cancer in whole slide images, approaching or even exceeding the accuracy of pathologists. Challenges have to be ...overcome before these algorithms can be used in practice. We therefore aim to explore international perspectives on the future role of CPath in oncological pathology by focusing on opinions and first experiences regarding barriers and facilitators. We conducted an international explorative eSurvey and semi-structured interviews with pathologists utilizing an implementation framework to classify potential influencing factors. The eSurvey results showed remarkable variation in opinions regarding attitude, understandability and validation of CPath. Interview results showed that barriers focused on the quality of available evidence, while most facilitators concerned strengths of CPath. A lack of consensus was present for multiple factors, such as the determination of sufficient validation using CPath, the preferred function of CPath within the digital workflow and the timing of CPath introduction in pathology education. The diversity in opinions illustrates variety in influencing factors in CPath adoption. A next step would be to quantitatively determine important factors for adoption and initiate validation studies. Both should include clear case descriptions and be conducted among a more homogenous panel of pathologists based on sub specialization.
Full text
Available for:
EMUNI, FIS, FZAB, GEOZS, GIS, IJS, IMTLJ, KILJ, KISLJ, MFDPS, NLZOH, NUK, OILJ, PNG, SAZU, SBCE, SBJE, SBMB, SBNM, UKNU, UL, UM, UPUK, VKSCE, ZAGLJ