• A comprehensive review of state-of-the-art deep learning (DL) approaches is presented in the context of histopathological image analysis.
• This survey paper focuses on the methodological aspects of different machine learning strategies such as supervised, weakly supervised, unsupervised, and transfer learning, as well as various sub-variants of these methods.
• We also provide an overview of deep learning based survival models that are applicable to disease-specific prognosis tasks.
• Finally, we summarize several existing open datasets and highlight critical challenges and limitations of current deep learning approaches, along with possible avenues for future research.
Histopathological images contain rich phenotypic information that can be used to monitor underlying mechanisms contributing to disease progression and patient survival outcomes. Recently, deep learning has become the mainstream methodological choice for analyzing and interpreting histology images. In this paper, we present a comprehensive review of state-of-the-art deep learning approaches that have been used in the context of histopathological image analysis. From the survey of over 130 papers, we review the field’s progress based on the methodological aspect of different machine learning strategies such as supervised, weakly supervised, unsupervised, transfer learning and various other sub-variants of these methods. We also provide an overview of deep learning based survival models that are applicable for disease-specific prognosis tasks. Finally, we summarize several existing open datasets and highlight critical challenges and limitations with current deep learning approaches, along with possible avenues for future research.
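Deep learning survival models of the kind surveyed here are often trained by minimizing the negative Cox partial log-likelihood of the network's predicted risk scores, as in DeepSurv-style models. The following is a minimal pure-Python sketch under stated assumptions: `risk` stands in for hypothetical network outputs (log-hazards), event times are assumed untied, and no tie correction (such as Breslow or Efron) is applied.

```python
import math
from typing import Sequence


def cox_neg_log_partial_likelihood(
    risk: Sequence[float], time: Sequence[float], event: Sequence[int]
) -> float:
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk[i]  : predicted log-hazard (hypothetical network output) for patient i
    time[i]  : follow-up time for patient i
    event[i] : 1 if the event was observed, 0 if the patient was censored
    Assumes no tied event times.
    """
    n_events = sum(event)
    if n_events == 0:
        raise ValueError("at least one observed event is required")
    total = 0.0
    for i in range(len(risk)):
        if not event[i]:
            continue  # censored patients contribute only through risk sets
        # Risk set: every patient still under observation at time[i].
        at_risk = [risk[j] for j in range(len(risk)) if time[j] >= time[i]]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(at_risk)
        log_denom = m + math.log(sum(math.exp(r - m) for r in at_risk))
        total += risk[i] - log_denom
    return -total / n_events
```

Assigning a higher risk to the patient whose event occurs first lowers this loss, which is the ranking behavior the objective rewards.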
Highlights
• We present the first comprehensive review and comparison of existing plug-and-play segmentation loss functions, organized systematically.
• We conduct the largest-scale experiments on 20 loss functions, covering four segmentation tasks with six public datasets from 10+ medical centers, and highlight the most robust loss functions.
• The code is publicly available at https://github.com/JunMa11/SegLoss.
The loss function is an important component in deep learning-based segmentation methods. Over the past five years, many loss functions have been proposed for various segmentation tasks. However, a systematic study of the utility of these loss functions is missing. In this paper, we present a comprehensive review of segmentation loss functions in an organized manner. We also conduct the first large-scale analysis of 20 general loss functions on four typical 3D segmentation tasks involving six public datasets from 10+ medical centers. The results show that none of the losses can consistently achieve the best performance on the four segmentation tasks, but compound loss functions (e.g. Dice with TopK loss, focal loss, Hausdorff distance loss, and boundary loss) are the most robust losses. Our code and segmentation results are publicly available and can serve as a loss function benchmark. We hope this work will also provide insights on new loss function development for the community.
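To illustrate what a compound loss such as "Dice with TopK loss" combines, here is a minimal sketch for binary segmentation on flattened probability maps. It is not the SegLoss repository's implementation. The function names, the equal default weighting, and the 10% TopK fraction are assumptions.

```python
import math
from typing import Sequence


def soft_dice_loss(prob: Sequence[float], target: Sequence[int], eps: float = 1e-6) -> float:
    """1 - soft Dice coefficient between foreground probabilities and a binary mask."""
    inter = sum(p * t for p, t in zip(prob, target))
    denom = sum(prob) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def topk_bce_loss(
    prob: Sequence[float], target: Sequence[int], k_percent: float = 10.0, eps: float = 1e-7
) -> float:
    """Binary cross-entropy averaged over only the k% hardest (highest-loss) pixels."""
    losses = []
    for p, t in zip(prob, target):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1.0 - p)))
    k = max(1, int(len(losses) * k_percent / 100.0))
    hardest = sorted(losses, reverse=True)[:k]
    return sum(hardest) / k


def dice_topk_loss(
    prob: Sequence[float], target: Sequence[int], k_percent: float = 10.0, weight: float = 1.0
) -> float:
    """Compound loss: soft Dice (region overlap) plus weighted TopK cross-entropy (hard pixels)."""
    return soft_dice_loss(prob, target) + weight * topk_bce_loss(prob, target, k_percent)
```

The Dice term rewards global overlap and is insensitive to class imbalance, while the TopK term concentrates the pixel-wise gradient on the hardest pixels. This complementarity is one plausible reason compound losses prove robust across tasks.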
Completely labeled pathology datasets are often challenging and time-consuming to obtain. Semi-supervised learning (SSL) methods are able to learn from fewer labeled data points with the help of a large number of unlabeled data points. In this paper, we investigated the possibility of using clustering analysis to identify the underlying structure of the data space for SSL. A cluster-then-label method was proposed to identify high-density regions in the data space which were then used to help a supervised SVM in finding the decision boundary. We have compared our method with other supervised and semi-supervised state-of-the-art techniques using two different classification tasks applied to breast pathology datasets. We found that compared with other state-of-the-art supervised and semi-supervised methods, our SSL method is able to improve classification performance when a limited number of labeled data instances are made available. We also showed that it is important to examine the underlying distribution of the data space before applying SSL techniques to ensure semi-supervised learning assumptions are not violated by the data.
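The general cluster-then-label idea can be sketched as follows. The data are clustered, and each cluster's majority label is propagated to its unlabeled members, yielding a larger pseudo-labeled set for a supervised classifier. This is a simplified stand-in, not the authors' method. The paper uses the clusters to guide an SVM, while here plain k-means and majority voting are assumptions chosen for brevity.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], k: int, iters: int = 50, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means; returns a cluster index for every point."""
    rng = random.Random(seed)
    centers = rng.sample(list(points), k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for each point.
        assign = [min(range(k), key=lambda c: _sq_dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return assign


def cluster_then_label(
    points: Sequence[Point], labels: Sequence[Optional[str]], k: int
) -> List[Optional[str]]:
    """Propagate each cluster's majority label to its unlabeled members.

    labels[i] is None for unlabeled points. Clusters containing no labeled
    point are left unlabeled.
    """
    assign = kmeans(points, k)
    out = list(labels)
    for c in range(k):
        votes = Counter(y for y, a in zip(labels, assign) if a == c and y is not None)
        if not votes:
            continue
        majority = votes.most_common(1)[0][0]
        for i, a in enumerate(assign):
            if a == c and out[i] is None:
                out[i] = majority
    return out
```

The pseudo-labeled output would then be passed to a supervised classifier. This only helps when the cluster assumption holds, which echoes the abstract's point about examining the data distribution first.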
• A network is presented that performs segmentation with limited data by leveraging coarse image-level labels.
• Experiments verify that it is possible to train a segmentation network with a single segmentation-level labeled image per class.
• A novel ground truth extraction method is proposed to address the class imbalance problem observed in whole slide images in digital pathology.
Two of the most common tasks in medical imaging are classification and segmentation. Both tasks require data labeled by experts, which are scarce and expensive to collect. Annotating data for segmentation is generally considered more laborious, as the annotator has to draw around the boundaries of regions of interest rather than assign a class label to image patches. Furthermore, in tasks such as breast cancer histopathology, any realistic clinical application often involves working with whole slide images, whereas most publicly available training data are in the form of image patches, each given a class label. We propose an architecture that can alleviate the requirement for segmentation-level ground truth by making use of image-level labels, reducing the time spent on data curation. In addition, this architecture can help unlock the potential of previously acquired image-level datasets for segmentation tasks by annotating a small number of regions of interest. In our experiments, we show that with only one segmentation-level annotation per class, we can achieve performance comparable to that obtained with a fully annotated dataset.
To determine suitable features and optimal classifier design for a computer-aided diagnosis (CAD) system to differentiate between mass and nonmass enhancements during dynamic contrast material-enhanced magnetic resonance (MR) imaging of the breast.
Two hundred eighty histologically proved mass lesions and 129 histologically proved nonmass lesions from MR imaging studies were retrospectively collected. The institutional research ethics board approved this study and waived informed consent. Breast Imaging Reporting and Data System classification of mass and nonmass enhancement was obtained from radiologic reports. Image data from dynamic contrast-enhanced MR imaging were extracted and analyzed by using feature selection techniques and binary, multiclass, and cascade classifiers. Performance was assessed by measuring the area under the receiver operating characteristics curve (AUC), sensitivity, and specificity. Bootstrap cross validation was used to predict the best classifier for the classification task of mass and nonmass benign and malignant breast lesions.
A total of 176 features were extracted. Feature relevance ranking indicated unequal importance of kinetic, texture, and morphologic features for mass and nonmass lesions. The best classifier performance was obtained with a two-stage cascade classifier (mass vs nonmass, followed by malignant vs benign classification) (AUC, 0.91; 95% confidence interval [CI]: 0.88, 0.94), compared with a one-shot classifier (ie, all benign vs malignant classification) (AUC, 0.89; 95% CI: 0.85, 0.92). The AUC was 2% higher for the cascade classifier (median percent difference obtained by using paired bootstrapped samples), and the difference was statistically significant (P = .0027). Our proposed two-stage cascade classifier decreases the overall misclassification rate by 12%, with 72 of 409 missed diagnoses with the cascade classifier versus 82 of 409 with the one-shot classifier.
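The comparison above reports a median percent difference in AUC over paired bootstrap samples. A minimal sketch of that procedure follows, using the Mann-Whitney form of the AUC. It assumes the percent difference is taken relative to classifier B's AUC, since the abstract does not define it precisely. The function names are hypothetical.

```python
import random
import statistics
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney estimate of ROC AUC: P(score_pos > score_neg), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


def paired_bootstrap_pct_diff(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    labels: Sequence[int],
    n_boot: int = 1000,
    seed: int = 0,
) -> float:
    """Median percent AUC difference of A over B across paired bootstrap resamples.

    Both classifiers are scored on the same resampled cases (the pairing).
    Resamples containing a single class, or where B's AUC is 0, are skipped.
    """
    rng = random.Random(seed)
    n = len(labels)
    diffs: List[float] = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if not 0 < sum(y) < n:
            continue  # AUC is undefined without both classes
        auc_b = auc([scores_b[i] for i in idx], y)
        if auc_b == 0.0:
            continue
        auc_a = auc([scores_a[i] for i in idx], y)
        diffs.append(100.0 * (auc_a - auc_b) / auc_b)
    return statistics.median(diffs)
```

Resampling cases jointly for both classifiers preserves the correlation between their predictions on the same lesions. This makes the paired comparison more sensitive than comparing two independently bootstrapped AUCs.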
Separately optimizing feature selection and training classifiers for mass and nonmass lesions improves the accuracy of CAD for breast MR imaging. By cascading classifiers, we achieved a significant improvement in performance with respect to the use of a one-shot classifier. Our cascaded classifier may provide an advantage for screening women at high risk for breast cancer, in whom the ability to diagnose cancers at an early stage is of primary importance.
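The two-stage cascade can be sketched as a routing function. Stage 1 decides mass versus nonmass, and stage 2 applies the malignancy model built for that lesion type. The three stage models here are hypothetical placeholder callables, not the study's trained classifiers, and the 0.5 routing threshold is an assumption.

```python
from typing import Callable, Sequence

Features = Sequence[float]
ProbModel = Callable[[Features], float]  # hypothetical model returning a probability in [0, 1]


def cascade_malignancy(
    x: Features,
    p_mass: ProbModel,
    p_malignant_mass: ProbModel,
    p_malignant_nonmass: ProbModel,
    threshold: float = 0.5,
) -> float:
    """Stage 1 routes the lesion as mass or nonmass; stage 2 applies the
    malignancy model trained (with its own feature subset) for that type."""
    if p_mass(x) >= threshold:
        return p_malignant_mass(x)
    return p_malignant_nonmass(x)


def relative_error_reduction(errors_before: int, errors_after: int) -> float:
    """Fractional drop in the number of misclassified cases."""
    return (errors_before - errors_after) / errors_before
```

With the counts reported above, `relative_error_reduction(82, 72)` equals 10/82, or about 0.122. This is consistent with the stated 12% decrease in misclassifications.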
• We design a self-supervised pretext task based on predicting the resolution sequence ordering in histology WSIs.
• We propose a teacher-student consistency paradigm to effectively transfer the pretrained representations to downstream tasks.
• Extensive validation experiments are performed on three histopathology benchmark datasets for classification and regression based tasks.
• The proposed method yields tangible improvements, outperforming other state-of-the-art self-supervised and supervised baselines.
Training a neural network with a large labeled dataset is still the dominant paradigm in computational histopathology. However, obtaining such exhaustive manual annotations is often expensive, laborious, and prone to inter- and intra-observer variability. While recent self-supervised and semi-supervised methods can alleviate this need by learning unsupervised feature representations, they still struggle to generalize well to downstream tasks when the number of labeled instances is small. In this work, we overcome this challenge by leveraging both task-agnostic and task-specific unlabeled data based on two novel strategies: (i) a self-supervised pretext task that harnesses the underlying multi-resolution contextual cues in histology whole-slide images to learn a powerful supervisory signal for unsupervised representation learning; (ii) a new teacher-student semi-supervised consistency paradigm that learns to effectively transfer the pretrained representations to downstream tasks based on prediction consistency with the task-specific unlabeled data.
We carry out extensive validation experiments on three histopathology benchmark datasets across two classification tasks and one regression task, i.e., tumor metastasis detection, tissue type classification, and tumor cellularity quantification. With limited labeled data, the proposed method yields tangible improvements, approaching or even outperforming other state-of-the-art self-supervised and supervised baselines. Furthermore, we empirically show that bootstrapping the self-supervised pretrained features is an effective way to improve task-specific semi-supervised learning on standard benchmarks. Code and pretrained models are made available at: https://github.com/srinidhiPY/SSL_CR_Histo.
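A generic teacher-student consistency scheme of the Mean Teacher kind can be sketched over flat parameter lists. The teacher's weights are an exponential moving average (EMA) of the student's, and the student is penalized for disagreeing with the teacher on unlabeled inputs. This illustrates the general technique only. It is not claimed to be the exact formulation in the SSL_CR_Histo code, and the decay and weight values are assumptions.

```python
from typing import List, Sequence


def ema_update(
    teacher: Sequence[float], student: Sequence[float], decay: float = 0.99
) -> List[float]:
    """Teacher parameters track an exponential moving average of the student's."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs: Sequence[float], teacher_probs: Sequence[float]) -> float:
    """Mean squared disagreement between student and teacher on unlabeled inputs."""
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / len(student_probs)


def semi_supervised_loss(supervised: float, consistency: float, weight: float) -> float:
    """Supervised loss on labeled data plus weighted consistency on unlabeled data."""
    return supervised + weight * consistency
```

Because the teacher averages the student's recent weights, its predictions are smoother targets than the student's own. In this sketch the teacher is never updated by gradients, only through `ema_update`.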
Computer-aided diagnosis (CAD) has been proposed for breast MRI as a tool to standardize evaluation, to automate time-consuming analysis, and to aid the diagnostic decision process by radiologists. T2w MRI findings are diagnostically complementary to T1w DCE-MRI findings in the breast, and prior research showed that measuring the T2w intensity of a lesion relative to a reference tissue improves diagnostic accuracy. The diagnostic value of this information in CAD has not yet been quantified. This study proposes an automatic method of assessing relative T2w lesion intensity without the need to select a reference region. We also evaluate the effect of adding this feature to other T2w and T1w image features on the predictive performance of a breast lesion classifier for differential diagnosis of benign and malignant lesions. An automated feature of relative T2w lesion intensity was developed using a quantitative regression model. The diagnostic performance of the proposed feature combined with T2w texture was compared to the performance of a conventional breast MRI CAD system based on T1w DCE-MRI features alone. The added contribution of T2w features to more conventional T1w-based features was investigated using classification rules extracted from the lesion classifier. With institutional review board approval and a waiver of informed consent, we identified 627 breast lesions (245 malignant, 382 benign) diagnosed after undergoing breast MRI at our institution between 2007 and 2014. An increase in diagnostic performance, measured as the area under the receiver operating characteristic (ROC) curve (AUC), was observed with the additional T2w features and the proposed quantitative feature of relative T2w lesion intensity. AUC increased from 0.80 to 0.83, and this difference was statistically significant (adjusted p-value = 0.020).
The required training sample size for a particular machine learning (ML) model applied to medical imaging data is often unknown. The purpose of this study was to provide a descriptive review of current sample-size determination methodologies in ML applied to medical imaging and to propose recommendations for future work in the field.
We conducted a systematic literature search of articles using Medline and Embase with keywords including “machine learning,” “image,” and “sample size.” The search included articles published between 1946 and 2018. Data regarding the ML task, sample size, and train-test pipeline were collected.
A total of 167 articles were identified, of which 22 were included for qualitative analysis. There were only 4 studies that discussed sample-size determination methodologies, and 18 that tested the effect of sample size on model performance as part of an exploratory analysis. The observed methods could be categorized as pre hoc model-based approaches, which relied on features of the algorithm, or post hoc curve-fitting approaches requiring empirical testing to model and extrapolate algorithm performance as a function of sample size. Between studies, we observed great variability in performance testing procedures used for curve-fitting, model assessment methods, and reporting of confidence in sample sizes.
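As an illustration of the post hoc curve-fitting family, one common choice is an inverse power law relating error to training-set size. It is fitted to a few pilot sizes and then extrapolated to a target error. The sketch assumes the error decays toward zero (no asymptote term), which is a simplification; the reviewed studies varied in curve form and fitting procedure. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[int], errors: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of error(n) = b * n**(-c) in log-log space; returns (b, c)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return math.exp(intercept), -slope


def required_sample_size(b: float, c: float, target_error: float) -> int:
    """Extrapolate the smallest n whose predicted error is at most target_error."""
    return math.ceil((b / target_error) ** (1.0 / c))
```

Extrapolating far beyond the largest pilot size is exactly where the variability in fitting procedures noted above matters most, so reporting confidence intervals on the fitted curve is advisable.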
Our study highlights the scarcity of research in training set size determination methodologies applied to ML in medical imaging, emphasizes the need to standardize current reporting practices, and guides future work in development and streamlining of pre hoc and post hoc sample size approaches.
• Nonmass-like lesions can be described as clusters of spatially and temporally inter-connected regions of enhancement in breast MRI, so they can be modeled as networks and their properties characterized via network-based connectivity.
• The proposed framework optimizes an embedded feature representation of lower dimensionality that maximizes the accuracy of a computer-aided diagnosis (CAD) lesion classifier.
• The objective functions for improved deep embedded unsupervised clustering (DEC) and supervised multi-layered perceptron (MLP) classification of nonmass benign and malignant lesions are jointly optimized.
• The best performance achieved during cross-validation was AUC = 0.81 ± 0.10, and the best generalization performance achieved in an independent held-out test set was AUC = 0.78.
• The method has potential impact for the discovery of features associated with a significant reduction in the malignant likelihood of nonmass-like enhancement in breast MRI.
Nonmass-like enhancements are a common but diagnostically challenging finding in breast MRI. Nonmass-like lesions can be described as clusters of spatially and temporally inter-connected regions of enhancement, so they can be modeled as networks and their properties characterized via network-based connectivity. In this work, we represented nonmass lesions as graphs using a link formation energy model that favors linkages between regions of similar enhancement and closer spatial proximity. However, adding graph features to an existing computer-aided diagnosis (CAD) pipeline increases the dimensionality of the feature space, which poses additional challenges to traditional supervised machine learning techniques because the number of training samples cannot be increased accordingly. We propose the combination of unsupervised dimensionality reduction and embedded space clustering followed by a supervised classifier to improve the performance of a CAD system for nonmass-like lesions in breast MRI. Our work extends a previously proposed framework for deep embedded unsupervised clustering (DEC) to embedding space classification, with the joint optimization of objective functions for DEC and supervised multi-layered perceptron (MLP) classification. The strength of the method lies in its ability to learn and further optimize an embedded feature representation of lower dimensionality that maximizes the diagnostic accuracy of a CAD lesion classifier in discriminating between benign and malignant lesions. We identified 792 nonmass-like enhancements (267 benign, 110 malignant, and 415 unknown) in 411 patients undergoing breast MRI at our institution. The diagnostic performance of the proposed method was evaluated and compared to the performance of a conventional supervised MLP classifier in the original feature space. A statistically significant increase in diagnostic area under the ROC curve (AUC) was achieved.
Generalization AUC increased from 0.67 ± 0.08 to 0.81 ± 0.10 (21% increase, p = 4.2 × 10⁻⁸) with the proposed graph-based lesion characterization and deep embedding framework.
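The DEC component referenced above refines an embedding by matching soft cluster assignments Q to a sharpened target distribution P, minimizing KL(P‖Q). The sketch below computes those quantities with the Student's t kernel (alpha = 1, as in the original DEC formulation), along with a joint objective that adds a supervised classification term. The weighting `gamma` and the externally computed `classification_loss` are hypothetical placeholders, not the authors' exact objective.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def soft_assign(z: Sequence[Vector], centroids: Sequence[Vector], alpha: float = 1.0) -> List[List[float]]:
    """q_ij: Student's t similarity between embedding z_i and centroid mu_j, row-normalized."""
    q = []
    for zi in z:
        row = []
        for mu in centroids:
            d2 = sum((a - b) ** 2 for a, b in zip(zi, mu))
            row.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        s = sum(row)
        q.append([v / s for v in row])
    return q


def target_distribution(q: List[List[float]]) -> List[List[float]]:
    """p_ij proportional to q_ij^2 / f_j, where f_j = sum_i q_ij (soft cluster frequency)."""
    f = [sum(col) for col in zip(*q)]
    p = []
    for row in q:
        w = [v * v / fj for v, fj in zip(row, f)]
        s = sum(w)
        p.append([v / s for v in w])
    return p


def kl_div(p: List[List[float]], q: List[List[float]]) -> float:
    """KL(P || Q) summed over all points and clusters."""
    return sum(
        pi * math.log(pi / qi)
        for prow, qrow in zip(p, q)
        for pi, qi in zip(prow, qrow)
        if pi > 0
    )


def joint_loss(p: List[List[float]], q: List[List[float]], classification_loss: float, gamma: float = 1.0) -> float:
    """Hypothetical joint objective: clustering KL term plus weighted supervised term."""
    return kl_div(p, q) + gamma * classification_loss
```

Squaring q and dividing by cluster frequency sharpens confident assignments while discouraging large clusters from absorbing everything. Adding the supervised term steers that sharpening toward a benign/malignant separation.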
Abstract
Background
Local response prediction for brain metastases (BM) after stereotactic radiosurgery (SRS) is challenging, particularly for smaller BM, as existing criteria are based solely on unidimensional measurements. This investigation sought to determine whether radiomic features provide additional value to routinely available clinical and dosimetric variables to predict local recurrence following SRS.
Methods
We analyzed 408 BM in 87 patients treated with SRS. A total of 440 radiomic features were extracted from the tumor core and the peritumoral regions using the baseline pretreatment volumetric post-contrast T1 (T1c) and volumetric T2 fluid-attenuated inversion recovery (FLAIR) MRI sequences. Local tumor progression was determined based on the Response Assessment in Neuro-Oncology Brain Metastases (RANO-BM) criteria, with growth of more than 20% in maximum axial diameter on the follow-up T1c indicating local failure. The top radiomic features were determined based on resampled random forest (RF) feature importance. An RF classifier was trained using each set of features and evaluated using the area under the receiver operating characteristic curve (AUC).
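The local-failure endpoint described above reduces to a threshold on relative growth. A minimal sketch encoding only the rule as stated in this abstract, without every provision of the full RANO-BM criteria. The function name and millimetre units are assumptions.

```python
def is_local_failure(
    baseline_diameter_mm: float, followup_diameter_mm: float, threshold: float = 0.20
) -> bool:
    """Local failure if the maximum axial diameter on follow-up T1c grew by
    strictly more than `threshold` (20%) relative to baseline."""
    if baseline_diameter_mm <= 0:
        raise ValueError("baseline diameter must be positive")
    growth = (followup_diameter_mm - baseline_diameter_mm) / baseline_diameter_mm
    return growth > threshold
```

Note the strict inequality: a lesion growing from 10 mm to exactly 12 mm (20% growth) is not counted as a failure under this reading of ">20%".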
Results
The addition of any one of the top 10 radiomic features to the set of clinical features resulted in a statistically significant (P < 0.001) increase in the AUC. An optimized combination of radiomic and clinical features resulted in a 19% higher resampled AUC (mean = 0.793; 95% CI = 0.792–0.795) than clinical features alone (mean = 0.669; 95% CI = 0.668–0.671).
Conclusions
The increase in AUC of the RF classifier, after incorporating radiomic features, suggests that quantitative characterization of tumor appearance on pretreatment T1c and FLAIR adds value to known clinical and dosimetric variables for predicting local failure.