The diagnosis of most cancers is made by a board-certified pathologist based on a tissue biopsy under the microscope. Recent research reveals a high discordance between individual pathologists. For melanoma, the literature reports discordance rates of 25–26% for classifying a benign nevus versus malignant melanoma. A recent study indicated the potential of deep learning to lower these discordances. However, the performance of deep learning in classifying histopathologic melanoma images has never been compared directly to that of human experts. The aim of this study is to perform the first such direct comparison.
A total of 695 lesions were classified by an expert histopathologist in accordance with current guidelines (350 nevi/345 melanoma). Only the haematoxylin & eosin (H&E) slides of these lesions were digitised via a slide scanner and then randomly cropped. A total of 595 of the resulting images were used to train a convolutional neural network (CNN). The remaining 100 H&E image sections were used to test the CNN against 11 histopathologists. Three combined McNemar tests comparing the results of the CNN's test runs in terms of sensitivity, specificity and accuracy were predefined to test for significance (p < 0.05).
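The McNemar test used here compares two classifiers on the same paired test cases via their discordant pairs. A minimal sketch of the exact variant follows; the counts are invented for illustration and are not the study's actual contingency table:

```python
# Exact McNemar test on paired classification outcomes, e.g. CNN vs.
# pathologist on the same test images. Only the discordant pairs matter.
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant counts:
    b = cases only classifier A got right, c = only classifier B."""
    n = b + c
    k = min(b, c)
    # Under H0 the discordant pairs split 50/50: two-sided binomial tail.
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(p, 1.0)

# Illustrative counts (hypothetical, not from the study):
print(round(mcnemar_exact(16, 6), 4))  # 0.0525
```

With balanced discordant counts the p-value is 1.0, since neither classifier is favoured.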
The CNN achieved a mean sensitivity/specificity/accuracy of 76%/60%/68% over 11 test runs. In comparison, the 11 pathologists achieved a mean sensitivity/specificity/accuracy of 51.8%/66.5%/59.2%. Thus, the CNN was significantly (p = 0.016) superior in classifying the cropped images.
With limited image information available, a CNN was able to outperform 11 histopathologists in the classification of histopathological melanoma images and thus shows promise to assist human melanoma diagnoses.
•A convolutional neural network (CNN) was trained with 595 histopathologic images of melanoma and nevi.
•In a direct comparison, the CNN and 11 histopathologists classified a test set of 100 additional histopathologic images (1:1 melanoma/nevi).
•The CNN systematically outperformed the 11 histopathologists in terms of overall accuracy, sensitivity and specificity (p = 0.016).
Several recent publications have demonstrated the use of convolutional neural networks to classify images of melanoma on par with board-certified dermatologists. However, the non-availability of a public human benchmark restricts the comparability of these algorithms' performance and thereby the technical progress in this field.
An electronic questionnaire was sent to dermatologists at 12 German university hospitals. Each questionnaire comprised 100 dermoscopic and 100 clinical images (each set containing 80 nevus images and 20 biopsy-verified melanoma images), all open source. The questionnaire recorded factors such as years of experience in dermatology, skin checks performed, age, sex and rank within the university hospital or status as a resident physician. For each image, the dermatologists were asked to provide a management decision (treat/biopsy the lesion or reassure the patient). The main outcome measures were sensitivity, specificity and the receiver operating characteristic (ROC).
A total of 157 dermatologists assessed all 100 dermoscopic images with an overall sensitivity of 74.1%, a specificity of 60.0% and an area under the ROC curve of 0.67 (range = 0.538–0.769); 145 dermatologists assessed all 100 clinical images with an overall sensitivity of 89.4%, a specificity of 64.4% and an area under the ROC curve of 0.769 (range = 0.613–0.9). Results between the test sets differed significantly (p < 0.05), confirming the need for a standardised benchmark.
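Sensitivity and specificity follow directly from the confusion counts of each reader's decisions. A minimal sketch, with invented counts consistent with the 20-melanoma/80-nevi split described above:

```python
# Recompute sensitivity and specificity from confusion counts.
# The counts below are illustrative, not any reader's actual results.
def sens_spec(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    sensitivity = tp / (tp + fn)  # fraction of melanomas correctly flagged
    specificity = tn / (tn + fp)  # fraction of nevi correctly reassured
    return sensitivity, specificity

# 20 melanomas (15 flagged, 5 missed), 80 nevi (48 reassured, 32 biopsied):
print(sens_spec(tp=15, fn=5, tn=48, fp=32))  # (0.75, 0.6)
```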
We present the first public melanoma classification benchmark for both non-dermoscopic and dermoscopic images, allowing artificial intelligence algorithms to be compared with the diagnostic performance of 145 and 157 dermatologists, respectively. The Melanoma Classification Benchmark should be considered a reference standard for white-skinned Western populations in the field of binary algorithmic melanoma classification.
•This paper provides the first open access melanoma classification benchmark for both non-dermoscopic and dermoscopic images.
•Algorithms can now be easily compared to the performance of dermatologists in terms of sensitivity, specificity and ROC.
•The melanoma benchmark allows comparability between algorithms of different publications and provides a new reference standard.
Over the past decade, the development of molecular high-throughput methods (omics) increased rapidly and provided new insights for cancer research. In parallel, deep learning approaches revealed enormous potential for medical image analysis, especially in digital pathology. Combining image and omics data with deep learning tools may enable the discovery of new cancer biomarkers and a more precise prediction of patient prognosis. This systematic review addresses different multimodal fusion methods of convolutional neural network-based image analyses with omics data, focussing on the impact of data combination on the classification performance.
PubMed was screened for peer-reviewed articles published in English between January 2015 and June 2021 by two independent researchers. Search terms related to deep learning, digital pathology, omics, and multimodal fusion were combined.
We identified a total of 11 studies meeting the inclusion criteria, namely studies that used convolutional neural networks for haematoxylin and eosin image analysis of patients with cancer in combination with integrated omics data. Publications were categorised according to their endpoints: 7 studies focused on survival analysis and 4 studies on prediction of cancer subtypes, malignancy or microsatellite instability with spatial analysis.
Image-based classifiers already show high performances in prognostic and predictive cancer diagnostics. The integration of omics data led to improved performance in all studies described here. However, these are very early studies that still require external validation to demonstrate their generalisability and robustness. Further and more comprehensive studies with larger sample sizes are needed to evaluate performance and determine clinical benefits.
•Fusion of image and genomic data with deep learning improves performance.
•Several studies showed a benefit for cancer research.
•The studies are at a very early stage and require external validation.
•Further studies are needed to demonstrate clinical benefits.
In recent months, multiple publications have demonstrated the use of convolutional neural networks (CNNs) to classify images of skin cancer as precisely as dermatologists. However, these CNNs failed to outperform the winner of the International Symposium on Biomedical Imaging (ISBI) 2016 challenge, which ranked algorithms by average precision for the classification of dermoscopic melanoma images. Accordingly, the technical progress represented by these studies is limited. In addition, the available reports are impossible to reproduce due to incomplete descriptions of training procedures and the use of proprietary image databases or the non-disclosure of the images used. These factors prevent the comparison of various CNN classifiers on equal terms.
To demonstrate the training of an image-classifier CNN that outperforms the winner of the ISBI 2016 challenge by using open-source images exclusively.
A detailed description of the training procedure is reported, and the images and test sets used are fully disclosed to ensure the reproducibility of our work.
Our CNN classifier outperforms all recent attempts to classify the original ISBI 2016 challenge test data (full set of 379 test images), with an average precision of 0.709 (vs. 0.637 of the ISBI winner) and with an area under the receiver operating characteristic curve of 0.85.
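Average precision, the ranking metric used by the challenge, can be sketched under one common (non-interpolated) definition: the mean of the precision at each rank where a true melanoma appears when test images are sorted by predicted melanoma score. The scores and labels below are toy values:

```python
# Non-interpolated average precision over a ranked list of predictions.
# Toy data only; not the challenge's actual scores or labels.
def average_precision(scores: list[float], labels: list[int]) -> float:
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, precisions = 0, []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:  # true melanoma found at this rank
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)

# Positives ranked 1st and 3rd: mean of 1/1 and 2/3.
print(average_precision([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # ≈ 0.833
```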
This work illustrates the potential for improving skin cancer classification with enhanced training procedures for CNNs, while avoiding the use of costly equipment or proprietary image data.
Background:
Artificial intelligence (AI) has shown promise in numerous experimental studies, particularly in skin cancer diagnostics. Translation of these findings into the clinic is the logical next step. This translation can only be successful if patients' concerns and questions are addressed suitably. We therefore conducted a survey to evaluate the patients' view of artificial intelligence in melanoma diagnostics in Germany, with a particular focus on patients with a history of melanoma.
Participants and Methods:
A web-based questionnaire was designed using LimeSurvey, sent by e-mail to university hospitals and melanoma support groups and advertised on social media. The anonymous questionnaire evaluated patients' expectations and concerns toward artificial intelligence in general as well as their attitudes toward different application scenarios. Descriptive analysis was performed with expression of categorical variables as percentages and 95% confidence intervals. Statistical tests were performed to investigate associations between sociodemographic data and selected items of the questionnaire.
Results:
298 individuals (154 with a melanoma diagnosis, 143 without) responded to the questionnaire. About 94% (95% CI: 0.91–0.97) of respondents supported the use of artificial intelligence in medical approaches; 88% (95% CI: 0.85–0.92) would even make their own health data anonymously available for the further development of AI-based applications in medicine. Only 41% (95% CI: 0.35–0.46) of respondents were amenable to the use of artificial intelligence as a stand-alone system, whereas 94% (95% CI: 0.92–0.97) supported its use as an assistance system for physicians. In sub-group analyses, only minor differences were detectable. Respondents with a previous history of melanoma were more amenable to the use of AI applications for early detection, even at home, and would prefer an application scenario in which physician and AI classify the lesions independently. With respect to AI-based applications in medicine, patients were concerned about insufficient data protection, impersonality and susceptibility to errors, but expected faster, more precise and unbiased diagnostics, fewer diagnostic errors and support for physicians.
Conclusions:
The vast majority of participants exhibited a positive attitude toward the use of artificial intelligence in melanoma diagnostics, especially as an assistance system.
Background
Recent years have been witnessing a substantial improvement in the accuracy of skin cancer classification using convolutional neural networks (CNNs). CNNs perform on par with or better than dermatologists with respect to the classification of single images. However, in clinical practice, dermatologists also use other patient data beyond the visual aspects present in a digitized image, further increasing their diagnostic accuracy. Several pilot studies have recently investigated the effects of integrating different subtypes of patient data into CNN-based skin cancer classifiers.
Objective
This systematic review focuses on the current research investigating the impact of merging information from image features and patient data on the performance of CNN-based skin cancer image classification. This study aims to explore the potential in this field of research by evaluating the types of patient data used, the ways in which the nonimage data are encoded and merged with the image features, and the impact of the integration on the classifier performance.
Methods
Google Scholar, PubMed, MEDLINE, and ScienceDirect were screened for peer-reviewed studies published in English that dealt with the integration of patient data within a CNN-based skin cancer classification. The search terms skin cancer classification, convolutional neural network(s), deep learning, lesions, melanoma, metadata, clinical information, and patient data were combined.
Results
A total of 11 publications fulfilled the inclusion criteria. All of them reported an overall improvement in different skin lesion classification tasks when patient data were integrated. The most commonly used patient data were age, sex, and lesion location, and the patient data were mostly one-hot encoded. The studies differed in the complexity of the deep learning methods with which the encoded patient data were processed before and after fusion with the image features in a combined classifier.
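The encoding described above can be sketched as follows; the category names, the age scaling, and the function are assumptions for illustration, not taken from any of the reviewed studies:

```python
# One-hot encoding of the patient data most commonly used in the
# reviewed studies (age, sex, lesion location). Hypothetical sketch.
SITES = ["head/neck", "trunk", "upper limb", "lower limb"]

def encode_patient(age: float, sex: str, site: str) -> list[float]:
    one_hot_sex = [1.0, 0.0] if sex == "female" else [0.0, 1.0]
    one_hot_site = [1.0 if s == site else 0.0 for s in SITES]
    # Scaled age plus one-hot vectors; in a combined classifier this
    # vector would be concatenated with the CNN's image-feature vector.
    return [age / 100.0] + one_hot_sex + one_hot_site

print(encode_patient(55, "female", "trunk"))
# [0.55, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```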
Conclusions
This study indicates the potential benefits of integrating patient data into CNN-based diagnostic algorithms. However, how exactly the individual patient data enhance classification performance, especially in the case of multiclass classification problems, is still unclear. Moreover, a substantial fraction of patient data used by dermatologists remains to be analyzed in the context of CNN-based skin cancer classification. Further exploratory analyses in this promising field may optimize patient data integration into CNN-based skin cancer diagnostics for patients’ benefits.
Background
Deep neural networks are showing impressive results in different medical image classification tasks. However, for real-world applications, there is a need to estimate the network's uncertainty together with its prediction.
Objective
In this review, we investigate in what form uncertainty estimation has been applied to the task of medical image classification. We also investigate which metrics are used to describe the effectiveness of the applied uncertainty estimation.
Methods
Google Scholar, PubMed, IEEE Xplore, and ScienceDirect were screened for peer-reviewed studies, published between 2016 and 2021, that deal with uncertainty estimation in medical image classification. The search terms “uncertainty,” “uncertainty estimation,” “network calibration,” and “out-of-distribution detection” were used in combination with the terms “medical images,” “medical image analysis,” and “medical image classification.”
Results
A total of 22 papers were chosen for detailed analysis through the systematic review process. This paper provides a table for a systematic comparison of the included works with respect to the applied method for estimating the uncertainty.
Conclusions
The applied methods for estimating uncertainty are diverse, but the sampling-based methods Monte-Carlo Dropout and Deep Ensembles are used most frequently. We conclude that future work could investigate the benefits of uncertainty estimation in collaborative settings of artificial intelligence systems and human experts.
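The idea behind these sampling-based methods can be illustrated with a toy predictive-entropy computation: several stochastic forward passes (dropout masks or ensemble members) are averaged, and the entropy of the averaged class distribution rises when the passes disagree. The probability samples below are invented:

```python
# Predictive entropy over T stochastic passes, as used with
# Monte-Carlo Dropout or Deep Ensembles. Toy probabilities only.
import math

def predictive_entropy(prob_samples: list[list[float]]) -> float:
    """Entropy of the mean class distribution over all passes."""
    k = len(prob_samples[0])
    mean = [sum(p[i] for p in prob_samples) / len(prob_samples)
            for i in range(k)]
    return -sum(p * math.log(p) for p in mean if p > 0)

agreeing = [[0.95, 0.05]] * 20               # passes agree -> low entropy
disagreeing = [[0.9, 0.1], [0.2, 0.8]] * 10  # passes disagree -> high entropy
print(predictive_entropy(agreeing) < predictive_entropy(disagreeing))  # True
```

In a collaborative setting, high-entropy cases could be the ones referred to a human expert.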
International Registered Report Identifier (IRRID)
RR2-10.2196/11936
Recent studies have shown that deep learning is capable of classifying dermatoscopic images at least as well as dermatologists. However, many studies in skin cancer classification utilize non-biopsy-verified training images. This imperfect ground truth introduces a systematic error, but its effects on classifier performance are currently unknown. Here, we systematically examine the effects of label noise by training and evaluating convolutional neural networks (CNNs) with 804 images of melanoma and nevi labeled either by dermatologists or by biopsy. The CNNs are evaluated on a test set of 384 images by means of 4-fold cross-validation, comparing the outputs with either the corresponding dermatological or the biopsy-verified diagnosis. With identical ground truths for training and test labels, high accuracies of 75.03% (95% CI: 74.39–75.66%) for dermatological and 73.80% (95% CI: 73.10–74.51%) for biopsy-verified labels can be achieved. However, if the CNN is trained and tested with different ground truths, accuracy drops significantly to 64.53% (95% CI: 63.12–65.94%, p < 0.01) on a non-biopsy-verified and to 64.24% (95% CI: 62.66–65.83%, p < 0.01) on a biopsy-verified test set. In conclusion, deep learning methods for skin cancer classification are highly sensitive to label noise, and future work should use biopsy-verified training images to mitigate this problem.
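The accuracy ceiling imposed by label noise can be illustrated with a toy simulation, assuming (for illustration only) a 25% test-label noise rate and an otherwise perfect classifier:

```python
# Toy illustration: scored against test labels that are wrong at rate
# eps, even a perfect classifier measures only about (1 - eps) accuracy.
import random

random.seed(42)
truth = [random.randint(0, 1) for _ in range(10_000)]  # true diagnoses
eps = 0.25                                             # assumed noise rate
noisy = [1 - y if random.random() < eps else y for y in truth]

# A "perfect" classifier predicts the true label for every image,
# but is evaluated against the noisy labels.
measured = sum(pred == y for pred, y in zip(truth, noisy)) / len(truth)
print(round(measured, 2))  # close to 1 - eps = 0.75
```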
State-of-the-art classifiers based on convolutional neural networks (CNNs) were shown to classify images of skin cancer on par with dermatologists and could enable lifesaving and fast diagnoses, even outside the hospital via apps installed on mobile devices. To our knowledge, at present there is no review of the current work in this research area.
This study presents the first systematic review of the state-of-the-art research on classifying skin lesions with CNNs. We limit our review to skin lesion classifiers. In particular, methods that apply a CNN only for segmentation or for the classification of dermoscopic patterns are not considered here. Furthermore, this study discusses why the comparability of the presented procedures is very difficult and which challenges must be addressed in the future.
We searched the Google Scholar, PubMed, Medline, ScienceDirect, and Web of Science databases for systematic reviews and original research articles published in English. Only papers that described their scientific procedures in sufficient detail are included in this review.
We found 13 papers that classified skin lesions using CNNs. In principle, the classification methods can be differentiated according to three approaches. Approaches that use a CNN already trained on another large dataset and then optimize its parameters for the classification of skin lesions are the most common and display the best performance with the currently available limited datasets.
CNNs display a high performance as state-of-the-art skin lesion classifiers. Unfortunately, it is difficult to compare different classification methods because some approaches use nonpublic datasets for training and/or testing, thereby making reproducibility difficult. Future publications should use publicly available benchmarks and fully disclose methods used for training to allow comparability.
Melanoma is the most dangerous type of skin cancer but is curable if detected early. Recent publications demonstrated that artificial intelligence is capable of classifying images of benign nevi and melanoma with dermatologist-level precision. However, a statistically significant improvement over dermatologist classification has not been reported to date.
For this comparative study, 4204 biopsy-proven images of melanoma and nevi (1:1) were used for the training of a convolutional neural network (CNN). New techniques of deep learning were integrated. For the experiment, an additional 804 biopsy-proven dermoscopic images of melanoma and nevi (1:1) were randomly presented to dermatologists of nine German university hospitals, who evaluated the quality of each image and stated their recommended treatment (19,296 recommendations in total). Three McNemar's tests comparing the results of the CNN's test runs in terms of sensitivity, specificity and overall correctness were predefined as the main outcomes.
The respective sensitivity and specificity of lesion classification by the dermatologists were 67.2% (95% confidence interval [CI]: 62.6%–71.7%) and 62.2% (95% CI: 57.6%–66.9%). In comparison, the trained CNN achieved a higher sensitivity of 82.3% (95% CI: 78.3%–85.7%) and a higher specificity of 77.9% (95% CI: 73.8%–81.8%). The three McNemar's tests in 2 × 2 tables all reached a significance level of p < 0.001. This significance level was sustained for both subgroups.
For the first time, automated dermoscopic melanoma image classification was shown to be significantly superior to both junior and board-certified dermatologists (p < 0.001).
•Recent publications demonstrated that deep learning is capable of classifying images of benign nevi and melanoma with dermatologist-level precision.
•A systematic outperformance of dermatologists had not been demonstrated to date.
•This study shows the first systematic (p < 0.001) outperformance of board-certified dermatologists in dermoscopic melanoma image classification.