The diagnosis of most cancers is made by a board-certified pathologist on the basis of a tissue biopsy examined under the microscope. Recent research reveals high discordance between individual pathologists. For melanoma, the literature reports discordance rates of 25–26% for classifying a benign nevus versus malignant melanoma. A recent study indicated the potential of deep learning to lower these discordances. However, the performance of deep learning in classifying histopathologic melanoma images has never been compared directly with that of human experts. The aim of this study was to perform the first such direct comparison.
A total of 695 lesions were classified by an expert histopathologist in accordance with current guidelines (350 nevi/345 melanoma). Only the haematoxylin & eosin (H&E) slides of these lesions were digitised with a slide scanner and then randomly cropped. A total of 595 of the resulting images were used to train a convolutional neural network (CNN). The additional 100 H&E image sections were used to test the results of the CNN in comparison to 11 histopathologists. Three combined McNemar tests comparing the results of the CNN's test runs in terms of sensitivity, specificity and accuracy were predefined to test for significance (p < 0.05).
The CNN achieved a mean sensitivity/specificity/accuracy of 76%/60%/68% over 11 test runs. In comparison, the 11 pathologists achieved a mean sensitivity/specificity/accuracy of 51.8%/66.5%/59.2%. Thus, the CNN was significantly (p = 0.016) superior in classifying the cropped images.
With limited image information available, a CNN was able to outperform 11 histopathologists in the classification of histopathological melanoma images and thus shows promise for assisting human melanoma diagnosis.
•A convolutional neural network (CNN) was trained with 595 histopathologic images of melanoma and nevi.
•In a direct comparison, the CNN and 11 histopathologists classified a test set of 100 additional histopathologic images (1:1 melanoma/nevi).
•The CNN systematically outperformed the 11 histopathologists in terms of overall accuracy, sensitivity and specificity (p = 0.016).
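The predefined McNemar comparison of paired classifier decisions can be sketched in Python. This is a minimal illustration with hypothetical discordant-pair counts, not the study's actual data:

```python
import math

def mcnemar(b: int, c: int) -> tuple[float, float]:
    """McNemar test on the discordant cells of a paired 2x2 table.

    b: cases classifier A got right and classifier B got wrong
    c: cases classifier B got right and classifier A got wrong
    Returns the continuity-corrected chi-squared statistic and p-value.
    """
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of the chi-squared distribution with 1 df
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical counts: the CNN alone was right on 25 cases a pathologist
# missed, and the pathologist alone was right on 10 cases.
chi2, p = mcnemar(b=25, c=10)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Only the two discordant cells of the 2 × 2 table enter the statistic, which is why McNemar's test is the appropriate choice for paired comparisons of two raters on the same images.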
Several recent publications have demonstrated the use of convolutional neural networks to classify images of melanoma on par with board-certified dermatologists. However, the lack of a public human benchmark restricts the comparability of these algorithms' performance and thereby technical progress in this field.
An electronic questionnaire was sent to dermatologists at 12 German university hospitals. Each questionnaire comprised 100 dermoscopic and 100 clinical images (each set containing 80 nevus images and 20 biopsy-verified melanoma images), all open source. The questionnaire recorded factors such as years of experience in dermatology, number of skin checks performed, age, sex and rank within the university hospital or status as a resident physician. For each image, the dermatologists were asked to provide a management decision (treat/biopsy the lesion or reassure the patient). The main outcome measures were sensitivity, specificity and the area under the receiver operating characteristic curve (ROC).
In total, 157 dermatologists assessed all 100 dermoscopic images with an overall sensitivity of 74.1%, a specificity of 60.0% and an ROC area of 0.67 (range = 0.538–0.769); 145 dermatologists assessed all 100 clinical images with an overall sensitivity of 89.4%, a specificity of 64.4% and an ROC area of 0.769 (range = 0.613–0.9). Results between the test sets were significantly different (p < 0.05), confirming the need for a standardised benchmark.
We present the first public melanoma classification benchmark for both non-dermoscopic and dermoscopic images, allowing artificial intelligence algorithms to be compared with the diagnostic performance of 145 (clinical) or 157 (dermoscopic) dermatologists. The Melanoma Classification Benchmark should be considered a reference standard for white-skinned Western populations in the field of binary algorithmic melanoma classification.
•This paper provides the first open-access melanoma classification benchmark for both non-dermoscopic and dermoscopic images.
•Algorithms can now be easily compared with the performance of dermatologists in terms of sensitivity, specificity and ROC.
•The melanoma benchmark allows comparability between algorithms of different publications and provides a new reference standard.
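The benchmark's outcome measures (sensitivity, specificity and ROC area) can be computed directly from raw predictions; a minimal sketch with made-up labels and classifier scores:

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity and specificity for binary labels (1 = melanoma)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

def auroc(y_true, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels (1 = melanoma) and classifier scores:
y = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.1]
preds = [int(s >= 0.5) for s in scores]
sens, spec = sensitivity_specificity(y, preds)
print(sens, spec, auroc(y, scores))  # 2/3, 0.8, 13/15
```

The rank-sum identity avoids constructing the ROC curve explicitly: the AUC equals the probability that a randomly chosen melanoma receives a higher score than a randomly chosen nevus.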
Melanoma is the most dangerous type of skin cancer but is curable if detected early. Recent publications have demonstrated that artificial intelligence is capable of classifying images of benign nevi and melanoma with dermatologist-level precision. However, a statistically significant improvement over dermatologist classification has not been reported to date.
For this comparative study, 4204 biopsy-proven images of melanoma and nevi (1:1) were used for the training of a convolutional neural network (CNN). New techniques of deep learning were integrated. For the experiment, an additional 804 biopsy-proven dermoscopic images of melanoma and nevi (1:1) were randomly presented to dermatologists at nine German university hospitals, who evaluated the quality of each image and stated their recommended treatment (19,296 recommendations in total). Three McNemar's tests comparing the results of the CNN's test runs in terms of sensitivity, specificity and overall correctness were predefined as the main outcomes.
The respective sensitivity and specificity of lesion classification by the dermatologists were 67.2% (95% confidence interval [CI]: 62.6–71.7%) and 62.2% (95% CI: 57.6–66.9%). In comparison, the trained CNN achieved a higher sensitivity of 82.3% (95% CI: 78.3–85.7%) and a higher specificity of 77.9% (95% CI: 73.8–81.8%). The three McNemar's tests in 2 × 2 tables all reached a significance level of p < 0.001. This significance level was sustained for both subgroups.
For the first time, automated dermoscopic melanoma image classification was shown to be significantly superior to both junior and board-certified dermatologists (p < 0.001).
•Recent publications demonstrated that deep learning is capable of classifying images of benign nevi and melanoma with dermatologist-level precision.
•A systematic outperformance of dermatologists had not been demonstrated to date.
•This study shows the first systematic (p < 0.001) outperformance of board-certified dermatologists in dermoscopic melanoma image classification.
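The 95% confidence intervals reported above are intervals for a binomial proportion. A minimal Wilson-score sketch follows; the counts are hypothetical but consistent with a sensitivity of 82.3% on 402 melanoma images (half of the 804 test images), and the study's exact CI method is not specified here:

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return centre - half, centre + half

# 331 of 402 melanomas flagged correctly corresponds to ~82.3% sensitivity
lo, hi = wilson_ci(331, 402)
print(f"95% CI: {lo:.1%} - {hi:.1%}")  # ~78.3% - 85.8%
```

Unlike the simpler Wald interval, the Wilson interval remains well behaved for proportions near 0 or 1, which matters for high-sensitivity classifiers.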
Gastrointestinal cancers account for approximately 20% of all cancer diagnoses and are responsible for 22.5% of cancer deaths worldwide. Artificial intelligence–based diagnostic support systems, in particular convolutional neural network (CNN)–based image analysis tools, have shown great potential in medical computer vision. In this systematic review, we summarise recent studies reporting CNN-based approaches to digital biomarkers for characterization and prognostication of gastrointestinal cancer pathology.
PubMed and Medline were screened for peer-reviewed papers dealing with CNN-based analyses of gastrointestinal cancer from histological slides, published between 2015 and 2020. Seven hundred and ninety titles and abstracts were screened, and 58 full-text articles were assessed for eligibility.
Sixteen publications dealing with tumor or precursor-lesion characterization or with prognostic and predictive biomarkers fulfilled our inclusion criteria: 14 studies on colorectal or rectal cancer, three studies on gastric cancer and none on esophageal cancer. These studies were categorised according to their end-points: polyp characterization, tumor characterization and patient outcome. Regarding translation into clinical practice, we identified several studies demonstrating generalization of the classifier with external tests and comparisons with pathologists, but none presenting clinical implementation.
Results of recent studies on CNN-based image analysis in gastrointestinal cancer pathology are promising, but the studies were conducted in observational and retrospective settings. Large-scale trials are needed to assess performance and clinical usefulness; such trials will also be required for approval of CNN-based prediction models as medical devices.
•Computational medical image analysis is on the rise in gastrointestinal cancer pathology.
•Several studies deal with CNN-based identification of biomarkers on H&E slides.
•Results are promising, but studies are at early stages.
•Further studies are needed to assess performance and potential clinical usefulness.
For clear cell renal cell carcinoma (ccRCC), risk-dependent diagnostic and therapeutic algorithms are routinely implemented in clinical practice. Artificial intelligence-based image analysis has the potential to improve outcome prediction and thereby risk stratification. Thus, we investigated whether a convolutional neural network (CNN) can extract relevant image features from a representative hematoxylin and eosin-stained slide to predict 5-year overall survival (5y-OS) in ccRCC. The CNN was trained to predict 5y-OS in a binary manner using slides from TCGA and validated on an independent in-house cohort. Multivariable logistic regression was used to combine the CNN's prediction with clinicopathological parameters. A mean balanced accuracy of 72.0% (standard deviation [SD] = 7.9%), sensitivity of 72.4% (SD = 10.6%), specificity of 71.7% (SD = 11.9%) and area under the receiver operating characteristic curve (AUROC) of 0.75 (SD = 0.07) was achieved on the TCGA training set (n = 254 patients/WSIs) using 10-fold cross-validation. On the external validation cohort (n = 99 patients/WSIs), mean accuracy, sensitivity, specificity and AUROC were 65.5% (95% confidence interval [CI]: 62.9–68.1%), 86.2% (95% CI: 81.8–90.5%), 44.9% (95% CI: 40.2–49.6%) and 0.70 (95% CI: 0.69–0.71), respectively. A multivariable model including age, tumor stage and metastasis yielded an AUROC of 0.75 on the TCGA cohort. Inclusion of the CNN-based classification (odds ratio = 4.86, 95% CI: 2.70–8.75, p < 0.01) raised the AUROC to 0.81. On the validation cohort, both models showed an AUROC of 0.88. In univariable Cox regression, the CNN showed a hazard ratio of 3.69 (95% CI: 2.60–5.23, p < 0.01) on TCGA and 2.13 (95% CI: 0.92–4.94, p = 0.08) on external validation. These results demonstrate that the CNN's image-based prediction of survival is promising, and this widely applicable technique should be further investigated with the aim of improving existing risk stratification in ccRCC.
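A multivariable combination of a CNN output with clinical covariates, as described above, can be sketched as a logistic regression. The sketch below is a toy illustration on synthetic data; the covariates, coefficients and gradient-descent fit are assumptions for demonstration, not the study's cohort or software:

```python
import math
import random

def fit_logistic(X, y, lr=1.0, epochs=2000):
    """Logistic regression by batch gradient descent.

    Returns the weight vector; the last entry is the intercept.
    """
    n_feat = len(X[0])
    w = [0.0] * (n_feat + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_feat + 1)
        for xi, yi in zip(X, y):
            z = w[-1] + sum(w[j] * xi[j] for j in range(n_feat))
            err = 1 / (1 + math.exp(-z)) - yi  # predicted prob minus label
            for j in range(n_feat):
                grad[j] += err * xi[j]
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

# Synthetic cohort: outcome driven mainly by a hypothetical CNN risk
# score, with age (rescaled to [0, 1]) as a weaker second covariate.
random.seed(0)
X, y = [], []
for _ in range(300):
    cnn, age = random.random(), random.random()
    logit = 4 * cnn + 1 * age - 2.5
    y.append(int(random.random() < 1 / (1 + math.exp(-logit))))
    X.append([cnn, age])

w = fit_logistic(X, y)
# exp(weight) gives the odds ratio for a full 0 -> 1 shift in that covariate
print([round(v, 2) for v in w], round(math.exp(w[0]), 1))
```

Reporting exponentiated coefficients as odds ratios, as in the abstract above, makes the added value of the CNN score interpretable alongside conventional clinicopathological predictors.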
Background Myeloid-derived suppressor cells (MDSC) play a major role in the immunosuppressive melanoma microenvironment. They are generated under chronic inflammatory conditions characterized by the constant production of inflammatory cytokines, chemokines and growth factors, including IL-6. Recruitment of MDSC to the tumor is mediated by the interaction between chemokines and chemokine receptors, in particular C–C chemokine receptor (CCR)5. Here, we studied the mechanisms of CCR5 upregulation and increased immunosuppressive function of CCR5+ MDSC. Methods The immortalized myeloid suppressor cell line MSC-2, primary immature myeloid cells and in vitro differentiated MDSC were used to determine factors and molecular mechanisms regulating CCR5 expression and immunosuppressive markers at the mRNA and protein levels. The relevance of the identified pathways was validated on the RET transgenic mouse melanoma model, which was also used to target the identified pathways in vivo. Results IL-6 upregulated the expression of CCR5 and arginase 1 in MDSC by a STAT3-dependent mechanism. MDSC differentiated in the presence of IL-6 strongly inhibited CD8+ T cell functions compared with MDSC differentiated without IL-6. A correlation between IL-6 levels, phosphorylated STAT3 and CCR5 expression in tumor-infiltrating MDSC was demonstrated in the RET transgenic melanoma mouse model. Surprisingly, IL-6 overexpressing tumors grew significantly slower in mice accompanied by CD8+ T cell activation. Moreover, transgenic melanoma-bearing mice treated with IL-6 blocking antibodies showed significantly accelerated tumor development. Conclusion Our in vitro and ex vivo findings demonstrated that IL-6 induced CCR5 expression and a strong immunosuppressive activity of MDSC, highlighting this cytokine as a promising target for melanoma immunotherapy. However, IL-6 blocking therapy did not prove to be effective in RET transgenic melanoma-bearing mice but rather aggravated tumor progression.
Further studies are needed to identify particular combination therapies, cancer entities or patient subsets to benefit from the anti-IL-6 treatment.
Abstract Background Anti-programmed cell death 1 (PD-1) antibodies represent an effective treatment option for metastatic melanoma and other cancer entities. They act via blockade of the PD-1 receptor, an inhibitor of the T-cell effector mechanisms that limit immune responses against tumours. As reported for ipilimumab, the anti-PD-1 antibodies pembrolizumab and nivolumab can induce immune-related adverse events (irAEs). These side-effects can involve the skin, gastrointestinal tract, liver, endocrine system and other organ systems. Since life-threatening and fatal irAEs have been reported, adequate diagnosis and management are essential. Methods and findings In total, 496 patients with metastatic melanoma from 15 skin cancer centres were treated with pembrolizumab or nivolumab. Two hundred and forty-two side-effects in 138 patients were analysed. In 77 of the 138 patients, side-effects affected the nervous system, respiratory tract, musculoskeletal system, heart, blood or eyes. Previously unreported side-effects, such as meningo-(radiculitis), polyradiculitis, cardiac arrhythmia, asystolia and paresis, were observed. Rare and difficult-to-manage side-effects such as myasthenia gravis are described in detail. Conclusion Anti-PD-1 antibodies can induce a plethora of irAEs. Knowledge of these will allow prompt diagnosis and improve management, resulting in decreased morbidity.
Due to their ability to solve complex problems, deep neural networks (DNNs) are becoming increasingly popular in medical applications. However, decision-making by such algorithms is essentially a black-box process that makes it difficult for physicians to judge whether the decisions are reliable. The use of explainable artificial intelligence (XAI) is often suggested as a solution to this problem.
We investigate how XAI is used for skin cancer detection: how is it used during the development of new DNNs? What kinds of visualisations are commonly used? Are there systematic evaluations of XAI with dermatologists or dermatopathologists?
Google Scholar, PubMed, IEEE Xplore, ScienceDirect and Scopus were searched for peer-reviewed studies published between January 2017 and October 2021 applying XAI to dermatological images: the search terms histopathological image, whole-slide image, clinical image, dermoscopic image, skin, dermatology, explainable, interpretable and XAI were used in various combinations. Only studies concerned with skin cancer were included.
37 publications fulfilled our inclusion criteria. Most studies (19/37) simply applied existing XAI methods to their classifiers to interpret their decision-making. Some studies (4/37) proposed new XAI methods or improved upon existing techniques. 14/37 studies addressed specific questions such as bias detection and the impact of XAI on man–machine interactions. However, only three of them evaluated the performance and confidence of humans using computer-aided diagnosis (CAD) systems with XAI.
XAI is commonly applied during the development of DNNs for skin cancer detection. However, a systematic and rigorous evaluation of its usefulness in this scenario is lacking.
•No systematic evaluation of explainable artificial intelligence (XAI) for skin cancer detection has been conducted to date.
•Overview of 37 studies using XAI on dermatological and dermatohistological data.
•Analysis of the usage of XAI to inform research on its role within computer-aided diagnosis (CAD) systems.
Abstract Background The anti-programmed cell death-1 (PD-1) inhibitors pembrolizumab and nivolumab, alone or in combination with ipilimumab, have shown improved objective response rates and progression-free survival compared with ipilimumab alone in advanced melanoma patients. Anti-PD-1 therapy demonstrated nearly equal clinical efficacy in patients who had progressed after ipilimumab and in treatment-naïve patients. However, only limited evidence exists regarding the efficacy of ipilimumab alone or in combination with nivolumab after treatment failure of anti-PD-1 therapy. Patients and methods A multicenter retrospective study was performed in advanced melanoma patients who were treated with nivolumab (1 or 3 mg/kg) plus ipilimumab (1 or 3 mg/kg), or with ipilimumab (3 mg/kg) alone, after treatment failure of anti-PD-1 therapy. Patient, tumour, pre- and post-treatment characteristics were analysed. Results In total, 47 patients were treated with ipilimumab (ipi-group) and 37 patients with ipilimumab and nivolumab (combination-group) after treatment failure of anti-PD-1 therapy. Overall response rates for the ipi- and the combination-group were 16% and 21%, respectively. The disease control rate was 42% for the ipi-group and 33% for the combination-group. One-year overall survival rates for the ipi- and the combination-group were 54% and 55%, respectively. Conclusions Ipilimumab should be considered a viable treatment option for patients with failure of prior anti-PD-1 therapy, including those with progressive disease as best response to prior anti-PD-1. In contrast, the combination of ipilimumab and nivolumab appears considerably less effective in this setting than in treatment-naïve patients.
Clinicians and pathologists traditionally use patient data in addition to clinical examination to support their diagnoses.
We investigated whether a combination of histologic whole-slide image (WSI) analysis based on convolutional neural networks (CNNs) and commonly available patient data (age, sex and anatomical site of the lesion) could increase performance in a binary melanoma/nevus classification task compared with CNNs alone.
We used 431 WSIs from two different laboratories and analysed the performance of classifiers that used the image or the patient data individually, as well as three common fusion techniques combining both. Furthermore, we tested a naive combination of patient data and an image classifier: for cases interpreted as ‘uncertain’ (CNN output score <0.7), the decision of the CNN was replaced by the decision of the patient data classifier.
The CNN on its own achieved the best performance (mean ± standard deviation of five individual runs) with an AUROC of 92.30% ± 0.23% and a balanced accuracy of 83.17% ± 0.38%. While classification performance was not significantly improved in general by any of the tested fusions, the naive strategy of replacing the image classifier's decision with that of the patient data classifier on slides with low output scores improved balanced accuracy to 86.72% ± 0.36%.
In most cases, the CNN on its own was so accurate that patient data integration did not provide any benefit. However, incorporating patient data for lesions that were classified by the CNN with low ‘confidence’ improved balanced accuracy.
•Pathologists incorporate patient data in addition to clinical examination.
•They put more emphasis on patient data if they are uncertain.
•We investigated fusing histologic image/patient data within CNN-based classifiers.
•State-of-the-art fusing approaches in general did not yield a performance benefit.
•Mimicking humans by fusing patient data only if the CNN was uncertain raised accuracy.
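The naive fallback strategy described above can be sketched as follows. The 0.7 uncertainty threshold comes from the text; interpreting the 'output score' as the winning-class probability is an assumption made for this illustration:

```python
def fused_decision(cnn_score: float, metadata_pred: int,
                   threshold: float = 0.7) -> int:
    """Return the CNN's call (1 = melanoma, 0 = nevus) unless the CNN is
    'uncertain', i.e. its winning-class score falls below the threshold,
    in which case fall back to the patient-data classifier's prediction."""
    cnn_pred = int(cnn_score >= 0.5)
    confidence = max(cnn_score, 1 - cnn_score)
    return metadata_pred if confidence < threshold else cnn_pred

print(fused_decision(0.95, metadata_pred=0))  # confident CNN wins: 1
print(fused_decision(0.55, metadata_pred=0))  # uncertain, defer: 0
```

The rule mimics the pathologists' behaviour described in the highlights: patient data only influence the decision when the image-based classifier is uncertain.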