The widespread use of immune checkpoint inhibitors (ICIs) has revolutionised treatment of multiple cancer types. However, selecting patients who may benefit from ICI remains challenging. Artificial ...intelligence (AI) approaches allow exploitation of high-dimension oncological data in research and development of precision immuno-oncology.
We conducted a systematic literature review of peer-reviewed original articles studying the ICI efficacy prediction in cancer patients across five data modalities: genomics (including genomics, transcriptomics, and epigenomics), radiomics, digital pathology (pathomics), and real-world and multimodality data.
A total of 90 studies were included in this systematic review, with 80% published in 2021-2022. Among them, 37 studies included genomic, 20 radiomic, 8 pathomic, 20 real-world, and 5 multimodal data. Standard machine learning (ML) methods were used in 72% of studies, deep learning (DL) methods in 22%, and both in 6%. The most frequently studied cancer type was non-small-cell lung cancer (36%), followed by melanoma (16%), while 25% included pan-cancer studies. No prospective study design incorporated AI-based methodologies from the outset; rather, all implemented AI as a post hoc analysis. Novel biomarkers for ICI in radiomics and pathomics were identified using AI approaches, and molecular biomarkers have expanded past genomics into transcriptomics and epigenomics. Finally, complex algorithms and new types of AI-based markers, such as meta-biomarkers, are emerging by integrating multimodal/multi-omics data.
AI-based methods have expanded the horizon for biomarker discovery, demonstrating the power of integrating multimodal data from existing datasets to discover new meta-biomarkers. While most of the included studies showed promise for AI-based prediction of benefit from immunotherapy, none provided high-level evidence for immediate practice change. A priori planned prospective trial designs are needed to cover all lifecycle steps of these software biomarkers, from development and validation to integration into clinical practice.
Microsatellite instability (MSI)/mismatch repair deficiency (dMMR) is a key genetic feature which should be tested in every patient with colorectal cancer (CRC) according to medical guidelines. ...Artificial intelligence (AI) methods can detect MSI/dMMR directly in routine pathology slides, but the test performance has not been systematically investigated with predefined test thresholds.
We trained and validated AI-based MSI/dMMR detectors and evaluated predefined performance metrics using nine patient cohorts of 8343 patients across different countries and ethnicities.
Classifiers achieved clinical-grade performance, yielding an area under the receiver operating curve (AUROC) of up to 0.96 without using any manual annotations. Subsequently, we show that the AI system can be applied as a rule-out test: by using cohort-specific thresholds, on average 52.73% of tumors in each surgical cohort total number of MSI/dMMR = 1020, microsatellite stable (MSS)/ proficient mismatch repair (pMMR) = 7323 patients could be identified as MSS/pMMR with a fixed sensitivity at 95%. In an additional cohort of N = 1530 (MSI/dMMR = 211, MSS/pMMR = 1319) endoscopy biopsy samples, the system achieved an AUROC of 0.89, and the cohort-specific threshold ruled out 44.12% of tumors with a fixed sensitivity at 95%. As a more robust alternative to cohort-specific thresholds, we showed that with a fixed threshold of 0.25 for all the cohorts, we can rule-out 25.51% in surgical specimens and 6.10% in biopsies.
When applied in a clinical setting, this means that the AI system can rule out MSI/dMMR in a quarter (with global thresholds) or half of all CRC patients (with local fine-tuning), thereby reducing cost and turnaround time for molecular profiling.
•We describe the largest multicentric study of AI in molecular cancer diagnostics to date.•Our open-source AI system achieves a high pre-screening performance for detection of MSI.•Our data highlight the role of large-scale validation of AI biomarkers in different geographic and ethnic populations.
For clear cell renal cell carcinoma (ccRCC) risk-dependent diagnostic and therapeutic algorithms are routinely implemented in clinical practice. Artificial intelligence-based image analysis has the ...potential to improve outcome prediction and thereby risk stratification. Thus, we investigated whether a convolutional neural network (CNN) can extract relevant image features from a representative hematoxylin and eosin-stained slide to predict 5-year overall survival (5y-OS) in ccRCC. The CNN was trained to predict 5y-OS in a binary manner using slides from TCGA and validated using an independent in-house cohort. Multivariable logistic regression was used to combine of the CNNs prediction and clinicopathological parameters. A mean balanced accuracy of 72.0% (standard deviation SD = 7.9%), sensitivity of 72.4% (SD = 10.6%), specificity of 71.7% (SD = 11.9%) and area under receiver operating characteristics curve (AUROC) of 0.75 (SD = 0.07) was achieved on the TCGA training set (n = 254 patients / WSIs) using 10-fold cross-validation. On the external validation cohort (n = 99 patients / WSIs), mean accuracy, sensitivity, specificity and AUROC were 65.5% (95%-confidence interval CI: 62.9–68.1%), 86.2% (95%-CI: 81.8–90.5%), 44.9% (95%-CI: 40.2–49.6%), and 0.70 (95%-CI: 0.69–0.71). A multivariable model including age, tumor stage and metastasis yielded an AUROC of 0.75 on the TCGA cohort. The inclusion of the CNN-based classification (Odds ratio = 4.86, 95%-CI: 2.70–8.75, p < 0.01) raised the AUROC to 0.81. On the validation cohort, both models showed an AUROC of 0.88. In univariable Cox regression, the CNN showed a hazard ratio of 3.69 (95%-CI: 2.60–5.23, p < 0.01) on TCGA and 2.13 (95%-CI: 0.92–4.94, p = 0.08) on external validation. The results demonstrate that the CNN’s image-based prediction of survival is promising and thus this widely applicable technique should be further investigated with the aim of improving existing risk stratification in ccRCC.
Over the past decade, the development of molecular high-throughput methods (omics) increased rapidly and provided new insights for cancer research. In parallel, deep learning approaches revealed the ...enormous potential for medical image analysis, especially in digital pathology. Combining image and omics data with deep learning tools may enable the discovery of new cancer biomarkers and a more precise prediction of patient prognosis. This systematic review addresses different multimodal fusion methods of convolutional neural network-based image analyses with omics data, focussing on the impact of data combination on the classification performance.
PubMed was screened for peer-reviewed articles published in English between January 2015 and June 2021 by two independent researchers. Search terms related to deep learning, digital pathology, omics, and multimodal fusion were combined.
We identified a total of 11 studies meeting the inclusion criteria, namely studies that used convolutional neural networks for haematoxylin and eosin image analysis of patients with cancer in combination with integrated omics data. Publications were categorised according to their endpoints: 7 studies focused on survival analysis and 4 studies on prediction of cancer subtypes, malignancy or microsatellite instability with spatial analysis.
Image-based classifiers already show high performances in prognostic and predictive cancer diagnostics. The integration of omics data led to improved performance in all studies described here. However, these are very early studies that still require external validation to demonstrate their generalisability and robustness. Further and more comprehensive studies with larger sample sizes are needed to evaluate performance and determine clinical benefits.
•Fusion of image and genomic data with deep learning improves performance.•Several studies showed a benefit for cancer research.•The studies are at a very early stage and require external validation.•Further studies are needed to demonstrate clinical benefits.
Large language models (LLMs) have shown potential in various applications, including clinical practice. However, their accuracy and utility in providing treatment recommendations for orthopedic ...conditions remain to be investigated. Thus, this pilot study aims to evaluate the validity of treatment recommendations generated by GPT-4 for common knee and shoulder orthopedic conditions using anonymized clinical MRI reports. A retrospective analysis was conducted using 20 anonymized clinical MRI reports, with varying severity and complexity. Treatment recommendations were elicited from GPT-4 and evaluated by two board-certified specialty-trained senior orthopedic surgeons. Their evaluation focused on semiquantitative gradings of accuracy and clinical utility and potential limitations of the LLM-generated recommendations. GPT-4 provided treatment recommendations for 20 patients (mean age, 50 years ± 19 standard deviation; 12 men) with acute and chronic knee and shoulder conditions. The LLM produced largely accurate and clinically useful recommendations. However, limited awareness of a patient's overall situation, a tendency to incorrectly appreciate treatment urgency, and largely schematic and unspecific treatment recommendations were observed and may reduce its clinical usefulness. In conclusion, LLM-based treatment recommendations are largely adequate and not prone to 'hallucinations', yet inadequate in particular situations. Critical guidance by healthcare professionals is obligatory, and independent use by patients is discouraged, given the dependency on precise data input.
Gastrointestinal cancers account for approximately 20% of all cancer diagnoses and are responsible for 22.5% of cancer deaths worldwide. Artificial intelligence–based diagnostic support systems, in ...particular convolutional neural network (CNN)–based image analysis tools, have shown great potential in medical computer vision. In this systematic review, we summarise recent studies reporting CNN-based approaches for digital biomarkers for characterization and prognostication of gastrointestinal cancer pathology.
Pubmed and Medline were screened for peer-reviewed papers dealing with CNN-based gastrointestinal cancer analyses from histological slides, published between 2015 and 2020.Seven hundred and ninety titles and abstracts were screened, and 58 full-text articles were assessed for eligibility.
Sixteen publications fulfilled our inclusion criteria dealing with tumor or precursor lesion characterization or prognostic and predictive biomarkers: 14 studies on colorectal or rectal cancer, three studies on gastric cancer and none on esophageal cancer. These studies were categorised according to their end-points: polyp characterization, tumor characterization and patient outcome. Regarding the translation into clinical practice, we identified several studies demonstrating generalization of the classifier with external tests and comparisons with pathologists, but none presenting clinical implementation.
Results of recent studies on CNN-based image analysis in gastrointestinal cancer pathology are promising, but studies were conducted in observational and retrospective settings. Large-scale trials are needed to assess performance and predict clinical usefulness. Furthermore, large-scale trials are required for approval of CNN-based prediction models as medical devices.
•Computational medical image analysis is rising in gastrointestinal cancer pathology.•Several studies deal with CNN–based identification of biomarkers on HE slides.•Results are promising but studies are at early stages.•Further studies are needed to assess performance and potential clinical usefulness.
Multiple studies have compared the performance of artificial intelligence (AI)–based models for automated skin cancer classification to human experts, thus setting the cornerstone for a successful ...translation of AI-based tools into clinicopathological practice.
The objective of the study was to systematically analyse the current state of research on reader studies involving melanoma and to assess their potential clinical relevance by evaluating three main aspects: test set characteristics (holdout/out-of-distribution data set, composition), test setting (experimental/clinical, inclusion of metadata) and representativeness of participating clinicians.
PubMed, Medline and ScienceDirect were screened for peer-reviewed studies published between 2017 and 2021 and dealing with AI-based skin cancer classification involving melanoma. The search terms skin cancer classification, deep learning, convolutional neural network (CNN), melanoma (detection), digital biomarkers, histopathology and whole slide imaging were combined. Based on the search results, only studies that considered direct comparison of AI results with clinicians and had a diagnostic classification as their main objective were included.
A total of 19 reader studies fulfilled the inclusion criteria. Of these, 11 CNN-based approaches addressed the classification of dermoscopic images; 6 concentrated on the classification of clinical images, whereas 2 dermatopathological studies utilised digitised histopathological whole slide images.
All 19 included studies demonstrated superior or at least equivalent performance of CNN-based classifiers compared with clinicians. However, almost all studies were conducted in highly artificial settings based exclusively on single images of the suspicious lesions. Moreover, test sets mainly consisted of holdout images and did not represent the full range of patient populations and melanoma subtypes encountered in clinical practice.
•We analysed studies comparing artificial intelligence (AI)–based skin cancer classifiers with clinicians.•All 19 included studies showed superior or equivalent performance of AI.•Studies were often carried out in artificial settings, with little external testing.•To enhance clinical relevance of reader studies, less artificial settings are vital.•Future comparisons should be validated with the use of external test sets.
Clinicians and pathologists traditionally use patient data in addition to clinical examination to support their diagnoses.
We investigated whether a combination of histologic whole slides image (WSI) ...analysis based on convolutional neural networks (CNNs) and commonly available patient data (age, sex and anatomical site of the lesion) in a binary melanoma/nevus classification task could increase the performance compared with CNNs alone.
We used 431 WSIs from two different laboratories and analysed the performance of classifiers that used the image or patient data individually or three common fusion techniques. Furthermore, we tested a naive combination of patient data and an image classifier: for cases interpreted as ‘uncertain’ (CNN output score <0.7), the decision of the CNN was replaced by the decision of the patient data classifier.
The CNN on its own achieved the best performance (mean ± standard deviation of five individual runs) with AUROC of 92.30% ± 0.23% and balanced accuracy of 83.17% ± 0.38%. While the classification performance was not significantly improved in general by any of the tested fusions, naive strategy of replacing the image classifier with the patient data classifier on slides with low output scores improved balanced accuracy to 86.72% ± 0.36%.
In most cases, the CNN on its own was so accurate that patient data integration did not provide any benefit. However, incorporating patient data for lesions that were classified by the CNN with low ‘confidence’ improved balanced accuracy.
•Pathologists incorporate patient data in addition to clinical examination.•They put more emphasis on patient data if they are uncertain.•We investigated fusing histologic image/patient data within CNN-based classifiers.•State-of-the-art fusing approaches in general did not yield a performance benefit.•Mimicking humans by fusing patient data only if CNN was uncertain raised accuracy.