With the advent of array-based techniques to measure methylation levels in primary tumor samples, systematic investigations of methylomes have been performed widely on a large number of tumor entities. Most of these approaches do not measure methylation in individual cells but rather in bulk tumor sample DNA, which contains a mixture of tumor cells, infiltrating immune cells and other stromal components. This raises questions about the purity of a given tumor sample, given the varying degrees of stromal infiltration in different entities. Previous methods to infer tumor purity require or are based on the use of matching control samples, which are rarely available. Here we present a novel, reference-free method to quantify tumor purity, based on two Random Forest classifiers, which were trained on ABSOLUTE as well as ESTIMATE purity values from TCGA tumor samples. We subsequently apply this method to a previously published, large dataset of brain tumors, showing that these models perform well in datasets that have not been characterized with respect to tumor purity.
Using two gold standard methods to infer purity (the ABSOLUTE score based on whole genome sequencing data and the ESTIMATE score based on gene expression data), we have optimized Random Forest classifiers to predict tumor purity in entities that were contained in the TCGA project. We validated these classifiers using an independent test data set and cross-compared them to other methods which have been applied to the TCGA datasets (such as ESTIMATE and LUMP). Using Illumina methylation array data of brain tumor entities (as published in Capper et al. (Nature 555:469-474, 2018)), we applied this model to estimate tumor purity and found that subgroups of brain tumors display substantial differences in tumor purity.
Random forest-based tumor purity prediction is a well-suited tool to extrapolate gold standard measures of purity to novel methylation array datasets. In contrast to other available methylation-based tumor purity estimation methods, our classifiers do not need a priori knowledge about the tumor entity or matching control tissue to predict tumor purity.
Recently, we described a machine learning approach for classification of central nervous system tumors based on the analysis of genome-wide DNA methylation patterns [6]. Here, we report on DNA methylation-based central nervous system (CNS) tumor diagnostics conducted in our institution between the years 2015 and 2018. In this period, more than 1000 tumors from the neurosurgical departments in Heidelberg and Mannheim and more than 1000 tumors referred from external institutions were subjected to DNA methylation analysis for diagnostic purposes. We describe our current approach to the integrated diagnosis of CNS tumors with a focus on constellations with conflicts between morphological and molecular genetic findings. We further describe the benefit of integrating DNA copy-number alterations into diagnostic considerations and provide a catalog of copy-number changes for individual DNA methylation classes. We also point to several pitfalls accompanying the diagnostic implementation of DNA methylation profiling and give practical suggestions for recurring diagnostic scenarios.
Summary Background The WHO classification of brain tumours describes 15 subtypes of meningioma. Nine of these subtypes are allotted to WHO grade I, and three each to grade II and grade III. Grading is based solely on histology, with an absence of molecular markers. Although the existing classification and grading approach is of prognostic value, it harbours shortcomings such as ill-defined parameters for subtypes and grading criteria prone to arbitrary judgment. In this study, we aimed for a comprehensive characterisation of the entire molecular genetic landscape of meningioma to identify biologically and clinically relevant subgroups. Methods In this multicentre, retrospective analysis, we investigated genome-wide DNA methylation patterns of meningiomas from ten European academic neuro-oncology centres to identify distinct methylation classes of meningiomas. The methylation classes were further characterised by DNA copy number analysis, mutational profiling, and RNA sequencing. Methylation classes were analysed for progression-free survival outcomes by the Kaplan-Meier method. The DNA methylation-based and WHO classification schema were compared using the Brier prediction score, analysed in an independent cohort with WHO grading, progression-free survival, and disease-specific survival data available, collected at the Medical University Vienna (Vienna, Austria), assessing methylation patterns with an alternative methylation chip. Findings We retrospectively collected 497 meningiomas along with 309 samples of other extra-axial skull tumours that might histologically mimic meningioma variants. Unsupervised clustering of DNA methylation data clearly segregated all meningiomas from other skull tumours. We generated genome-wide DNA methylation profiles from all 497 meningioma samples. DNA methylation profiling distinguished six distinct clinically relevant methylation classes associated with typical mutational, cytogenetic, and gene expression patterns.
Compared with WHO grading, classification by individual and combined methylation classes more accurately identifies patients at high risk of disease progression in tumours with WHO grade I histology, and patients at lower risk of recurrence among WHO grade II tumours (p=0·0096 from the Brier prediction test). We validated this finding in our independent cohort of 140 patients with meningioma. Interpretation DNA methylation-based meningioma classification captures clinically more homogeneous groups and has a higher power for predicting tumour recurrence and prognosis than the WHO classification. The approach presented here is potentially very useful for stratifying meningioma patients to observation-only or adjuvant treatment groups. We consider methylation-based tumour classification highly relevant for the future diagnosis and treatment of meningioma. Funding German Cancer Aid, Else Kröner-Fresenius Foundation, and DKFZ/Heidelberg Institute of Personalized Oncology/Precision Oncology Program.
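To make the comparison metric concrete, here is a minimal sketch of a Brier score in Python. It is an assumption-laden simplification: the study applied a Brier prediction score to survival data, whereas this version scores binary outcomes at a single fixed time point and ignores censoring. The function name, the two "models" and all numbers are hypothetical and serve only to show that lower scores mean better probabilistic predictions.

```python
def brier_score(predicted_probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(predicted_probs) != len(outcomes) or not outcomes:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - y) ** 2 for p, y in zip(predicted_probs, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical outcomes: 1 = recurrence observed, 0 = no recurrence.
recurred = [1, 0, 1, 0]

# Hypothetical predicted recurrence probabilities from two classification schemes.
model_a = [0.9, 0.1, 0.8, 0.2]  # confident and mostly right
model_b = [0.6, 0.4, 0.5, 0.5]  # hedges close to 0.5

score_a = brier_score(model_a, recurred)
score_b = brier_score(model_b, recurred)
print(f"model A: {score_a:.3f}, model B: {score_b:.3f}")
```

A scheme that assigns probabilities closer to the observed outcomes receives the lower score, which is the sense in which one classification can be said to predict recurrence more accurately than another.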
Purpose To evaluate the association of multiparametric and multiregional magnetic resonance (MR) imaging features with key molecular characteristics in patients with newly diagnosed glioblastoma. Materials and Methods Retrospective data evaluation was approved by the local ethics committee, and the requirement to obtain informed consent was waived. Preoperative MR imaging features were correlated with key molecular characteristics within a single-institution cohort of 152 patients with newly diagnosed glioblastoma. Preoperative MR imaging features (n = 31) included multiparametric (anatomic and diffusion-, perfusion-, and susceptibility-weighted images) and multiregional (contrast-enhancing regions and hyperintense regions at nonenhanced fluid-attenuated inversion recovery imaging) information with histogram quantification of tumor volumes, volume ratios, apparent diffusion coefficients, cerebral blood flow, cerebral blood volume, and intratumoral susceptibility signals. Molecular characteristics determined included global DNA methylation subgroups (eg, mesenchymal, RTK I "PDGFRA," RTK II "classic"), MGMT promoter methylation status, and hallmark copy number variations (EGFR, PDGFRA, MDM4, and CDK4 amplification; PTEN, CDKN2A, NF1, and RB1 loss). Univariate analyses (voxel-lesion symptom mapping for tumor location, Wilcoxon test for all other MR imaging features) and machine learning models were applied to study the strength of association and discriminative value of MR imaging features for predicting underlying molecular characteristics. Results There was no tumor location predilection for any of the assessed molecular parameters (permutation-adjusted P > .05).
Univariate imaging parameter associations were noted for EGFR amplification and CDKN2A loss, with both demonstrating increased Gaussian-normalized relative cerebral blood volume and Gaussian-normalized relative cerebral blood flow values (area under the receiver operating characteristic curve: 63%-69%, false discovery rate-adjusted P < .05). Subjecting all MR imaging features to machine learning-based classification enabled prediction of EGFR amplification status and the RTK II glioblastoma subgroup with a moderate, yet significantly greater, accuracy (63% for EGFR [P < .01], 61% for RTK II [P = .01]) than prediction by chance; prediction accuracy for all other molecular parameters was not significant. Conclusion The authors found associations between established MR imaging features and molecular characteristics, although not of sufficient strength to enable generation of machine learning classification models for reliable and clinically meaningful prediction of molecular characteristics in patients with glioblastoma.
RSNA, 2016. Online supplemental material is available for this article.
In 2012, an international consensus paper reported that medulloblastoma comprises four molecular subgroups (WNT, SHH, Group 3, and Group 4), each associated with distinct genomic features and clinical behavior. Independently, multiple recent reports have defined further intra-subgroup heterogeneity in the form of biologically and clinically relevant subtypes. However, owing to differences in patient cohorts and analytical methods, estimates of subtype number and definition have been inconsistent, especially within Group 3 and Group 4. Herein, we aimed to reconcile the definition of Group 3/Group 4 MB subtypes through the analysis of a series of 1501 medulloblastomas with DNA-methylation profiling data, including 852 with matched transcriptome data. Using multiple complementary bioinformatic approaches, we compared the concordance of subtype calls between published cohorts and analytical methods, including assessments of class-definition confidence and reproducibility. While the lowest complexity solutions continued to support the original consensus subgroups of Group 3 and Group 4, our analysis most strongly supported a definition comprising eight robust Group 3/Group 4 subtypes (types I–VIII). Subtype II was consistently identified across all component studies, while all others were supported by multiple class-definition methods. Regardless of analytical technique, increasing cohort size did not further increase the number of identified Group 3/Group 4 subtypes. Summarizing the molecular and clinico-pathological features of these eight subtypes indicated enrichment of specific driver gene alterations and cytogenetic events amongst subtypes, and identified highly disparate survival outcomes, further supporting their biological and clinical relevance.
Collectively, this study provides continued support for consensus Groups 3 and 4 while enabling robust derivation of, and categorical accounting for, the extensive intertumoral heterogeneity within Groups 3 and 4, revealed by recent high-resolution subclassification approaches. Furthermore, these findings provide a basis for application of emerging methods (e.g., proteomics/single-cell approaches) which may additionally inform medulloblastoma subclassification. Outputs from this study will help shape definition of the next generation of medulloblastoma clinical protocols and facilitate the application of enhanced molecularly guided risk stratification to improve outcomes and quality of life for patients and their families.
DNA methylation data-based precision cancer diagnostics is emerging as the state of the art for molecular tumor classification. Standards for choosing statistical methods with regard to well-calibrated probability estimates for these typically highly multiclass classification tasks are still lacking. To support this choice, we evaluated well-established machine learning (ML) classifiers including random forests (RFs), elastic net (ELNET), support vector machines (SVMs) and boosted trees in combination with post-processing algorithms and developed ML workflows that allow for unbiased class probability (CP) estimation. Calibrators included ridge-penalized multinomial logistic regression (MR) and Platt scaling by fitting logistic regression (LR) and Firth's penalized LR. We compared these workflows on a recently published brain tumor 450k DNA methylation cohort of 2,801 samples with 91 diagnostic categories using a 5 × 5-fold nested cross-validation scheme and demonstrated their generalizability on external data from The Cancer Genome Atlas. ELNET was the top stand-alone classifier with the best calibration profiles. The best overall two-stage workflow was MR-calibrated SVM with linear kernels closely followed by ridge-calibrated tuned RF. For calibration, MR was the most effective regardless of the primary classifier. The protocols developed as a result of these comparisons provide valuable guidance on choosing ML workflows and their tuning to generate well-calibrated CP estimates for precision diagnostics using DNA methylation data. Computation times vary depending on the ML algorithm from <15 min to 5 d using multi-core desktop PCs. Detailed scripts in the open-source R language are freely available on GitHub, targeting users with intermediate experience in bioinformatics and statistics and using R with Bioconductor extensions.
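As an illustration of the calibration step, the following is a minimal binary sketch of Platt scaling, written in Python for compactness even though the published scripts are in R. It fits a logistic regression on raw classifier scores by plain gradient descent and is not the authors' workflow: the actual task is highly multiclass, and the function names, learning rate, iteration count and data are all hypothetical assumptions. Platt's target smoothing is also omitted.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores, labels, lr=0.1, n_iter=5000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the mean log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(score, a, b):
    """Map a raw classifier score to a calibrated probability."""
    return sigmoid(a * score + b)


# Hypothetical held-out classifier scores and true labels (not perfectly
# separable, so the fitted slope stays finite).
scores = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

a, b = fit_platt(scores, labels)
calibrated = [calibrate(s, a, b) for s in scores]
print([round(p, 2) for p in calibrated])
```

In a nested cross-validation setting, the calibrator would be fitted on scores from inner-fold predictions and then applied to outer-fold scores, so that the calibration itself is evaluated on data it has not seen.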
The introduction of the classification of brain tumours based on their DNA methylation profile has significantly changed the diagnostic approach for cases with ambiguous histology, non-informative or contradictory molecular profiles or for entities where methylation profiling provides useful information for patient risk stratification, for example in medulloblastoma and ependymoma. We present our experience that combines a conventional molecular diagnostic approach with the complementary use of a DNA methylation-based classification tool, for adult brain tumours originating from local as well as national referrals. We report the frequency of IDH mutations in a large cohort of nearly 1550 patients, EGFR amplifications in almost 1900 IDH-wildtype glioblastomas, and histone mutations in 70 adult gliomas. We demonstrate how additional methylation-based classification has changed and improved our diagnostic approach. Of the 325 cases referred for methylome testing, 179 (56%) had a calibrated score of 0.84 and higher and were included in the evaluation. In these 179 samples, the diagnosis was changed in 45 (25%), refined in 86 (48%) and confirmed in 44 cases (25%). In addition, the methylation arrays contain copy number information that usefully complements the methylation profile. For example, array-based detection of EGFR amplification is 95% concordant with our Real-Time PCR-based copy number assays. We propose here a diagnostic algorithm that integrates histology, conventional molecular tests and methylation arrays.
We have developed the R package c060 with the aim of improving R software functionality for high-dimensional risk prediction modeling, e.g., for prognostic modeling of survival data using high-throughput genomic data. Penalized regression models provide a statistically appealing way of building risk prediction models from high-dimensional data. The popular CRAN package glmnet implements an efficient algorithm for fitting penalized Cox and generalized linear models. However, in practical applications the data analysis will typically not stop at the point where the model has been fitted. One is, for example, often interested in the stability of selected features and in assessing the prediction performance of a model, and we provide functions to deal with both of these tasks. Our R functions are computationally efficient and offer the possibility of speeding up computing time through parallel computing. Another feature which can drastically reduce computing time is an efficient interval-search algorithm, which we have implemented for selecting the optimal parameter combination for elastic net penalties. These functions have been useful in our daily work at the Biostatistics department (C060) of the German Cancer Research Center where prognostic modeling of patient survival data is of particular interest. Although we focus on a survival data application of penalized Cox models in this article, the functions in our R package are in general applicable to all types of regression models implemented in the glmnet package, with the exception of prediction error curves, which are specific to time-to-event data.
Principal component analysis (PCA) is a basic tool often used in bioinformatics for visualization and dimension reduction. However, it is known that PCA may not consistently estimate the true direction of maximal variability in high-dimensional, low sample size settings, which are typical for molecular data. Assuming that the underlying signal is sparse, i.e. that only a fraction of features contribute to a principal component (PC), this estimation consistency can be retained. Most existing sparse PCA methods use L1-penalization, i.e. the lasso, to perform feature selection. However, the lasso is known to lack variable selection consistency in high dimensions, and therefore a subsequent interpretation of selected features can give misleading results.
We present S4VDPCA, a sparse PCA method that incorporates a subsampling approach, namely stability selection. S4VDPCA can consistently select the truly relevant variables contributing to a sparse PC while also consistently estimating the direction of maximal variability. The performance of S4VDPCA is assessed in a simulation study and compared to other PCA approaches, as well as to a hypothetical oracle PCA that 'knows' the truly relevant features in advance and thus finds optimal, unbiased sparse PCs. S4VDPCA is computationally efficient and performs best in simulations regarding parameter estimation consistency and feature selection consistency. Furthermore, S4VDPCA is applied to a publicly available gene expression data set of medulloblastoma brain tumors. Features contributing to the first two estimated sparse PCs represent genes significantly over-represented in pathways typically deregulated between molecular subgroups of medulloblastoma.
Software is available at https://github.com/mwsill/s4vdpca.
Contact: m.sill@dkfz.de
Supplementary data are available at Bioinformatics online.
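The general idea of combining a sparse first principal component with stability selection can be sketched as follows. This is a toy illustration and not the S4VDPCA algorithm: it swaps the penalized estimation for a simple "keep the k largest absolute loadings" rule, computes loadings by power iteration, and uses simulated data. All function names, parameters and thresholds are hypothetical assumptions.

```python
import random


def first_pc_loadings(rows, n_iter=50):
    """Loadings of the first principal component, found by power iteration."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    v = [1.0 / p ** 0.5] * p
    for _ in range(n_iter):
        # One power-iteration step: v <- X^T (X v), then renormalize.
        scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
        v = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(p)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return v


def stability_selection(rows, k, n_subsamples=100, threshold=0.8, seed=0):
    """Selection frequency of each feature across random half-subsamples."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        subsample = rng.sample(rows, n // 2)
        loadings = first_pc_loadings(subsample)
        # Stand-in for a sparsity penalty: keep the k largest absolute loadings.
        top_k = sorted(range(p), key=lambda j: abs(loadings[j]), reverse=True)[:k]
        for j in top_k:
            counts[j] += 1
    freqs = [c / n_subsamples for c in counts]
    selected = [j for j, f in enumerate(freqs) if f >= threshold]
    return selected, freqs


# Simulated data: 40 samples, 8 features; only features 0-2 share a latent signal.
data_rng = random.Random(1)
data = []
for _ in range(40):
    latent = data_rng.gauss(0, 3)
    data.append([latent + data_rng.gauss(0, 1) if j < 3 else data_rng.gauss(0, 1)
                 for j in range(8)])

selected, freqs = stability_selection(data, k=3)
print("selected features:", selected)
```

Features are retained only if they are chosen in a large fraction of subsamples, which is what makes the selection robust to sampling noise; the threshold of 0.8 used here is an arbitrary illustrative choice.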