DNA methylation data-based precision cancer diagnostics is emerging as the state of the art for molecular tumor classification. Standards for choosing statistical methods with regard to well-calibrated probability estimates for these typically highly multiclass classification tasks are still lacking. To support this choice, we evaluated well-established machine learning (ML) classifiers including random forests (RFs), elastic net (ELNET), support vector machines (SVMs) and boosted trees in combination with post-processing algorithms and developed ML workflows that allow for unbiased class probability (CP) estimation. Calibrators included ridge-penalized multinomial logistic regression (MR) and Platt scaling by fitting logistic regression (LR) and Firth's penalized LR. We compared these workflows on a recently published brain tumor 450k DNA methylation cohort of 2,801 samples with 91 diagnostic categories using a 5 × 5-fold nested cross-validation scheme and demonstrated their generalizability on external data from The Cancer Genome Atlas. ELNET was the top stand-alone classifier with the best calibration profiles. The best overall two-stage workflow was MR-calibrated SVM with linear kernels closely followed by ridge-calibrated tuned RF. For calibration, MR was the most effective regardless of the primary classifier. The protocols developed as a result of these comparisons provide valuable guidance on choosing ML workflows and their tuning to generate well-calibrated CP estimates for precision diagnostics using DNA methylation data. Computation times vary depending on the ML algorithm from <15 min to 5 d using multi-core desktop PCs. Detailed scripts in the open-source R language are freely available on GitHub, targeting users with intermediate experience in bioinformatics and statistics and using R with Bioconductor extensions.
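As a rough illustration of the Platt scaling idea mentioned above, the following minimal sketch fits a logistic curve to raw classifier scores and checks the result with the Brier score. It is written in plain Python rather than the R used by the published scripts, handles only the binary case, omits the target smoothing of Platt's original method, and uses invented toy scores; the function names `fit_platt`, `platt_predict` and `brier_score` are hypothetical and not taken from the protocol.

```python
import math


def fit_platt(scores, labels, lr=0.1, n_iter=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s  # derivative of the log loss w.r.t. a
            grad_b += (p - y)      # derivative of the log loss w.r.t. b
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def platt_predict(score, a, b):
    """Map a raw classifier score to a calibrated probability."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


def brier_score(probs, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


# Toy held-out scores (assumed values, e.g. SVM decision values) and true labels.
scores = [-2.1, -1.3, -0.4, 0.2, 0.3, 1.1, 1.8, 2.5]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

a, b = fit_platt(scores, labels)
calibrated = [platt_predict(s, a, b) for s in scores]
print("slope:", round(a, 3), "intercept:", round(b, 3))
print("Brier score:", round(brier_score(calibrated, labels), 3))
```

In a nested cross-validation setting such as the one described, the calibrator would be fitted on scores from inner held-out folds so that the outer-fold probabilities stay unbiased; this sketch fits and evaluates on the same toy data purely for brevity.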
Medulloblastoma, a malignant brain tumour primarily diagnosed during childhood, has recently been the focus of intensive molecular profiling efforts, profoundly advancing our understanding of biologically and clinically heterogeneous disease subgroups. Genomic, epigenomic, transcriptomic and proteomic landscapes have now been mapped for an unprecedented number of bulk samples from patients with medulloblastoma and, more recently, for single medulloblastoma cells. These efforts have provided pivotal new insights into the diverse molecular mechanisms presumed to drive tumour initiation, maintenance and recurrence across individual subgroups and subtypes. Translational opportunities stemming from this knowledge are continuing to evolve, providing a framework for improved diagnostic and therapeutic interventions. In this Review, we summarize recent advances derived from this continued molecular characterization of medulloblastoma and contextualize this progress towards the deployment of more effective, molecularly informed treatments for affected patients.
Epigenetic modifications such as carbon 5 methylation of the cytosine base in a CpG dinucleotide context are involved in the onset and progression of human diseases. A comprehensive understanding of the role of genome-wide DNA methylation patterns, the methylome, requires quantitative determination of the methylation states of all CpG sites in a genome. So far, analyses of the complete methylome by whole-genome bisulfite sequencing (WGBS) are rare because of the required large DNA quantities, substantial bioinformatic resources and high sequencing costs. Here we describe a detailed protocol for tagmentation-based WGBS (T-WGBS) and demonstrate its reliability in comparison with conventional WGBS. In T-WGBS, a hyperactive Tn5 transposase fragments the DNA and appends sequencing adapters in a single step. T-WGBS requires not more than 20 ng of input DNA; hence, the protocol allows the comprehensive methylome analysis of limited amounts of DNA isolated from precious biological specimens. The T-WGBS library preparation takes 2 d.
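The quantitative readout of bisulfite sequencing at a single CpG is commonly summarized as the fraction of reads that retain a cytosine (methylated) among all reads covering the site. The sketch below shows that calculation in Python; the function name `methylation_level` and the `min_coverage` threshold of 5 reads are illustrative assumptions, not parameters from the protocol.

```python
def methylation_level(methylated_reads, unmethylated_reads, min_coverage=5):
    """Return the methylated fraction at one CpG, or None if coverage is too low.

    methylated_reads: reads showing C after bisulfite conversion (protected).
    unmethylated_reads: reads showing T after bisulfite conversion (converted).
    min_coverage: assumed minimum read depth for a usable estimate.
    """
    total = methylated_reads + unmethylated_reads
    if total < min_coverage:
        return None
    return methylated_reads / total


# Toy counts for three CpG sites (invented for illustration).
sites = {"cpg_1": (8, 2), "cpg_2": (0, 12), "cpg_3": (1, 1)}
for name, (meth, unmeth) in sites.items():
    print(name, methylation_level(meth, unmeth))
```

Sites below the coverage threshold return `None` rather than a noisy estimate, which is one simple way to keep low-depth positions out of downstream comparisons.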
Tumors can evolve and adapt to therapeutic pressure by acquiring genetic and epigenetic alterations that may be transient or stable. A precise understanding of how such events contribute to intratumoral heterogeneity, dynamic subpopulations, and overall tumor fitness will require experimental approaches to prospectively label, track, and characterize resistant or otherwise adaptive populations at the single-cell level. In glioblastoma, poor efficacy of receptor tyrosine kinase (RTK) therapies has been alternatively ascribed to genetic heterogeneity or to epigenetic transitions that circumvent signaling blockade.
We combine cell lineage barcoding and single-cell transcriptomics to trace the emergence of drug resistance in stem-like glioblastoma cells treated with RTK inhibitors. Whereas a broad variety of barcoded lineages adopt a Notch-dependent persister phenotype that sustains them through early drug exposure, rare subclones acquire genetic changes that enable their rapid outgrowth over time. Single-cell analyses reveal that these genetic subclones gain copy number amplifications of the insulin receptor substrate-1 and substrate-2 (IRS1 or IRS2) loci, which activate insulin and AKT signaling programs. Persister-like cells and genomic amplifications of IRS2 and other loci are evident in primary glioblastomas and may underlie the inefficacy of targeted therapies in this disease.
A method for combined lineage tracing and scRNA-seq reveals the interplay between complementary genetic and epigenetic mechanisms of resistance in a heterogeneous glioblastoma tumor model.
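One way to picture how lineage barcodes can expose rapidly outgrowing subclones is to compare each barcode's share of the population before and after drug exposure. The following Python sketch does this with invented toy barcodes; the function names, the fold-change threshold `min_fold` and the pseudo-fraction `pseudo` are hypothetical choices for illustration and do not reproduce the analysis of the study.

```python
from collections import Counter


def clone_fractions(barcodes):
    """Fraction of cells carrying each lineage barcode."""
    counts = Counter(barcodes)
    total = sum(counts.values())
    return {bc: c / total for bc, c in counts.items()}


def expanded_clones(before, after, min_fold=5.0, pseudo=1e-3):
    """Barcodes whose population share grew at least min_fold between time points.

    pseudo is a small assumed pseudo-fraction that avoids division by zero
    for barcodes not detected at the earlier time point.
    """
    f_before = clone_fractions(before)
    f_after = clone_fractions(after)
    result = {}
    for bc, frac in f_after.items():
        fold = (frac + pseudo) / (f_before.get(bc, 0.0) + pseudo)
        if fold >= min_fold:
            result[bc] = fold
    return result


# Toy data: one barcode per sequenced cell (invented for illustration).
before = ["A"] * 40 + ["B"] * 40 + ["C"] * 19 + ["D"] * 1
after = ["A"] * 10 + ["B"] * 10 + ["C"] * 5 + ["D"] * 75

print(expanded_clones(before, after))
```

In this toy example many lineages persist at reduced shares while a single rare barcode takes over the population, which mirrors the qualitative pattern of broad persistence plus rare clonal outgrowth described above.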
Recently, we described a machine learning approach for classification of central nervous system tumors based on the analysis of genome-wide DNA methylation patterns [6]. Here, we report on DNA methylation-based central nervous system (CNS) tumor diagnostics conducted in our institution between the years 2015 and 2018. In this period, more than 1000 tumors from the neurosurgical departments in Heidelberg and Mannheim and more than 1000 tumors referred from external institutions were subjected to DNA methylation analysis for diagnostic purposes. We describe our current approach to the integrated diagnosis of CNS tumors with a focus on constellations with conflicts between morphological and molecular genetic findings. We further describe the benefit of integrating DNA copy-number alterations into diagnostic considerations and provide a catalog of copy-number changes for individual DNA methylation classes. We also point to several pitfalls accompanying the diagnostic implementation of DNA methylation profiling and give practical suggestions for recurring diagnostic scenarios.
In 2012, an international consensus paper reported that medulloblastoma comprises four molecular subgroups (WNT, SHH, Group 3, and Group 4), each associated with distinct genomic features and clinical behavior. Independently, multiple recent reports have defined further intra-subgroup heterogeneity in the form of biologically and clinically relevant subtypes. However, owing to differences in patient cohorts and analytical methods, estimates of subtype number and definition have been inconsistent, especially within Group 3 and Group 4. Herein, we aimed to reconcile the definition of Group 3/Group 4 MB subtypes through the analysis of a series of 1501 medulloblastomas with DNA-methylation profiling data, including 852 with matched transcriptome data. Using multiple complementary bioinformatic approaches, we compared the concordance of subtype calls between published cohorts and analytical methods, including assessments of class-definition confidence and reproducibility. While the lowest complexity solutions continued to support the original consensus subgroups of Group 3 and Group 4, our analysis most strongly supported a definition comprising eight robust Group 3/Group 4 subtypes (types I–VIII). Subtype II was consistently identified across all component studies, while all others were supported by multiple class-definition methods. Regardless of analytical technique, increasing cohort size did not further increase the number of identified Group 3/Group 4 subtypes. Summarizing the molecular and clinico-pathological features of these eight subtypes indicated enrichment of specific driver gene alterations and cytogenetic events amongst subtypes, and identified highly disparate survival outcomes, further supporting their biological and clinical relevance.
Collectively, this study provides continued support for consensus Groups 3 and 4 while enabling robust derivation of, and categorical accounting for, the extensive intertumoral heterogeneity within Groups 3 and 4, revealed by recent high-resolution subclassification approaches. Furthermore, these findings provide a basis for application of emerging methods (e.g., proteomics/single-cell approaches) which may additionally inform medulloblastoma subclassification. Outputs from this study will help shape definition of the next generation of medulloblastoma clinical protocols and facilitate the application of enhanced molecularly guided risk stratification to improve outcomes and quality of life for patients and their families.
The WHO 2007 classification of tumors of the CNS distinguishes between diffuse astrocytoma WHO grade II (A II_WHO2007) and anaplastic astrocytoma WHO grade III (AA III_WHO2007). Patients with A II_WHO2007 are significantly younger and survive significantly longer than those with AA III_WHO2007. So far, classification and grading rely on morphological grounds only and do not yet take into account IDH status, a molecular marker of prognostic relevance. We here demonstrate that WHO 2007 grading performs poorly in predicting prognosis when applied to astrocytoma carrying IDH mutations. Three independent series including a total of 1360 adult diffuse astrocytic gliomas with IDH mutation containing 683 A II_IDHmut, 562 AA III_IDHmut and 115 GBM_IDHmut have been examined for age distribution and survival. In all three series patients with A II_IDHmut and AA III_IDHmut were of identical age at presentation of disease (36–37 years) and the difference in survival between grades was much less (10.9 years for A II_IDHmut, 9.3 years for AA III_IDHmut) than that reported for A II_WHO2007 versus AA III_WHO2007. Our analyses imply that the differences in age and survival between A II_WHO2007 and AA III_WHO2007 predominantly depend on the fraction of IDH-non-mutant astrocytomas in the cohort. These data pose a substantial challenge for the current practice of astrocytoma grading and risk stratification and are likely to have far-reaching consequences on the management of patients with IDH-mutant astrocytoma.
The biological roles of DNA methylation have been elucidated by profiling methods based on whole-genome or reduced-representation bisulfite sequencing, but these approaches do not efficiently survey the vast numbers of non-coding regulatory elements in mammalian genomes. Here we present an extended-representation bisulfite sequencing (XRBS) method for targeted profiling of DNA methylation. Our design strikes a balance between expanding coverage of regulatory elements and reproducibly enriching informative CpG dinucleotides in promoters, enhancers and CTCF binding sites. Barcoded DNA fragments are pooled before bisulfite conversion, allowing multiplex processing and technical consistency in low-input samples. Application of XRBS to single leukemia cells enabled us to evaluate genetic copy number variations and methylation variability across individual cells. Our analysis highlights heterochromatic H3K9me3 regions as having the highest cell-to-cell variability in their methylation, likely reflecting inherent epigenetic instability of these late-replicating regions, compounded by differences in cell cycle stages among sampled cells.
MicroRNAs (miRNA) regulate many genes critical for tumorigenesis. We profiled miRNAs from 11 normal breast tissues, 17 noninvasive, 151 invasive breast carcinomas, and 6 cell lines by in-house-developed barcoded Solexa sequencing. miRNAs were organized in genomic clusters representing promoter-controlled miRNA expression and sequence families representing seed sequence-dependent miRNA target regulation. Unsupervised clustering of samples by miRNA sequence families best reflected the clustering based on mRNA expression available for this sample set. Clustering and comparative analysis of miRNA read frequencies showed that normal breast samples were separated from most noninvasive ductal carcinoma in situ and invasive carcinomas by increased miR-21 (the most abundant miRNA in carcinomas) and multiple decreased miRNA families (including miR-98/let-7), with most miRNA changes apparent already in the noninvasive carcinomas. In addition, patients that went on to develop metastasis showed increased expression of mir-423, and triple-negative breast carcinomas were most distinct from other tumor subtypes due to upregulation of the mir~17-92 cluster. However, absolute miRNA levels between normal breast and carcinomas did not reveal any significant differences. We also discovered two polymorphic nucleotide variations among the more abundant miRNAs miR-181a (T19G) and miR-185 (T16G), but we did not identify nucleotide variations expected for classical tumor suppressor function associated with miRNAs. The differentiation of tumor subtypes and prediction of metastasis based on miRNA levels is statistically possible but is not driven by deregulation of abundant miRNAs, implicating far fewer miRNAs in tumorigenic processes than previously suggested.
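Comparing miRNA read frequencies between samples generally means converting raw read counts into relative frequencies within each sample before taking ratios. The Python sketch below illustrates that step with invented toy counts (not values from the study); the function names and the pseudo-frequency `pseudo` are assumptions made for the example.

```python
import math


def relative_frequencies(counts):
    """Convert raw read counts per miRNA into fractions of the sample total."""
    total = sum(counts.values())
    return {mirna: c / total for mirna, c in counts.items()}


def log2_fold_change(tumor_counts, normal_counts, pseudo=1e-6):
    """log2 ratio of relative read frequencies, tumor versus normal.

    pseudo is a small assumed pseudo-frequency that keeps the ratio finite
    when a miRNA is absent from one of the samples.
    """
    f_tumor = relative_frequencies(tumor_counts)
    f_normal = relative_frequencies(normal_counts)
    mirnas = set(f_tumor) | set(f_normal)
    return {
        m: math.log2((f_tumor.get(m, 0.0) + pseudo) / (f_normal.get(m, 0.0) + pseudo))
        for m in mirnas
    }


# Toy read counts (invented for illustration only).
normal = {"miR-21": 100, "let-7": 800, "miR-423": 100}
tumor = {"miR-21": 600, "let-7": 300, "miR-423": 100}

lfc = log2_fold_change(tumor, normal)
for mirna in sorted(lfc):
    print(mirna, round(lfc[mirna], 2))
```

Because relative frequencies sum to one within a sample, a large increase in one abundant miRNA necessarily lowers the frequencies of the others, which is one reason relative and absolute miRNA levels can lead to different conclusions.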
This study aimed to prospectively evaluate clinical, histopathological and molecular variables for outcome prediction in medulloblastoma patients. Patients from the HIT2000 cooperative clinical trial were prospectively enrolled based on the availability of sufficient tumor material and complete clinical information. This yielded a cohort of 184 patients (median age 7.6 years), which was randomly split at a 2:1 ratio into a training (n = 127) and a test (n = 57) dataset in order to build and test a risk score for this population. Independent validation was performed in a non-overlapping cohort (n = 83). All samples were subjected to thorough histopathological investigation, CTNNB1 mutation analysis, quantitative PCR, MLPA and FISH analyses for cytogenetic variables, and methylome analysis. By univariable analysis, clinical factors (M-stage), histopathological variables (large cell component, endothelial proliferation, synaptophysin pattern), and molecular features (chromosome 6q status, MYC amplification, subgrouping) were found to be prognostic. Molecular consensus subgrouping (WNT, SHH, Group 3, Group 4) was validated as an independent feature to stratify patients into different risk groups. When comparing methods for the identification of WNT-driven medulloblastoma, this study found that CTNNB1 sequencing and methylation profiling most reliably identify these patients. After removing patients with particularly favorable (CTNNB1 mutation, extensive nodularity) or unfavorable (MYC amplification) markers, a risk score for the remaining “intermediate molecular risk” population dependent on age, M-stage, pattern of synaptophysin expression, and MYCN copy-number status was identified, with speckled synaptophysin expression indicating worse outcome. Test and independent validation of the score confirmed significant discrimination of patients by risk profile. Methylation subgrouping and CTNNB1 mutation status represent robust tools for the risk stratification of medulloblastoma. A simple clinico-pathological risk score was identified, which was confirmed in a test set and by independent clinical validation.