The Cancer Genome Atlas (TCGA) research network has made public a large collection of clinical and molecular phenotypes of more than 10 000 tumor patients across 33 different tumor types. Using this cohort, TCGA has published over 20 marker papers detailing the genomic and epigenomic alterations associated with these tumor types. Although TCGA's research network has made many important discoveries, opportunities remain to apply novel methods that could elucidate new biological pathways and diagnostic markers. However, mining TCGA data presents several bioinformatics challenges, such as data retrieval and integration with clinical data and other molecular data types (e.g. RNA and DNA methylation). We developed an R/Bioconductor package called TCGAbiolinks to address these challenges. It provides a guided workflow that allows users to query, download and perform integrative analyses of TCGA data. We combined methods from computer science and statistics into the pipeline and incorporated methodologies developed in previous TCGA marker studies and in our own group. Using four TCGA tumor types (kidney, brain, breast and colon) as examples, we provide case studies that illustrate reproducibility, integrative analysis and the use of different Bioconductor packages to advance and accelerate novel discoveries.
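TCGAbiolinks performs the query and download steps in R, through functions such as GDCquery, GDCdownload and GDCprepare. The Python sketch below is a language-neutral illustration of what such a query involves. It only builds a request URL for the public GDC REST API files endpoint, combining project, data-category and data-type criteria in the API's JSON filter syntax.

Several details are assumptions for illustration:
- the helper name `build_gdc_query`;
- the returned fields;
- the example project (TCGA-KIRC, kidney).

No request is sent. Check the URL against the current GDC API documentation before relying on it.

```python
import json
from urllib.parse import urlencode

# Public GDC REST endpoint for file metadata (assumed; nothing is requested here).
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_gdc_query(project_id, data_category, data_type, size=100):
    """Hypothetical helper: build a GDC /files query URL and its filter object.

    Each criterion becomes an "in" clause; the clauses are joined with "and",
    following the GDC API filter syntax.
    """
    criteria = {
        "cases.project.project_id": [project_id],
        "data_category": [data_category],
        "data_type": [data_type],
    }
    filters = {
        "op": "and",
        "content": [
            {"op": "in", "content": {"field": field, "value": values}}
            for field, values in criteria.items()
        ],
    }
    params = {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,cases.submitter_id",
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}", filters


url, filters = build_gdc_query(
    "TCGA-KIRC", "Transcriptome Profiling", "Gene Expression Quantification"
)
print(url)
```

Keeping the filter as a plain dictionary makes it easy to inspect or extend, for example with a sample-type clause, before it is serialized into the URL.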
The advent of Next-Generation Sequencing (NGS) technologies has opened new perspectives in deciphering the genetic mechanisms underlying complex diseases. The amount of genomic data is now massive, and substantial efforts and new tools are required to unveil the information hidden in it. The Genomic Data Commons (GDC) Data Portal is a platform that contains different genomic studies, including those from The Cancer Genome Atlas (TCGA) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiatives. Together these account for more than 40 tumor types from nearly 30,000 patients. Such platforms, although very attractive, must ensure that the stored data are easily accessible and adequately harmonized. Moreover, their primary focus is on storing the data in a single place, and they do not provide a comprehensive toolkit for analyzing and interpreting the data. To fill this urgent need, we require comprehensive yet easily accessible computational methods for integrative analyses of genomic data that do not sacrifice a robust statistical and theoretical framework. In this context, the R/Bioconductor package TCGAbiolinks was developed, offering a variety of bioinformatics functionalities. Here we introduce new features and enhancements of TCGAbiolinks in terms of:
i) more accurate and flexible pipelines for differential expression analyses;
ii) different methods for tumor purity estimation and filtering;
iii) integration of normal samples from other platforms;
iv) support for other genomics datasets, exemplified here by the TARGET data.
Evidence has shown that accounting for tumor purity is essential in the study of tumorigenesis, as variation in purity can confound differential expression analysis. With this in mind, we implemented these filtering procedures in TCGAbiolinks. Moreover, a limitation of some of the TCGA datasets is the unavailability or paucity of corresponding normal samples. 
We thus integrated into TCGAbiolinks the possibility of using normal samples from the Genotype-Tissue Expression (GTEx) project, another large-scale repository cataloging gene expression from healthy individuals. The new functionalities are available in TCGAbiolinks version 2.8 and later, released with Bioconductor 3.7.
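The abstract does not spell out the purity-filtering rule. The minimal sketch below assumes that per-sample purity estimates from several methods are already available. Under that assumption, a sample is kept only when every available estimate reaches a cutoff; 0.6 here is chosen arbitrarily. The method names, sample IDs and values are made-up placeholders, not TCGAbiolinks output or real TCGA estimates.

```python
from math import isnan


def filter_by_purity(purity, cutoff=0.6):
    """Keep samples whose every available purity estimate is >= cutoff.

    purity maps sample ID -> {method name: estimate in [0, 1]}.
    Missing estimates (NaN) are ignored; samples with no estimate at all
    are dropped.
    """
    kept = []
    for sample, estimates in purity.items():
        values = [v for v in estimates.values() if not isnan(v)]
        if values and all(v >= cutoff for v in values):
            kept.append(sample)
    return kept


nan = float("nan")
toy = {  # toy values for illustration only
    "sample_A": {"ESTIMATE": 0.82, "ABSOLUTE": 0.75, "CPE": 0.80},
    "sample_B": {"ESTIMATE": 0.55, "ABSOLUTE": 0.70, "CPE": 0.62},
    "sample_C": {"ESTIMATE": 0.91, "ABSOLUTE": nan, "CPE": 0.88},
    "sample_D": {"ESTIMATE": nan, "ABSOLUTE": nan, "CPE": nan},
}
print(filter_by_purity(toy))  # ['sample_A', 'sample_C']
```

Requiring every estimate to pass is a conservative choice. Averaging the estimates, or requiring only one to pass, would keep more samples at the cost of more stromal contamination.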
Biotechnological advances in sequencing have led to an explosion of publicly available data via large international consortia such as The Cancer Genome Atlas (TCGA), The Encyclopedia of DNA Elements (ENCODE) and The NIH Roadmap Epigenomics Mapping Consortium (Roadmap). These projects have provided unprecedented opportunities to interrogate, with high genomic resolution, the epigenome of cultured cancer cell lines as well as normal and tumor tissues. The Bioconductor project offers more than 1,000 open-source software and statistical packages to analyze high-throughput genomic data. However, most packages are designed for specific data types (e.g. expression, epigenetics, genomics). No single comprehensive tool provides a complete integrative analysis of the resources and data provided by all three public projects. The need to integrate these different analyses has recently been recognized. In this workflow, we provide a series of biologically focused integrative analyses of different molecular data. We describe how to download, process and prepare TCGA data and, by harnessing several key Bioconductor packages, how to extract biologically meaningful genomic and epigenomic data. Using Roadmap and ENCODE data, we provide a work plan to identify biologically relevant functional epigenomic elements associated with cancer. To illustrate our workflow, we analyzed two types of brain tumors: low-grade glioma (LGG) versus high-grade glioma (glioblastoma multiforme or GBM). This workflow introduces the following Bioconductor packages: AnnotationHub, ChIPseeker, ComplexHeatmap, pathview, ELMER, GAIA, MINET, RTCGAToolbox and TCGAbiolinks.
Gliomas are a heterogeneous group of brain tumors with distinct biological and clinical properties. Despite advances in surgical techniques and clinical regimens, treatment of high-grade glioma remains challenging and carries dismal rates of therapeutic success and overall survival. Challenges include the molecular complexity of gliomas and inconsistencies in histopathological grading, which result in inaccurate prediction of disease progression and failure of standard therapy. The updated 2016 World Health Organization (WHO) classification of tumors of the central nervous system refines tumor diagnostics by integrating genotypic and phenotypic features, thereby narrowing the defined subgroups. The new classification recommends molecular diagnosis of isocitrate dehydrogenase (IDH) mutational status in gliomas. IDH-mutant gliomas manifest the glioma cytosine-phosphate-guanine (CpG) island methylator phenotype (G-CIMP). Notably, the recent identification of clinically relevant subsets of G-CIMP tumors (G-CIMP-high and G-CIMP-low) provides a further refinement in glioma classification that is independent of grade and histology. This scheme may be useful for predicting patient outcome and may be translated into effective therapeutic strategies tailored to each patient. In this review, we highlight the evolution of our understanding of the G-CIMP subsets and how recent advances in characterizing the genome and epigenome of gliomas may influence future basic and translational research.
DNA methylation has been used to identify functional changes at transcriptional enhancers and other cis-regulatory modules (CRMs) in tumors and other disease tissues. Our R/Bioconductor package ELMER (Enhancer Linking by Methylation/Expression Relationships) provides a systematic approach that reconstructs altered gene regulatory networks (GRNs) by combining enhancer methylation and gene expression data derived from the same sample set.
We present a completely revised version 2 of ELMER that provides numerous new features including an optional web-based interface and a new Supervised Analysis mode to use pre-defined sample groupings. We show that Supervised mode significantly increases statistical power and identifies additional GRNs and associated Master Regulators, such as SOX11 and KLF5 in Basal-like breast cancer.
ELMER v.2 is available as an R/Bioconductor package at http://bioconductor.org/packages/ELMER/.
Supplementary data are available at Bioinformatics online.
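The abstract describes ELMER only at a high level. The sketch below is a rough illustration of how a methylation/expression relationship can be tested for a single probe-gene pair. It ranks samples by methylation at the probe and takes the least and most methylated fifths. It then uses a one-sided Mann-Whitney U test to ask whether the gene is more highly expressed in the low-methylation group.

The following choices are our assumptions for illustration, not ELMER's code or API:
- the quintile split;
- the normal approximation without tie correction;
- the function names.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """U statistic: number of (a, b) pairs with a > b, ties counting 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def probe_gene_test(methylation, expression, fraction=0.2):
    """Hypothetical probe-gene test: is expression higher when the probe is unmethylated?

    methylation and expression hold per-sample values for one probe and one
    gene, in the same sample order. Returns (U, one-sided p-value) using a
    normal approximation without tie correction.
    """
    if len(methylation) != len(expression):
        raise ValueError("methylation and expression must cover the same samples")
    if len(methylation) < 2:
        raise ValueError("need at least two samples")
    order = sorted(range(len(methylation)), key=lambda i: methylation[i])
    k = max(1, int(len(order) * fraction))
    low_group = [expression[i] for i in order[:k]]    # least methylated samples
    high_group = [expression[i] for i in order[-k:]]  # most methylated samples
    u = mann_whitney_u(low_group, high_group)
    n1, n2 = len(low_group), len(high_group)
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p_one_sided = 0.5 * erfc(z / sqrt(2))  # P(Z >= z) under the null
    return u, p_one_sided


# Toy data: expression falls as probe methylation rises.
meth = [i / 19 for i in range(20)]
expr = [10 - 0.4 * i for i in range(20)]
u, p = probe_gene_test(meth, expr)
print(u, p)
```

With only four samples per group, as here, the normal approximation is crude. An exact test or a permutation-based null would be more appropriate for small cohorts.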
Colorectal cancer (CRC) is a heterogeneous disease in which unique subtypes are characterized by distinct genetic and epigenetic alterations. Here we performed comprehensive genome-scale DNA methylation profiling of 125 colorectal tumors and 29 adjacent normal tissues. We identified four DNA methylation-based subgroups of CRC using model-based cluster analyses. Each subtype shows characteristic genetic and clinical features, indicating that they represent biologically distinct subgroups. A CIMP-high (CIMP-H) subgroup, which exhibits an exceptionally high frequency of cancer-specific DNA hypermethylation, is strongly associated with MLH1 DNA hypermethylation and the BRAF(V600E) mutation. A CIMP-low (CIMP-L) subgroup is enriched for KRAS mutations and characterized by DNA hypermethylation of a subset of CIMP-H-associated markers rather than a unique group of CpG islands. Non-CIMP tumors are separated into two distinct clusters. One non-CIMP subgroup is distinguished by a significantly higher frequency of TP53 mutations and frequent occurrence in the distal colon, while the tumors that belong to the fourth group exhibit a low frequency of both cancer-specific DNA hypermethylation and gene mutations and are significantly enriched for rectal tumors. Furthermore, we identified 112 genes that were down-regulated more than twofold in CIMP-H tumors together with promoter DNA hypermethylation. These represent ∼7% of genes that acquired promoter DNA methylation in CIMP-H tumors. Intriguingly, 48/112 genes were also transcriptionally down-regulated in non-CIMP subgroups, but this was not attributable to promoter DNA hypermethylation. Together, we identified four distinct DNA methylation subgroups of CRC and provided novel insight regarding the role of CIMP-specific DNA hypermethylation in gene silencing.
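Selecting genes "down-regulated more than twofold in CIMP-H tumors together with promoter DNA hypermethylation" combines two thresholds. The sketch below shows such a filter. It rests on several assumptions of ours, since the abstract gives neither the comparison group nor the methylation cutoff:
- group-mean expression values;
- a pseudocount of 1;
- a generic reference group;
- a minimum promoter beta-value difference of 0.2.

Gene names and values are toy placeholders.

```python
from math import log2


def methylation_silenced_genes(genes, min_fold=2.0, min_delta_beta=0.2, pseudocount=1.0):
    """Hypothetical filter: genes down-regulated more than `min_fold` in a tumor
    subgroup AND with higher promoter methylation than a reference group.

    genes maps gene name -> (expr_subgroup, expr_reference,
                             beta_subgroup, beta_reference),
    where expression values are group means and beta values are mean
    promoter methylation levels in [0, 1].
    """
    hits = []
    for gene, (expr_sub, expr_ref, beta_sub, beta_ref) in genes.items():
        # log2 fold change; the pseudocount avoids taking the log of zero
        log2_fc = log2((expr_sub + pseudocount) / (expr_ref + pseudocount))
        down_regulated = log2_fc < -log2(min_fold)  # strictly more than twofold
        hypermethylated = (beta_sub - beta_ref) >= min_delta_beta
        if down_regulated and hypermethylated:
            hits.append(gene)
    return hits


toy = {
    "GENE_A": (10.0, 50.0, 0.70, 0.10),  # strongly down and hypermethylated
    "GENE_B": (10.0, 15.0, 0.70, 0.10),  # hypermethylated, less than twofold down
    "GENE_C": (10.0, 50.0, 0.20, 0.15),  # down, but not hypermethylated
}
print(methylation_silenced_genes(toy))  # ['GENE_A']
```

Dropping the methylation condition (min_delta_beta=0.0) also returns GENE_C. This mirrors the abstract's observation that some genes are down-regulated without promoter hypermethylation.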
Patient-based cancer models are essential tools for studying tumor biology and for the assessment of drug responses in a translational context. We report the establishment of a large cohort of unique organoids and patient-derived orthotopic xenografts (PDOX) of various glioma subtypes, including gliomas with mutations in IDH1, and paired longitudinal PDOX from primary and recurrent tumors of the same patient. We show that glioma PDOXs enable long-term propagation of patient tumors and represent clinically relevant patient avatars that retain histopathological, genetic, epigenetic, and transcriptomic features of parental tumors. We find no evidence of mouse-specific clonal evolution in glioma PDOXs. Our cohort captures individual molecular genotypes for precision medicine, including:
- mutations in IDH1, ATRX, TP53 and MDM2/4;
- amplification of EGFR, PDGFRA, MET, CDK4/6 and MDM2/4;
- deletion of CDKN2A/B, PTCH and PTEN.
Matched longitudinal PDOX recapitulate the limited genetic evolution of gliomas observed in patients following treatment. At the histological level, we observe increased vascularization in the rat host as compared to mice. PDOX-derived standardized glioma organoids are amenable to high-throughput drug screens that can be validated in mice. We show clinically relevant responses to temozolomide (TMZ) and to targeted treatments such as EGFR and CDK4/6 inhibitors. These responses occur in (epi)genetically defined subgroups, stratified by MGMT promoter status and EGFR/CDK status, respectively. Dianhydrogalactitol (VAL-083), a promising bifunctional alkylating agent currently in clinical trials, displayed high therapeutic efficacy and was able to overcome TMZ resistance in glioblastoma. Our work underscores the clinical relevance of glioma organoids and PDOX models for translational research and personalized treatment studies, and represents a unique, publicly available resource for precision oncology.
Cancer driver gene alterations influence cancer development, occurring in oncogenes, tumor suppressors, and dual role genes. Discovering dual role cancer genes is difficult because of their elusive context-dependent behavior. We define oncogenic mediators as genes controlling biological processes. With them, we classify cancer driver genes, unveiling their roles in cancer mechanisms. To this end, we present Moonlight, a tool that incorporates multiple -omics data to identify critical cancer driver genes. With Moonlight, we analyze 8000+ tumor samples from 18 cancer types, discovering 3310 oncogenic mediators, 151 having dual roles. By incorporating additional data (amplification, mutation, DNA methylation, chromatin accessibility), we reveal 1000+ cancer driver genes, corroborating known molecular mechanisms. Additionally, we confirm critical cancer driver genes by analysing cell-line datasets. We discover inactivation of tumor suppressors in intron regions and that tissue type and subtype indicate dual role status. These findings help explain tumor heterogeneity and could guide therapeutic decisions.
We sought to identify susceptibility genes for high-grade serous ovarian cancer (HGSOC) by performing a transcriptome-wide association study of gene expression and splice junction usage. We used HGSOC-relevant tissue types (N = 2,169) and the largest genome-wide association study available for HGSOC (N = 13,037 cases and 40,941 controls). We identified 25 transcriptome-wide association study significant genes, 7 at the junction level only. These include LRRC46 at 17q21.32 (P = 1 × 10^…), CHMP4C at 8q21 (P = 2 × 10^…) and a PRC1 junction at 15q26 (P = 7 × 10^…). In vitro assays for CHMP4C showed that the associated variant induces allele-specific exon inclusion (P = 0.0024). Functional screens in HGSOC cell lines found evidence of essentiality for three of the new genes we identified: HAUS6, KANSL1 and PRC1, with the latter comparable to MYC. Our study implicates at least one target gene for 6 out of 13 distinct genome-wide association study regions, identifying 23 new candidate susceptibility genes for HGSOC.
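The abstract does not state how the transcriptome-wide association statistic is computed. A common summary-statistic formulation is the one used by the FUSION software. We assume it here for orientation rather than take it from this study. It combines per-SNP GWAS z-scores with expression (or junction-usage) weights learned in a reference tissue:

```latex
z_{\mathrm{TWAS}} = \frac{\mathbf{w}^{\top}\mathbf{z}}{\sqrt{\mathbf{w}^{\top}\,\boldsymbol{\Sigma}\,\mathbf{w}}}
```

The symbols are:
- $\mathbf{w}$: the vector of SNP weights from a model predicting the gene's expression or splice-junction usage;
- $\mathbf{z}$: the GWAS z-scores for the same SNPs;
- $\boldsymbol{\Sigma}$: the SNP-by-SNP linkage-disequilibrium correlation matrix.

The denominator is the standard deviation of $\mathbf{w}^{\top}\mathbf{z}$ under the null, so $z_{\mathrm{TWAS}}$ is approximately standard normal when the gene is not associated.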
Glioma diagnosis is based on histomorphology and grading; however, such classification is not predictive of clinical outcome once glioblastomas have developed. To date, no bona fide biomarkers that significantly translate into a survival benefit for glioblastoma patients have been identified. We previously reported that the IDH-mutant G-CIMP-high subtype may be a predecessor of the G-CIMP-low subtype. Here, we performed a comprehensive longitudinal DNA methylation analysis of diffuse gliomas from 77 patients (200 tumors) to shed light on the epigenome-based malignant transformation of initially lower-grade gliomas. Intra-subtype heterogeneity among G-CIMP-high primary tumors allowed us to identify predictive biomarkers for assessing the risk of malignant recurrence at early stages of disease. G-CIMP-low recurrence appeared in 9.5% of all gliomas, and these tumors resembled IDH-wild-type primary glioblastoma. G-CIMP-low recurrence can be characterized by three features: distinct epigenetic changes at candidate functional tissue enhancers with AP-1/SOX binding elements, a mesenchymal stem cell-like epigenomic phenotype, and genomic instability. Molecular abnormalities of longitudinal G-CIMP offer possibilities to counter glioblastoma progression.
• Intra-subtype heterogeneity of initially G-CIMP-high tumors carries the worst prognosis
• G-CIMP-low is defined by DNA signature motifs for STAT3 and c-JUN/AP-1 at recurrence
• G-CIMP-low at recurrence mimics an IDH-wild-type and stem cell-like primary GBM
• Predictive biomarkers of glioma malignant transformation and recurrence are observed at diagnosis
IDH-mutant lower-grade glioma and glioblastoma often progress to a more aggressive phenotype upon recurrence. de Souza et al. examine the intra-subtype heterogeneity of initial G-CIMP-high tumors and use this information to identify predictive biomarkers for assessing the risk of recurrence and malignant transformation.