A common issue affecting DNA methylation analysis in tumor tissue is the presence of a substantial amount of non-tumor methylation signal derived from the surrounding microenvironment. Although approaches for quantifying and correcting for the infiltration component have been proposed previously, we believe these have not fully addressed the issue in a comprehensive and universally applicable way. We present a multi-population framework for adjusting DNA methylation beta values on the Illumina 450/850K platform using generic purity estimates to account for non-tumor signal. Our approach also provides an indirect estimate of the aggregate methylation state of the surrounding normal tissue. Using whole exome sequencing derived purity estimates and Illumina 450K methylation array data generated by The Cancer Genome Atlas project (TCGA), we provide a demonstration of this framework in breast cancer, illustrating the effect of beta correction on the aggregate methylation beta value distribution, clustering accuracy, and global methylation profiles.
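As a rough illustration of purity-based beta adjustment, the sketch below assumes a simplified two-component mixing model at a single CpG, in which the observed beta is a purity-weighted average of one tumor state and one normal state. This is not the multi-population framework described above; the function name `fit_mixing_model` and the toy values are hypothetical and chosen only to show how an intercept and slope from a regression on purity could be read as normal and tumor methylation estimates.

```python
def fit_mixing_model(betas, purities):
    """Least-squares fit of beta_obs = normal + purity * (tumor - normal).

    Assumes a single tumor population per CpG (a simplification).
    Returns (tumor_beta, normal_beta), each clipped to [0, 1].
    """
    n = len(betas)
    mean_p = sum(purities) / n
    mean_b = sum(betas) / n
    sxx = sum((p - mean_p) ** 2 for p in purities)
    sxy = sum((p - mean_p) * (b - mean_b) for p, b in zip(purities, betas))
    slope = sxy / sxx
    intercept = mean_b - slope * mean_p

    def clip(x):
        return min(1.0, max(0.0, x))

    # Purity 0 corresponds to pure normal tissue, purity 1 to pure tumor.
    return clip(intercept + slope), clip(intercept)


# Toy data (hypothetical): tumor beta 0.8, normal beta 0.1, no noise.
purities = [0.3, 0.5, 0.7, 0.9]
betas = [p * 0.8 + (1 - p) * 0.1 for p in purities]
tumor_beta, normal_beta = fit_mixing_model(betas, purities)
```

With noise-free toy data the fit recovers the two simulated states; with real array data the estimates would depend on cohort size, the purity range, and whether a single tumor population is a reasonable assumption for the CpG in question.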
Microarray-based gene expression analysis holds promise of improving prognostication and treatment decisions for breast cancer patients. However, the heterogeneity of breast cancer emphasizes the need for validation of prognostic gene signatures in larger sample sets stratified into relevant subgroups. Here, we describe a multifunctional user-friendly online tool, GOBO (http://co.bmc.lu.se/gobo), allowing a range of different analyses to be performed in an 1881-sample breast tumor data set, and a 51-sample breast cancer cell line set, both generated on Affymetrix U133A microarrays. GOBO supports a wide range of applications including: 1) rapid assessment of gene expression levels in subgroups of breast tumors and cell lines, 2) identification of co-expressed genes for creation of potential metagenes, 3) association with outcome for gene expression levels of single genes, sets of genes, or gene signatures in multiple subgroups of the 1881-sample breast cancer data set. The design and implementation of GOBO facilitate easy incorporation of additional query functions and applications, as well as additional data sets irrespective of tumor type and array platform.
Kataegis is a hypermutation phenomenon characterized by localized clusters of single base pair substitutions (SBSs) reported in multiple cancer types. Despite a high frequency in breast cancer, large-scale analyses of kataegis patterns and associations with clinicopathological and molecular variables in established breast cancer subgroups are lacking. Therefore, WGS-profiled primary breast cancers (n = 791) with associated clinical and molecular data layers, such as RNA-sequencing data, were analyzed for kataegis frequency, recurrence, and associations with genomic contexts and functional elements, transcriptional patterns, driver alterations, homologous recombination deficiency (HRD), and prognosis in tumor subgroups defined by ER, PR, and HER2/ERBB2 status. Kataegis frequency was highest in the HER2-positive(p) subgroups, including both ER-negative(n)/positive(p) tumors (ERnHER2p/ERpHER2p). In TNBC, kataegis was neither associated with PAM50 nor TNBC mRNA subtypes nor with distant relapse in chemotherapy-treated patients. In ERpHER2n tumors, kataegis was associated with aggressive characteristics, including PR-negativity, molecular Luminal B subtype, higher mutational burden, higher grade, and expression of proliferation-associated genes. Recurrent kataegis loci frequently targeted regions commonly amplified in ER-positive tumors, while few recurrent loci were observed in TNBC. SBSs in kataegis loci appeared enriched in regions of open chromatin. Kataegis status was not associated with HRD in any subgroup or with distinct transcriptional patterns in unsupervised or supervised analysis. In summary, kataegis is a common hypermutation phenomenon in established breast cancer subgroups, particularly in HER2p subgroups, coinciding with an aggressive tumor phenotype in ERpHER2n disease. In TNBC, the molecular implications and associations of kataegis are less clear, including its prognostic value.
Global loss of DNA methylation and CpG island (CGI) hypermethylation are key epigenomic aberrations in cancer. Global loss manifests itself in partially methylated domains (PMDs) which extend up to megabases. However, the distribution of PMDs within and between tumor types, and their effects on key functional genomic elements including CGIs are poorly defined. We comprehensively show that loss of methylation in PMDs occurs in a large fraction of the genome and represents the prime source of DNA methylation variation. PMDs are hypervariable in methylation level, size and distribution, and display elevated mutation rates. They impose intermediate DNA methylation levels incognizant of functional genomic elements including CGIs, underpinning a CGI methylator phenotype (CIMP). Repression effects on tumor suppressor genes are negligible as they are generally excluded from PMDs. The genomic distribution of PMDs reports tissue-of-origin and may represent tissue-specific silent regions which tolerate instability at the epigenetic, transcriptomic and genetic level.
Advances in high-throughput technologies encourage the generation of large amounts of multiomics data to investigate complex diseases, including breast cancer. Given that the aetiologies of such diseases extend beyond a single biological entity, and that essential biological information can be carried by all data regardless of data type, integrative analyses are needed to identify clinically relevant patterns. To facilitate such analyses, we present a permutation-based framework for random forest methods which simultaneously allows the unbiased integration of mixed-type data and assessment of relative feature importance. Through simulation studies and machine learning datasets, the performance of the approach was evaluated. The results showed minimal multicollinearity and limited overfitting. To further assess the performance, the permutation-based framework was applied to high-dimensional mixed-type data from two independent breast cancer cohorts. Reproducibility and robustness of our approach were demonstrated by the concordance in relative feature importance between the cohorts, along with consistencies in clustering profiles. One of the identified clusters was shown to be prognostic for clinical outcome after standard-of-care adjuvant chemotherapy and outperformed current intrinsic molecular breast cancer classifications.
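To make the idea of permutation-based feature importance concrete, the following is a minimal, model-agnostic sketch of the generic technique: shuffle one feature at a time and measure the drop in accuracy. It is not the framework presented above and does not involve a random forest; the function `permutation_importance`, the toy predictor, and the toy data are all hypothetical and serve only to illustrate the principle.

```python
import random


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled.

    predict: callable mapping one row (list) to a predicted label.
    X: list of rows (lists); y: list of true labels.
    """
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(permuted))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy example (hypothetical): the label depends only on feature 0.
X = [[i % 2, (i // 2) % 2] for i in range(40)]
y = [row[0] for row in X]
imp = permutation_importance(lambda row: row[0], X, y)
```

In this toy setting, shuffling the informative feature lowers accuracy while shuffling the irrelevant one leaves it unchanged, which is the contrast a permutation-based importance measure is designed to expose.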
Breast cancer in young adults has been associated with a worse outcome. Analyses of genomic traits associated with age have been heterogeneous, likely because of an incomplete accounting for underlying molecular subtypes. We aimed to resolve whether triple-negative breast cancer (TNBC) in younger versus older patients represents similar or different molecular diseases in the context of genetic and transcriptional subtypes and immune cell infiltration.
In total, 237 patients from a reported population-based south Swedish TNBC cohort profiled by RNA sequencing and whole-genome sequencing (WGS) were included. Patients were binned in 10-year intervals. Complementary PD-L1 and CD20 immunohistochemistry and estimation of tumor-infiltrating lymphocytes (TILs) were performed. Cases were analyzed for differences in patient outcome, genomic, transcriptional, and immune landscape features versus age at diagnosis. Additionally, 560 public WGS breast cancer profiles were used for validation.
Median age at diagnosis was 62 years (range 26-91). Age was not associated with invasive disease-free survival or overall survival after adjuvant chemotherapy. Among the BRCA1-deficient cases (82/237), 90% were diagnosed before the age of 70 and were predominantly of the basal-like subtype. In the full TNBC cohort, reported associations of patient age with changes in Ki67 expression, PIK3CA mutations, and a luminal androgen receptor subtype were confirmed. Within DNA repair deficiency or gene expression defined molecular subgroups, age-related alterations in, e.g., overall gene expression, immune cell marker gene expression, genetic mutational and rearrangement signatures, amount of copy number alterations, and tumor mutational burden did, however, not appear distinct. Similar non-significant associations for genetic alterations with age were obtained for other breast cancer subgroups in public WGS data. Consistent with age-related immunosenescence, TIL counts decreased linearly with patient age across different genetic TNBC subtypes.
Age-related alterations in TNBC, as well as breast cancer in general, need to be viewed in the context of underlying genomic phenotypes. Based on this notion, age at diagnosis alone does not appear to provide an additional layer of biological complexity above that of proposed genetic and transcriptional phenotypes of TNBC. Consequently, treatment decisions should be less influenced by age and more driven by tumor biology.
Genomic rearrangements in cancer cells can create fusion genes that encode chimeric proteins or alter the expression of coding and non-coding RNAs. In some cancer types, fusions involving specific kinases are used as targets for therapy. Fusion genes can be detected by whole genome sequencing (WGS) and targeted fusion panels, but RNA sequencing (RNA-Seq) has the advantageous capability of broadly detecting expressed fusion transcripts.
We developed a pipeline for validation of fusion transcripts identified in RNA-Seq data using matched WGS data from The Cancer Genome Atlas (TCGA) and applied it to 910 tumors from 11 different cancer types. This resulted in 4237 validated gene fusions, 3049 of them with at least one identified genomic breakpoint. Utilizing validated fusions as true positive events, we trained a machine learning classifier to predict true and false positive fusion transcripts from RNA-Seq data. The final precision and recall metrics of the classifier were 0.74 and 0.71, respectively, in an independent dataset of 249 breast tumors. Application of this classifier to all samples with RNA-Seq data from these cancer types vastly extended the number of likely true positive fusion transcripts and identified many potentially targetable kinase fusions. Further analysis of the validated gene fusions suggested that many are created by intrachromosomal amplification events with microhomology-mediated non-homologous end-joining.
A classifier trained on validated fusion events increased the accuracy of fusion transcript identification in samples without WGS data. This allowed the analysis to be extended to all samples with RNA-Seq data, facilitating studies of tumor biology and increasing the number of detected kinase fusions. Machine learning could thus be used in identification of clinically relevant fusion events for targeted therapy. The large dataset of validated gene fusions generated here presents a useful resource for development and evaluation of fusion transcript detection algorithms.
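For readers less familiar with the precision and recall metrics reported for the classifier, the sketch below shows how they are computed from predicted and true labels. The labels are hypothetical toy values, not data from the study, and the function name `precision_recall` is an illustrative assumption.

```python
def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: True marks a validated (true positive) fusion.
y_true = [True, True, True, True, False, False]
y_pred = [True, True, True, False, True, False]
precision, recall = precision_recall(y_true, y_pred)
```

Here three of four predicted fusions are true (precision 0.75) and three of four true fusions are found (recall 0.75), which mirrors how the reported values of 0.74 and 0.71 should be read.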
Abstract
Background
Immunohistochemical (IHC) PD-L1 expression is commonly employed as a predictive biomarker for checkpoint inhibitors in triple-negative breast cancer (TNBC). However, IHC evaluation methods are non-uniform and further studies are needed to optimize clinical utility.
Methods
We compared the concordance, prognostic value and gene expression between PD-L1 IHC expression by SP142 immune cell (IC) score and 22C3 combined positive score (CPS; companion IHC diagnostic assays for atezolizumab and pembrolizumab, respectively) in a population-based cohort of 232 early-stage TNBC patients.
Results
The expression rates of PD-L1 for SP142 IC ≥ 1%, 22C3 CPS ≥ 10, 22C3 CPS ≥ 1 and 22C3 IC ≥ 1% were 50.9%, 27.2%, 53.9% and 41.8%, respectively. The analytical concordance (kappa values) between SP142 IC+ and these three different 22C3 scorings were 73.7% (0.48, weak agreement), 81.5% (0.63) and 86.6% (0.73), respectively. The SP142 assay was better at identifying 22C3 positive tumors than the 22C3 assay was at detecting SP142 positive tumors. PD-L1 (CD274) gene expression (mRNA) showed a strong positive association with all two-categorical IHC scorings of the PD-L1 expression, irrespective of antibody and cut-off (Spearman Rho ranged from 0.59 to 0.62; all p-values < 0.001). PD-L1 IHC positivity and abundance of tumor infiltrating lymphocytes were of positive prognostic value in univariable regression analyses in patients treated with (neo)adjuvant chemotherapy, where it was strongest for 22C3 CPS ≥ 10 and distant relapse-free interval (HR = 0.18, p = 0.019). However, PD-L1 status was not independently prognostic when adjusting for abundance of tumor infiltrating lymphocytes in multivariable analyses.
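The concordance figures above pair an overall percent agreement with a kappa value. As a minimal sketch of how the two relate, the code below computes observed agreement and Cohen's kappa from a 2×2 table of two binary assay calls. The counts are hypothetical and are not taken from the study; `agreement_and_kappa` is an illustrative name.

```python
def agreement_and_kappa(both_pos, a_only, b_only, both_neg):
    """Observed agreement and Cohen's kappa for two binary raters/assays.

    both_pos: positive by both assays; a_only: positive by assay A only;
    b_only: positive by assay B only; both_neg: negative by both.
    """
    n = both_pos + a_only + b_only + both_neg
    observed = (both_pos + both_neg) / n
    # Chance agreement from the marginal positive/negative rates.
    a_pos = (both_pos + a_only) / n
    b_pos = (both_pos + b_only) / n
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts for 100 tumors scored by two assays.
observed, kappa = agreement_and_kappa(40, 10, 10, 40)
```

With these toy counts the assays agree on 80% of cases, but because half of that agreement is expected by chance, kappa is about 0.6. This is why a high percent agreement can coincide with only weak or moderate kappa.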
Conclusion
Our findings indicate that the SP142 and 22C3 IHC assays, with their respective clinically applied scoring algorithms, are not analytically equivalent: they identify partially non-overlapping subpopulations of TNBC patients and cannot be substituted for one another regarding PD-L1 detection.
Trial registration
The Swedish Cancerome Analysis Network - Breast (SCAN-B) study, retrospectively registered 2nd Dec 2014 at ClinicalTrials.gov; ID NCT02306096.
To comprehensively characterize microRNA (miRNA) expression in breast cancer, we performed the first extensive next-generation sequencing expression analysis of this disease. We sequenced small RNA from tumors with paired samples of normal and tumor-adjacent breast tissue. Our results indicate that tumor identity is achieved mainly by variation in the expression levels of a common set of miRNAs rather than by tissue-specific expression. We also report 361 new, well-supported miRNA precursors. Nearly two-thirds of these new genes were detected in other human tissues and 49% of the miRNAs were found associated with Ago2 in MCF7 cells. Ten percent of the new miRNAs are located in regions with high-level genomic amplifications in breast cancer. A new miRNA is encoded within the ERBB2/Her2 gene and amplification of this gene leads to overexpression of the new miRNA, indicating that this potent oncogene and important clinical marker may have two different biological functions. In summary, our work substantially expands the number of known miRNAs and highlights the complexity of small RNA expression in breast cancer.
Similar to other malignancies, urothelial carcinoma (UC) is characterized by specific recurrent chromosomal aberrations and gene mutations. However, the interconnection between specific genomic alterations, and how patterns of chromosomal alterations adhere to different molecular subgroups of UC, is less clear. We applied tiling resolution array CGH to 146 cases of UC and identified a number of regions harboring recurrent focal genomic amplifications and deletions. Several potential oncogenes were included in the amplified regions, including known oncogenes like E2F3, CCND1, and CCNE1, as well as new candidate genes, such as SETDB1 (1q21), and BCL2L1 (20q11). We next combined genome profiling with global gene expression, gene mutation, and protein expression data and identified two major genomic circuits operating in urothelial carcinoma. The first circuit was characterized by FGFR3 alterations, overexpression of CCND1, and 9q and CDKN2A deletions. The second circuit was defined by E2F3 amplifications and RB1 deletions, as well as gains of 5p, deletions at PTEN and 2q36, 16q, 20q, and elevated CDKN2A levels. TP53/MDM2 alterations were common for advanced tumors within the two circuits. Our data also suggest a possible RAS/RAF circuit. The tumors with worst prognosis showed a gene expression profile that indicated a keratinized phenotype. Taken together, our integrative approach revealed at least two separate networks of genomic alterations linked to the molecular diversity seen in UC, and suggested that these circuits may reflect distinct pathways of tumor development.