How somatic mutations accumulate in normal cells is poorly understood. A comprehensive analysis of RNA sequencing data from ~6700 samples across 29 normal tissues revealed multiple somatic variants, demonstrating that macroscopic clones can be found in many normal tissues. We found that sun-exposed skin, esophagus, and lung have a higher mutation burden than other tested tissues, which suggests that environmental factors can promote somatic mosaicism. Mutation burden was associated with both age and tissue-specific cell proliferation rate, highlighting that mutations accumulate over both time and number of cell divisions. Finally, normal tissues were found to harbor mutations in known cancer genes and hotspots. This study provides a broad view of macroscopic clonal expansion in human tissues, thus serving as a foundation for associating clonal expansion with environmental factors, aging, and risk of disease.
Structural variants (SVs), including small insertion and deletion variants (indels), are challenging to detect through standard alignment-based variant calling methods. Sequence assembly offers a powerful approach to identifying SVs, but is difficult to apply at scale genome-wide for SV detection due to its computational complexity and the difficulty of extracting SVs from assembly contigs. We describe SvABA, an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements. We evaluated SvABA's performance on the NA12878 human genome and in simulated and real cancer genomes. SvABA demonstrates superior sensitivity and specificity across a large spectrum of SVs and substantially improves detection performance for variants in the 20-300 bp range, compared with existing methods. SvABA also identifies complex somatic rearrangements with chains of short (<1000 bp) templated-sequence insertions copied from distant genomic regions. We applied SvABA to 344 cancer genomes from 11 cancer types and found that short templated-sequence insertions occur in ∼4% of all somatic rearrangements. Finally, we demonstrate that SvABA can identify sites of viral integration and cancer driver alterations containing medium-sized (50-300 bp) SVs.
Retrotransposons constitute a major source of genetic variation, and somatic retrotransposon insertions have been reported in cancer. Here, we applied TranspoSeq, a computational framework that identifies retrotransposon insertions from sequencing data, to whole genomes from 200 tumor/normal pairs across 11 tumor types as part of The Cancer Genome Atlas (TCGA) Pan-Cancer Project. In addition to novel germline polymorphisms, we find 810 somatic retrotransposon insertions primarily in lung squamous, head and neck, colorectal, and endometrial carcinomas. Many somatic retrotransposon insertions occur in known cancer genes. We find that high somatic retrotransposition rates in tumors are associated with high rates of genomic rearrangement and somatic mutation. Finally, we developed TranspoSeq-Exome to interrogate an additional 767 tumor samples with hybrid-capture exome data and discovered 35 novel somatic retrotransposon insertions into exonic regions, including an insertion into an exon of the PTEN tumor suppressor gene. The results of this large-scale, comprehensive analysis of retrotransposon movement across tumor types suggest that somatic retrotransposon insertions may represent an important class of structural variation in cancer.
MOSAIK is a stable, sensitive and open-source program for mapping second- and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific BioSciences SMRT. Indeed, MOSAIK was the only aligner to provide consistent mappings for all the generated data (sequencing technologies, low-coverage and exome) in the 1000 Genomes Project. To provide highly accurate alignments, MOSAIK employs a hash clustering strategy coupled with the Smith-Waterman algorithm. This method is well-suited to capture mismatches as well as short insertions and deletions. To support the growing interest in larger structural variant (SV) discovery, MOSAIK provides explicit support for handling known-sequence SVs, e.g. mobile element insertions (MEIs), as well as generating outputs tailored to aid in SV discovery. All variant discovery benefits from an accurate description of the read placement confidence. To this end, MOSAIK uses a neural-network-based training scheme to provide well-calibrated mapping quality scores, demonstrated by a correlation coefficient between MOSAIK-assigned and actual mapping qualities greater than 0.98. To support studies of any genome, a training pipeline is provided to ensure optimal mapping quality scores for the genome under investigation. MOSAIK is multi-threaded, open source, and incorporated into our command and pipeline launcher system GKNO (http://gkno.me).
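As an illustration of the Smith-Waterman step named above, the following is a minimal sketch of local alignment scoring by dynamic programming. It is not MOSAIK's implementation: the function name `smith_waterman_score`, the scoring values, and the linear gap penalty are assumptions chosen for brevity, and the hash clustering stage that would select candidate reference regions is not shown.

```python
def smith_waterman_score(read, ref, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between read and ref.

    Scoring parameters are illustrative assumptions, not MOSAIK defaults.
    A linear gap penalty is used; production aligners often use affine gaps.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    # H[i][j] = best score of a local alignment ending at read[i-1], ref[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment here
                H[i - 1][j - 1] + sub,  # match or mismatch
                H[i - 1][j] + gap,      # gap in the reference
                H[i][j - 1] + gap,      # gap in the read
            )
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACG", "TTACGTTT"))
```

Because cell scores are floored at zero, the alignment can begin and end anywhere in either sequence, which is what lets a short read be placed inside a much longer candidate region while tolerating mismatches and short indels.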
Genomic analysis of tumours has led to the identification of hundreds of cancer genes on the basis of the presence of mutations in protein-coding regions. By contrast, much less is known about cancer-causing mutations in non-coding regions. Here we perform deep sequencing in 360 primary breast cancers and develop computational methods to identify significantly mutated promoters. Clear signals are found in the promoters of three genes. FOXA1, a known driver of hormone-receptor positive breast cancer, harbours a mutational hotspot in its promoter leading to overexpression through increased E2F binding. RMRP and NEAT1, two non-coding RNA genes, carry mutations that affect protein binding to their promoters and alter expression levels. Our study shows that promoter regions harbour recurrent mutations in cancer with functional consequences and that the mutations occur at similar frequencies as in coding regions. Power analyses indicate that more such regions remain to be discovered through deep sequencing of adequately sized cohorts of patients.
Resistance to the Bruton's tyrosine kinase (BTK) inhibitor ibrutinib has been attributed solely to mutations in BTK and related pathway molecules. Using whole-exome and deep-targeted sequencing, we dissect evolution of ibrutinib resistance in serial samples from five chronic lymphocytic leukaemia patients. In two patients, we detect BTK-C481S mutation or multiple PLCG2 mutations. The other three patients exhibit an expansion of clones harbouring del(8p) with additional driver mutations (EP300, MLL2 and EIF2A), with one patient developing trans-differentiation into CD19-negative histiocytic sarcoma. Using droplet-microfluidic technology and growth kinetic analyses, we demonstrate the presence of ibrutinib-resistant subclones and estimate subclone size before treatment initiation. Haploinsufficiency of TRAIL-R, a consequence of del(8p), results in TRAIL insensitivity, which may contribute to ibrutinib resistance. These findings demonstrate that ibrutinib therapy favours selection and expansion of rare subclones already present before ibrutinib treatment, and provide insight into the heterogeneity of genetic changes associated with ibrutinib resistance.
Primary central nervous system lymphomas (PCNSLs) and primary testicular lymphomas (PTLs) are extranodal large B-cell lymphomas (LBCLs) with inferior responses to current empiric treatment regimens. To identify targetable genetic features of PCNSL and PTL, we characterized their recurrent somatic mutations, chromosomal rearrangements, copy number alterations (CNAs), and associated driver genes, and compared these comprehensive genetic signatures to those of diffuse LBCL and primary mediastinal large B-cell lymphoma (PMBL). These studies identify unique combinations of genetic alterations in discrete LBCL subtypes and subtype-selective bases for targeted therapy. PCNSLs and PTLs frequently exhibit genomic instability, and near-uniform, often biallelic, CDKN2A loss with rare TP53 mutations. PCNSLs and PTLs also use multiple genetic mechanisms to target key genes and pathways and exhibit near-uniform oncogenic Toll-like receptor signaling as a result of MYD88 mutation and/or NFKBIZ amplification, frequent concurrent B-cell receptor pathway activation, and deregulation of BCL6. Of great interest, PCNSLs and PTLs also have frequent 9p24.1/PD-L1/PD-L2 CNAs and additional translocations of these loci, structural bases of immune evasion that are shared with PMBL.
• PCNSLs and PTLs have a defining genetic signature that differs from other LBCLs and suggests rational targeted therapies.
• PCNSLs and PTLs frequently exhibit 9p24.1/PD-L1/PD-L2 copy number alterations and translocations, likely genetic bases of immune evasion.
Barrett's esophagus is thought to progress to esophageal adenocarcinoma (EAC) through a stepwise progression with loss of CDKN2A followed by TP53 inactivation and aneuploidy. Here we present whole-exome sequencing from 25 pairs of EAC and Barrett's esophagus and from 5 patients whose Barrett's esophagus and tumor were extensively sampled. Our analysis showed that oncogene amplification typically occurred as a late event and that TP53 mutations often occurred early in Barrett's esophagus progression, including in non-dysplastic epithelium. Reanalysis of additional EAC exome data showed that the majority (62.5%) of EACs emerged following genome doubling and that tumors with genome doubling had different patterns of genomic alterations, with more frequent oncogenic amplification and less frequent inactivation of tumor suppressors, including CDKN2A. These data suggest that many EACs emerge not through the gradual accumulation of tumor-suppressor alterations but rather through a more direct path whereby a TP53-mutant cell undergoes genome doubling, followed by the acquisition of oncogenic amplifications.
Clonal evolution is a key feature of cancer progression and relapse. We studied intratumoral heterogeneity in 149 chronic lymphocytic leukemia (CLL) cases by integrating whole-exome sequence and copy number to measure the fraction of cancer cells harboring each somatic mutation. We identified driver mutations as predominantly clonal (e.g., MYD88, trisomy 12, and del(13q)) or subclonal (e.g., SF3B1 and TP53), corresponding to earlier and later events in CLL evolution. We sampled leukemia cells from 18 patients at two time points. Ten of twelve CLL cases treated with chemotherapy (but only one of six without treatment) underwent clonal evolution, predominantly involving subclones with driver mutations (e.g., SF3B1 and TP53) that expanded over time. Furthermore, presence of a subclonal driver mutation was an independent risk factor for rapid disease progression. Our study thus uncovers patterns of clonal evolution in CLL, providing insights into its stepwise transformation, and links the presence of subclones with adverse clinical outcomes.
► Whole-exome analysis of clonal heterogeneity in 149 chronic lymphocytic leukemias
► Earlier and later mutations in the temporal evolution of CLL are identified
► Clonal evolution is commonly seen with treatment, typically in a branched pattern
► A subclonal driver in a pretreatment sample is associated with adverse outcome
The intratumoral heterogeneity in 149 chronic lymphocytic leukemia (CLL) cases was evaluated by whole-exome sequencing. The evolutionary patterns of distinct clones enabled a temporal ordering of mutations in CLL, revealed the association of clonal evolution with chemotherapy, and linked the presence of subclonal driver mutations with adverse clinical outcomes.
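The CLL study above measures the fraction of cancer cells harboring each somatic mutation by combining sequencing and copy-number data. The sketch below shows one commonly used point estimate of that quantity from a variant allele fraction, sample purity, and local copy number. It is a simplified illustration only, not the authors' method: the function name `cancer_cell_fraction`, its parameters, and the default of one mutated copy per cell are assumptions, and a real analysis would model read-count uncertainty rather than return a single value.

```python
def cancer_cell_fraction(vaf, purity, tumor_cn, multiplicity=1, normal_cn=2):
    """Point estimate of the fraction of cancer cells carrying a mutation.

    vaf          -- observed variant allele fraction (0..1)
    purity       -- fraction of sampled cells that are cancer cells (0..1]
    tumor_cn     -- total copy number at the locus in cancer cells
    multiplicity -- mutated copies per mutated cell (assumed 1 by default)
    normal_cn    -- copy number at the locus in normal cells (assumed 2)

    All names and defaults are illustrative assumptions.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of copies of the locus per sampled cell
    avg_copies = purity * tumor_cn + (1 - purity) * normal_cn
    ccf = vaf * avg_copies / (purity * multiplicity)
    # Sampling noise can push the estimate above 1; cap it at fully clonal
    return min(ccf, 1.0)


if __name__ == "__main__":
    # A heterozygous mutation in a pure diploid sample is fully clonal
    print(cancer_cell_fraction(vaf=0.5, purity=1.0, tumor_cn=2))
```

Under this estimate, a value near 1 corresponds to a clonal mutation and a clearly lower value to a subclonal one, which is the distinction the abstract draws between earlier and later events.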
Large-scale genomic characterization of tumors from prospective cohort studies may yield new insights into cancer pathogenesis. We performed whole-exome sequencing of 619 incident colorectal cancers (CRCs) and integrated the results with tumor immunity, pathology, and survival data. We identified recurrently mutated genes in CRC, such as BCL9L, RBM10, CTCF, and KLF5, that were not previously appreciated in this disease. Furthermore, we investigated the genomic correlates of immune-cell infiltration and found that higher neoantigen load was positively associated with overall lymphocytic infiltration, tumor-infiltrating lymphocytes (TILs), memory T cells, and CRC-specific survival. The association with TILs was evident even within microsatellite-stable tumors. We also found positive selection of mutations in HLA genes and other components of the antigen-processing machinery in TIL-rich tumors. These results may inform immunotherapeutic approaches in CRC. More generally, this study demonstrates a framework for future integrative molecular epidemiology research in colorectal and other malignancies.
• Whole-exome sequencing of 619 colorectal cancers with clinicopathologic annotations
• Discovery of significantly mutated genes in colorectal cancer
• Neoantigen load correlation with infiltrating lymphocytes and memory T cells
• Positive selection for HLA mutations in immune-cell-infiltrated tumors
Through whole-exome sequencing of annotated colorectal tumors, Giannakis et al. identify additional colorectal cancer driver genes and correlate high neoantigen load with increased lymphocytic infiltration and improved survival. They also find positive selection for HLA mutations in immune-cell-infiltrated tumors. These results may inform immunotherapeutic approaches in colorectal cancer.