Detection of somatic point substitutions is a key step in characterizing the cancer genome. However, existing methods typically miss low-allelic-fraction mutations that occur in only a subset of the sequenced cells owing to either tumor heterogeneity or contamination by normal cells. Here we present MuTect, a method that applies a Bayesian classifier to detect somatic mutations with very low allele fractions, requiring only a few supporting reads, followed by carefully tuned filters that ensure high specificity. We also describe benchmarking approaches that use real, rather than simulated, sequencing data to evaluate the sensitivity and specificity as a function of sequencing depth, base quality and allelic fraction. Compared with other methods, MuTect has higher sensitivity with similar specificity, especially for mutations with allelic fractions as low as 0.1 and below, making MuTect particularly useful for studying cancer subclones and their evolution in standard exome and genome sequencing data.
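The abstract does not spell out the classifier, so the following is only a minimal sketch of one common way to score a candidate site: a log-odds (LOD) comparison between "a variant is present at allele fraction f" and "all non-reference reads are sequencing errors". The function names, the use of Phred base qualities as error probabilities, and the choice of the observed alternate-read fraction as f are assumptions made for illustration, not details taken from the text above.

```python
import math


def phred_to_error(qual):
    """Convert a Phred base quality to an error probability."""
    return 10.0 ** (-qual / 10.0)


def log10_likelihood(bases, quals, ref, alt, f):
    """log10 P(observed bases | alt allele present at fraction f).

    Assumes reads are independent and that an error yields each of the
    three other bases with equal probability (illustrative assumptions).
    """
    total = 0.0
    for base, qual in zip(bases, quals):
        e = phred_to_error(qual)
        if base == ref:
            p = f * (e / 3.0) + (1.0 - f) * (1.0 - e)
        elif base == alt:
            p = f * (1.0 - e) + (1.0 - f) * (e / 3.0)
        else:
            p = e / 3.0
        total += math.log10(p)
    return total


def lod_score(bases, quals, ref, alt):
    """Log-odds of 'variant at observed fraction' versus 'noise only' (f = 0)."""
    if len(bases) == 0:
        raise ValueError("no reads at this site")
    f_hat = sum(1 for b in bases if b == alt) / len(bases)
    variant = log10_likelihood(bases, quals, ref, alt, f_hat)
    noise = log10_likelihood(bases, quals, ref, alt, 0.0)
    return variant - noise


if __name__ == "__main__":
    # 3 of 30 reads support the alternate allele (allele fraction 0.1).
    bases = "A" * 27 + "T" * 3
    quals = [30] * 30
    print(lod_score(bases, quals, ref="A", alt="T"))
```

In this sketch, a handful of high-quality supporting reads is enough to give a clearly positive score even at an allele fraction of 0.1, which is the regime the abstract emphasizes. A real caller would compare the score against a threshold and then apply filters of the kind the abstract mentions.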
Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS (the 1000 Genomes pilot alone includes nearly five terabases) make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
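To make the MapReduce idea concrete, here is a minimal sketch of a coverage calculator written in that style. It is plain Python rather than the GATK's own API: the function names, the (start, length) read representation, and the in-memory pileup are all hypothetical simplifications. The point is the separation the abstract describes: the traversal hands one locus at a time to a `map` step, and a `reduce` step folds the per-locus results into a summary.

```python
from collections import defaultdict
from functools import reduce


def pileup_by_locus(reads):
    """Stand-in for the framework's traversal: turn reads into (position, depth).

    `reads` is an iterable of (start, length) alignments on a single contig.
    """
    depth = defaultdict(int)
    for start, length in reads:
        for pos in range(start, start + length):
            depth[pos] += 1
    return sorted(depth.items())


def map_locus(locus):
    """Map step: computed independently for each covered locus."""
    _position, depth = locus
    return (1, depth)


def reduce_coverage(acc, mapped):
    """Reduce step: accumulate (covered loci, total depth)."""
    return (acc[0] + mapped[0], acc[1] + mapped[1])


def mean_coverage(reads):
    """Mean depth over covered loci, computed as reduce(map(...))."""
    loci, total = reduce(reduce_coverage, map(map_locus, pileup_by_locus(reads)), (0, 0))
    return total / loci if loci else 0.0


if __name__ == "__main__":
    # Two overlapping 4-base reads cover 6 loci with a total depth of 8.
    print(mean_coverage([(0, 4), (2, 4)]))
```

Because `map_locus` touches only one locus and `reduce_coverage` is associative, the same two functions could be run over separate genomic regions and the partial results combined afterwards, which is what makes this pattern amenable to parallelization.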
Detection of somatic mutations in human leukocyte antigen (HLA) genes using whole-exome sequencing (WES) is hampered by the high polymorphism of the HLA loci, which prevents alignment of sequencing reads to the human reference genome. We describe a computational pipeline that enables accurate inference of germline alleles of class I HLA-A, B and C genes and subsequent detection of mutations in these genes using the inferred alleles as a reference. Analysis of WES data from 7,930 pairs of tumor and healthy tissue from the same patient revealed 298 nonsilent HLA mutations in tumors from 266 patients. These 298 mutations are enriched for likely functional mutations, including putative loss-of-function events. Recurrence of mutations suggested that these 'hotspot' sites were positively selected. Cancers with recurrent somatic HLA mutations were associated with upregulation of signatures of cytolytic activity characteristic of tumor infiltration by effector lymphocytes, supporting immune evasion by altered HLA function as a contributory mechanism in cancer.
Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We here discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (∼4×) 1000 Genomes Project datasets.
Here, we present ContEst, a tool for estimating the level of cross-individual contamination in next-generation sequencing data. We demonstrate the accuracy of ContEst across a range of contamination levels, sources and read depths using sequencing data mixed in silico at known concentrations. We applied our tool to published cancer sequencing datasets and report their estimated contamination levels.
Availability: ContEst is a GATK module and is distributed under a BSD-style license at http://www.broadinstitute.org/cancer/cga/contest
Contact: kcibul@broadinstitute.org; gadgetz@broadinstitute.org
Supplementary information: Supplementary data are available at Bioinformatics online.
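The abstract does not describe how the contamination level is estimated, so the following is only an illustrative sketch of one way such an estimate could be obtained, not ContEst's actual model. It assumes we have sites where the sequenced individual is known to be homozygous for the reference allele, so that alternate-allele reads arise either from a contaminating individual (with probability proportional to the population allele frequency) or from sequencing error. The function names, the fixed error rate, and the grid search are all assumptions.

```python
import math


def log_likelihood(c, sites, error=0.001):
    """Binomial log-likelihood of contamination fraction c.

    `sites` is a list of (depth, alt_reads, population_alt_frequency) at
    positions where the sample itself is homozygous reference (assumed).
    """
    total = 0.0
    for depth, alt_reads, pop_af in sites:
        foreign_alt = c * pop_af  # chance a read carries a contaminant alt allele
        p_alt = foreign_alt * (1.0 - error) + (1.0 - foreign_alt) * (error / 3.0)
        total += alt_reads * math.log(p_alt)
        total += (depth - alt_reads) * math.log(1.0 - p_alt)
    return total


def estimate_contamination(sites, error=0.001, steps=1000):
    """Maximum-likelihood contamination fraction on a grid over [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda c: log_likelihood(c, sites, error))


if __name__ == "__main__":
    # 20 sites at depth 1000, population alt frequency 0.5, 25 alt reads each:
    # roughly what 5% contamination would produce under these assumptions.
    sites = [(1000, 25, 0.5)] * 20
    print(estimate_contamination(sites))
```

Pooling many sites is what gives such an estimate its precision: each site contributes only a few informative reads, but the likelihood is summed across all of them.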
We describe a computational method that infers tumor purity and malignant cell ploidy directly from analysis of somatic DNA alterations. The method, named ABSOLUTE, can detect subclonal heterogeneity and somatic homozygosity, and it can calculate statistical sensitivity for detection of specific aberrations. We used ABSOLUTE to analyze exome sequencing data from 214 ovarian carcinoma tumor-normal pairs. This analysis identified both pervasive subclonal somatic point mutations and a small subset of predominantly clonal and homozygous mutations, which were overrepresented in the tumor suppressor genes TP53 and NF1 and in a candidate tumor suppressor gene CDK12. We also used ABSOLUTE to infer absolute allelic copy-number profiles from 3,155 diverse cancer specimens, revealing that genome-doubling events are common in human cancer, likely occur in cells that are already aneuploid, and influence pathways of tumor progression (for example, with recessive inactivation of NF1 being less common after genome doubling). ABSOLUTE will facilitate the design of clinical sequencing studies and studies of cancer genome evolution and intra-tumor heterogeneity.
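The abstract does not state the underlying relation, but the link between purity, ploidy and measured copy number can be sketched with a standard mixture argument. Assume a sample in which a fraction \alpha of cells are cancer cells with mean ploidy \tau and the remaining 1 - \alpha are diploid normal cells (these symbols are introduced here for illustration). A segment present at integer copy number q in the cancer cells is then expected to show a relative copy ratio of:

```latex
\[
R(q) \;=\; \frac{\alpha\, q + 2\,(1 - \alpha)}{\alpha\, \tau + 2\,(1 - \alpha)}
\]
```

For example, with \alpha = 0.5 and \tau = 2, a single-copy loss (q = 1) gives R = 1.5 / 2 = 0.75 rather than the 0.5 expected in a pure tumor. Under this relation the observed copy ratios fall on a regularly spaced ladder whose spacing and offset depend on \alpha and \tau, which is why both quantities can in principle be inferred from somatic copy-number data.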
CLL is a heterogeneous disease with a variable clinical course and response to therapy. New genetic lesions have been noted in subgroups of patients through whole-exome and whole-genome sequencing. An abnormality in RNA splicing has been found in 15% of patients.
Chronic lymphocytic leukemia is an incurable disease characterized by extensive clinical heterogeneity despite a common diagnostic immunophenotype (surface expression of CD19+, CD20+dim, CD5+, CD23+, and sIgMdim). Whereas the course of disease is indolent in some patients, it is steadily progressive in approximately half of patients, leading to substantial morbidity and mortality [1]. Our ability to predict a more aggressive disease course has improved with the use of tests for biologic markers (degree of somatic hypermutation in the variable region of the immunoglobulin heavy chain [IGHV] gene and expression of ZAP70) and the detection of cytogenetic abnormalities (deletions . . .
Clonal evolution is a key feature of cancer progression and relapse. We studied intratumoral heterogeneity in 149 chronic lymphocytic leukemia (CLL) cases by integrating whole-exome sequence and copy number to measure the fraction of cancer cells harboring each somatic mutation. We identified driver mutations as predominantly clonal (e.g., MYD88, trisomy 12, and del(13q)) or subclonal (e.g., SF3B1 and TP53), corresponding to earlier and later events in CLL evolution. We sampled leukemia cells from 18 patients at two time points. Ten of twelve CLL cases treated with chemotherapy (but only one of six without treatment) underwent clonal evolution, predominantly involving subclones with driver mutations (e.g., SF3B1 and TP53) that expanded over time. Furthermore, presence of a subclonal driver mutation was an independent risk factor for rapid disease progression. Our study thus uncovers patterns of clonal evolution in CLL, providing insights into its stepwise transformation, and links the presence of subclones with adverse clinical outcomes.
► Whole-exome analysis of clonal heterogeneity in 149 chronic lymphocytic leukemias
► Earlier and later mutations in the temporal evolution of CLL are identified
► Clonal evolution is commonly seen with treatment, typically in a branched pattern
► A subclonal driver in a pretreatment sample is associated with adverse outcome
The intratumoral heterogeneity in 149 chronic lymphocytic leukemia (CLL) cases was evaluated by whole-exome sequencing. The evolutionary patterns of distinct clones enabled a temporal ordering of mutations in CLL, revealed the association of clonal evolution with chemotherapy, and linked the presence of subclonal driver mutations with adverse clinical outcomes.
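The abstract says that sequence and copy-number data were integrated to measure the fraction of cancer cells harboring each mutation, but it does not give the calculation. A minimal sketch of a commonly used point estimate is shown below. It assumes a known tumor purity, a known local copy number in the cancer cells, diploid normal cells, and a mutation present on a fixed number of copies (the multiplicity); the function and parameter names are hypothetical.

```python
def cancer_cell_fraction(allele_fraction, purity, local_copy_number, multiplicity=1):
    """Point estimate of the fraction of cancer cells carrying a mutation.

    Assumes normal cells are diploid at the locus and that each mutated
    cancer cell carries the mutation on `multiplicity` copies.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of copies of the locus per cell in the mixed sample.
    average_copies = purity * local_copy_number + 2.0 * (1.0 - purity)
    ccf = allele_fraction * average_copies / (purity * multiplicity)
    return min(ccf, 1.0)  # sampling noise can push the estimate above 1


if __name__ == "__main__":
    # 80% pure sample, copy-neutral locus.
    print(cancer_cell_fraction(0.40, purity=0.8, local_copy_number=2))
    print(cancer_cell_fraction(0.20, purity=0.8, local_copy_number=2))
```

Under these assumptions an allele fraction of 0.40 corresponds to a mutation in essentially all cancer cells (clonal), while 0.20 corresponds to about half of them (subclonal). A full analysis would propagate read-count uncertainty into a distribution over this fraction rather than rely on a single point estimate.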
Most patients with BRAF(V600)-mutant metastatic melanoma develop resistance to selective RAF kinase inhibitors. The spectrum of clinical genetic resistance mechanisms to RAF inhibitors and options for salvage therapy are incompletely understood. We performed whole-exome sequencing on formalin-fixed, paraffin-embedded tumors from 45 patients with BRAF(V600)-mutant metastatic melanoma who received vemurafenib or dabrafenib monotherapy. Genetic alterations in known or putative RAF inhibitor resistance genes were observed in 23 of 45 patients (51%). Besides previously characterized alterations, we discovered a "long tail" of new mitogen-activated protein kinase (MAPK) pathway alterations (MAP2K2, MITF) that confer RAF inhibitor resistance. In three cases, multiple resistance gene alterations were observed within the same tumor biopsy. Overall, RAF inhibitor therapy leads to diverse clinical genetic resistance mechanisms, mostly involving MAPK pathway reactivation. Novel therapeutic combinations may be needed to achieve durable clinical control of BRAF(V600)-mutant melanoma. Integrating clinical genomics with preclinical screens may model subsequent resistance studies.
Translating whole-exome sequencing (WES) for prospective clinical use may have an impact on the care of patients with cancer; however, multiple innovations are necessary for clinical implementation. These include rapid and robust WES of DNA derived from formalin-fixed, paraffin-embedded tumor tissue, analytical output similar to data from frozen samples and clinical interpretation of WES data for prospective use. Here, we describe a prospective clinical WES platform for archival formalin-fixed, paraffin-embedded tumor samples. The platform employs computational methods for effective clinical analysis and interpretation of WES data. When applied retrospectively to 511 exomes, the interpretative framework revealed a 'long tail' of somatic alterations in clinically important genes. Prospective application of this approach identified clinically relevant alterations in 15 out of 16 patients. In one patient, previously undetected findings guided clinical trial enrollment, leading to an objective clinical response. Overall, this methodology may inform the widespread implementation of precision cancer medicine.