Microsatellite instability (MSI) is an important indicator of broader genome instability and has been linked to many genetic diseases, including Lynch syndrome. MSI status is also an independent prognostic factor for favorable survival in multiple cancer types, such as colorectal and endometrial cancer, and it informs the choice of chemotherapeutic agents. However, the current PCR-electrophoresis-based detection procedure is laborious and time-consuming, often requiring visual inspection to categorize samples. We developed MSIsensor, a C++ program for automatically detecting somatic microsatellite changes. It computes the length distributions of microsatellites at each site in paired tumor and normal sequence data, then statistically compares the observed distributions between the two samples. Comprehensive testing indicates that MSIsensor is an efficient and effective tool for deriving MSI status from standard tumor-normal paired sequence data.
https://github.com/ding-lab/msisensor
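The per-site comparison can be illustrated with a short sketch. The abstract does not name the statistical test, so this example assumes Pearson's chi-squared test of homogeneity on a 2 × k table of read counts (tumor vs. normal, one column per observed repeat length), flags a site as unstable when its p-value falls below a hypothetical threshold `alpha`, and reports the percentage of unstable sites as a score. The function names and the input format (a dict mapping repeat length to read count) are illustrative assumptions, not MSIsensor's C++ implementation, and no multiple-testing correction or coverage filtering is applied.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical input format: repeat length (bp) -> number of reads supporting it.
LengthCounts = Dict[int, int]


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability P(X >= stat) for X ~ chi-squared(df), df >= 1."""
    if stat <= 0:
        return 1.0
    x = stat / 2.0
    # Regularized upper incomplete gamma Q(df/2, x), built from Q(1, x) = exp(-x)
    # (even df) or Q(1/2, x) = erfc(sqrt(x)) (odd df) via the recurrence
    # Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))
    while a < df / 2.0:
        q += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def site_p_value(tumor: LengthCounts, normal: LengthCounts) -> float:
    """Chi-squared test of whether tumor and normal share one repeat-length distribution."""
    lengths = [length for length in sorted(set(tumor) | set(normal))
               if tumor.get(length, 0) + normal.get(length, 0) > 0]
    rows = [[tumor.get(length, 0) for length in lengths],
            [normal.get(length, 0) for length in lengths]]
    row_totals = [sum(r) for r in rows]
    if len(lengths) < 2 or 0 in row_totals:
        return 1.0  # a single observed length (or an empty sample) gives no evidence
    col_totals = [rows[0][j] + rows[1][j] for j in range(len(lengths))]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(len(lengths)):
            expected = row_totals[i] * col_totals[j] / total
            stat += (rows[i][j] - expected) ** 2 / expected
    return chi2_sf(stat, len(lengths) - 1)


def msi_score(sites: List[Tuple[LengthCounts, LengthCounts]], alpha: float = 0.05) -> float:
    """Percentage of sites whose tumor and normal length distributions differ at level alpha."""
    if not sites:
        return 0.0
    unstable = sum(1 for tumor, normal in sites if site_p_value(tumor, normal) < alpha)
    return 100.0 * unstable / len(sites)


# Two hypothetical sites: one unchanged, one whose repeat shrank in the tumor.
stable = ({10: 20, 11: 5}, {10: 20, 11: 5})
shifted = ({8: 30}, {10: 30})
print(msi_score([stable, shifted]))  # 50.0
```

The closed-form recurrence keeps the sketch free of third-party dependencies; in practice a statistics library's chi-squared routine would do the same job.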
The Cancer Genome Atlas (TCGA) has used the latest sequencing and analysis methods to identify somatic variants across thousands of tumours. Here we present data and analytical results for point mutations and small insertions/deletions from 3,281 tumours across 12 tumour types as part of the TCGA Pan-Cancer effort. We illustrate the distributions of mutation frequencies, types and contexts across tumour types, and establish their links to tissues of origin, environmental/carcinogen influences, and DNA repair defects. Using the integrated data sets, we identified 127 significantly mutated genes from well-known (for example, mitogen-activated protein kinase, phosphatidylinositol-3-OH kinase, Wnt/β-catenin and receptor tyrosine kinase signalling pathways, and cell cycle control) and emerging (for example, histone, histone modification, splicing, metabolism and proteolysis) cellular processes in cancer. The average number of mutations in these significantly mutated genes varies across tumour types; most tumours have two to six, indicating that the number of driver mutations required during oncogenesis is relatively small. Mutations in transcription factors/regulators show tissue specificity, whereas histone modifiers are often mutated across several cancer types. Clinical association analysis identifies genes with a significant effect on survival, and investigations of mutations with respect to clonal/subclonal architecture delineate their temporal order during tumorigenesis. Taken together, these results lay the groundwork for developing new diagnostics and individualizing cancer treatment.
Massively parallel sequencing technology and the associated rapid decline in sequencing costs have enabled systematic analyses of somatic mutations in large cohorts of cancer cases. Here we introduce a comprehensive mutational analysis pipeline that uses standardized sequence-based inputs along with multiple types of clinical data to establish correlations among mutation sites, affected genes and pathways, and ultimately to separate the commonly abundant passenger mutations from the truly significant events. In other words, we aim to determine the Mutational Significance in Cancer (MuSiC) for these large data sets. The integration of analytical operations in the MuSiC framework is applicable to a broad set of tumor types and offers the benefits of automation as well as standardization. Herein, we describe the computational structure and statistical underpinnings of the MuSiC pipeline and demonstrate its performance using 316 ovarian cancer samples from the TCGA ovarian cancer project. MuSiC confirms many expected results and identifies several potentially novel avenues for discovery.
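As a rough illustration of how a significance test separates recurrent events from background passenger mutations, the sketch below asks whether a gene carries more mutations than a uniform background rate would predict, using an exact one-sided binomial tail. This is a simplified stand-in, not MuSiC's statistical tests: the function names, the single per-base background rate and the example numbers are hypothetical, and a real analysis would typically model separate mutation categories and correct for testing thousands of genes.

```python
import math


def log_binom_pmf(i: int, n: int, p: float) -> float:
    """log P(X = i) for X ~ Binomial(n, p), computed with lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p) with 0 < p < 1."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    total = 0.0
    for i in range(k, n + 1):
        term = math.exp(log_binom_pmf(i, n, p))
        total += term
        # Past the mode (i > n * p) terms only shrink, so stop once they are negligible.
        if i > n * p and term <= total * 1e-16:
            break
    return min(total, 1.0)


def gene_p_value(observed: int, covered_bp_per_sample: int, n_samples: int,
                 background_rate: float) -> float:
    """One-sided test: more mutations than expected from background_rate per covered base?"""
    n_bases = covered_bp_per_sample * n_samples
    return binomial_upper_tail(observed, n_bases, background_rate)


# Hypothetical gene: 3,000 covered bp per sample, 316 samples, background of 2 per Mb.
p = gene_p_value(observed=25, covered_bp_per_sample=3000, n_samples=316, background_rate=2e-6)
print(f"expected ~{3000 * 316 * 2e-6:.1f} mutations, p = {p:.3g}")
```

Summing the tail directly (rather than computing one minus the lower tail) keeps very small p-values accurate, which matters when ranking candidate driver genes.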
High-throughput DNA sequencing has revolutionized the study of cancer genomics with numerous discoveries that are relevant to cancer diagnosis and treatment. The latest sequencing and analysis methods have successfully identified somatic alterations, including single-nucleotide variants, insertions and deletions, copy-number aberrations, structural variants and gene fusions. Additional computational techniques have proved useful for defining the mutations, genes and molecular networks that drive diverse cancer phenotypes and that determine clonal architectures in tumour samples. Collectively, these tools have advanced the study of genomic, transcriptomic and epigenomic alterations in cancer, and of their associations with clinical properties. Here, we review cancer genomics software and the insights that have been gained from its application.
Local concentrations of mutations are well known in human cancers. However, their three-dimensional spatial relationships in the encoded protein have yet to be systematically explored. We developed a computational tool, HotSpot3D, to identify such spatial hotspots (clusters) and to interpret the potential function of variants within them. We applied HotSpot3D to >4,400 TCGA tumors across 19 cancer types, discovering >6,000 intra- and intermolecular clusters, some of which showed tumor and/or tissue specificity. In addition, we identified 369 rare mutations in genes including TP53, PTEN, VHL, EGFR, and FBXW7 and 99 medium-recurrence mutations in genes such as RUNX1, MTOR, CA3, PI3, and PTPN11, all mapping within clusters having potential functional implications. As a proof of concept, we validated our predictions in EGFR using high-throughput phosphorylation data and cell-line-based experimental evaluation. Finally, mutation-drug cluster and network analysis predicted over 800 promising candidates for druggable mutations, raising new possibilities for designing personalized treatments for patients carrying specific mutations.
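The core idea of a spatial hotspot, mutated residues that lie close together in the folded protein even when they are far apart in sequence, can be sketched as single-linkage clustering on 3D distances. This is an assumed, simplified procedure rather than HotSpot3D's published method: the 10 Å cutoff, the residue labels and the coordinates (standing in for, say, alpha-carbon positions taken from a structure) are hypothetical. Labels from two different chains let the same code group intermolecular pairs.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def spatial_clusters(residues: Dict[str, Coord], cutoff: float = 10.0) -> List[List[str]]:
    """Single-linkage clusters of mutated residues whose pairwise distance is <= cutoff."""
    names = sorted(residues)
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(residues[a], residues[b]) <= cutoff:
                parent[find(a)] = find(b)  # merge the two clusters

    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    # Keep only groups with at least two residues; largest clusters first.
    return sorted((g for g in groups.values() if len(g) > 1), key=lambda g: (-len(g), g))


mutated = {  # hypothetical "chain:residue" labels and coordinates, in angstroms
    "A:45": (0.0, 0.0, 0.0),
    "A:190": (5.0, 0.0, 0.0),
    "B:12": (12.0, 0.0, 0.0),
    "A:301": (40.0, 0.0, 0.0),
}
print(spatial_clusters(mutated))  # [['A:190', 'A:45', 'B:12']]
```

Because linkage is transitive, A:45 and B:12 end up together through A:190 even though they are 12 Å apart; a stricter method could instead require every pair in a cluster to fall within the cutoff.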
Detection and characterization of genomic structural variation are important for understanding the landscape of genetic variation in human populations and in complex diseases such as cancer. Recent studies demonstrate the feasibility of detecting structural variation using next-generation, short-insert, paired-end sequencing reads. However, the utility of these reads is not entirely clear, nor are the analysis methods with which accurate detection can be achieved. The algorithm BreakDancer predicts a wide variety of structural variants including insertion-deletions (indels), inversions and translocations. We examined BreakDancer's performance in simulation, in comparison with other methods and in analyses of a sample from an individual with acute myeloid leukemia and of samples from the 1000 Genomes trio individuals. BreakDancer sensitively and accurately detected indels ranging from 10 base pairs to 1 megabase pair that are difficult to detect via a single conventional approach.
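Read-pair methods detect structural variants from mate pairs whose mapping is inconsistent with the sequencing library: mates on different chromosomes, in unexpected orientations, or separated by an unusual distance. The sketch below shows only that classification step, under stated assumptions: a forward-reverse (FR) library, a mapped separation compared against a hypothetical mean ± k standard deviations, and illustrative class names. It is not BreakDancer's code; a full caller would also cluster anomalous pairs that support the same event and score each candidate.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int      # leftmost mapped position of mate 1
    strand1: str   # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair: ReadPair, mean_sep: float, sd_sep: float, k: float = 3.0) -> str:
    """Assign a read pair to a coarse structural-variant signature (FR library assumed)."""
    if pair.chrom1 != pair.chrom2:
        return "translocation"  # mates map to different chromosomes
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)])
    if left_strand == right_strand:
        return "inversion"  # both mates on the same strand (FF or RR)
    if left_strand == "-":
        # RF orientation: mates point away from each other; treated here as an
        # intrachromosomal rearrangement.
        return "intrachromosomal_rearrangement"
    separation = right_pos - left_pos  # assumed to be measured the same way as mean_sep
    if separation > mean_sep + k * sd_sep:
        return "deletion"   # mates map farther apart than the library allows
    if separation < mean_sep - k * sd_sep:
        return "insertion"  # mates map closer together than expected
    return "normal"


# Hypothetical library: mean separation 300 bp, standard deviation 30 bp.
for pair in [
    ReadPair("chr1", 1000, "+", "chr1", 1300, "-"),  # normal
    ReadPair("chr1", 1000, "+", "chr1", 6000, "-"),  # deletion
    ReadPair("chr1", 1000, "+", "chr1", 1100, "-"),  # insertion
    ReadPair("chr1", 1000, "+", "chr1", 1300, "+"),  # inversion
    ReadPair("chr1", 1000, "+", "chr7", 5000, "-"),  # translocation
]:
    print(classify_pair(pair, mean_sep=300, sd_sep=30))
```

A single anomalous pair is weak evidence on its own; the size range a read-pair method can resolve depends on how tightly the library's separation is distributed.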
Several genetic alterations characteristic of leukemia and lymphoma have been detected in the blood of individuals without apparent hematological malignancies. The Cancer Genome Atlas (TCGA) provides a unique resource for comprehensive discovery of mutations and genes in blood that may contribute to the clonal expansion of hematopoietic stem/progenitor cells. Here, we analyzed blood-derived sequence data from 2,728 individuals from TCGA and discovered 77 blood-specific mutations in cancer-associated genes, the majority being associated with advanced age. Remarkably, 83% of these mutations were from 19 leukemia and/or lymphoma-associated genes, and nine were recurrently mutated (DNMT3A, TET2, JAK2, ASXL1, TP53, GNAS, PPM1D, BCORL1 and SF3B1). We identified 14 additional mutations in a very small fraction of blood cells, possibly representing the earliest stages of clonal expansion in hematopoietic stem cells. Comparison of these findings to mutations in hematological malignancies identified several recurrently mutated genes that may be disease initiators. Our analyses show that the blood cells of more than 2% of individuals (5-6% of people older than 70 years) contain mutations that may represent premalignant events that cause clonal hematopoietic expansion.
To assess the genetic consequences of induced pluripotent stem cell (iPSC) reprogramming, we sequenced the genomes of ten murine iPSC clones derived from three independent reprogramming experiments and compared them to their parental cell genomes. We detected hundreds of single nucleotide variants (SNVs) in every clone, with an average of 11 in coding regions. In two experiments, all SNVs were unique to each clone and did not cluster in pathways, but in the third, all four iPSC clones contained 157 shared genetic variants, which could also be detected in rare cells (<1 in 500) within the parental mouse embryonic fibroblast (MEF) pool. These data suggest that most of the genetic variation in iPSC clones is not caused by reprogramming per se, but is rather a consequence of cloning individual cells, which “captures” their mutational history. These findings have implications for the development and therapeutic use of cells that are reprogrammed by any method.
► iPSC clones contain hundreds of SNVs that are unique to each clone
► Most iPSC genomes do not contain recurrently mutated genes or pathways
► Reprogramming can select for rare cells with shared genetic variants
► Most SNVs are probably preexisting mutations “captured” by cloning
Complex insertions and deletions (indels) are formed by simultaneously deleting and inserting DNA fragments of different sizes at a common genomic location. Here we present a systematic analysis of somatic complex indels in the coding sequences of samples from over 8,000 cancer cases using Pindel-C. We discovered 285 complex indels in cancer-associated genes (such as PIK3R1, TP53, ARID1A, GATA3 and KMT2D) in approximately 3.5% of cases analyzed; nearly all instances of complex indels were overlooked (81.1%) or misannotated (17.6%) in previous reports of 2,199 samples. In-frame complex indels are enriched in PIK3R1 and EGFR, whereas frameshifts are prevalent in VHL, GATA3, TP53, ARID1A, PTEN and ATRX. Furthermore, complex indels display strong tissue specificity (such as VHL in kidney cancer samples and GATA3 in breast cancer samples). Finally, structural analyses support findings of previously missed, but potentially druggable, mutations in the EGFR, MET and KIT oncogenes. This study indicates the critical importance of improving complex indel discovery and interpretation in medical research.
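Whether a coding complex indel is in-frame or a frameshift depends only on its net length change: the inserted length minus the deleted length must be a multiple of three to preserve the reading frame. A minimal sketch of that classification, using a hypothetical `ComplexIndel` type and made-up sequences, with the "complex" test following the definition above (a deletion and an insertion of different sizes at one location):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplexIndel:
    deleted: str   # reference bases removed at the site
    inserted: str  # bases inserted at the same site

    def is_complex(self) -> bool:
        """Both a deletion and an insertion, of different sizes, at one location."""
        return bool(self.deleted) and bool(self.inserted) and len(self.deleted) != len(self.inserted)

    def net_length_change(self) -> int:
        return len(self.inserted) - len(self.deleted)

    def coding_effect(self) -> str:
        # Python's % returns a non-negative result, so net losses work too (-6 % 3 == 0).
        return "in-frame" if self.net_length_change() % 3 == 0 else "frameshift"


# Hypothetical events
print(ComplexIndel(deleted="ACGTTG", inserted="TA").coding_effect())      # frameshift (net -4)
print(ComplexIndel(deleted="ACGTTGCAA", inserted="GGC").coding_effect())  # in-frame (net -6)
```

This is also why a complex indel can be misannotated when its two halves are reported separately: a 9 bp deletion and a nearby 3 bp insertion each look benign or damaging in isolation, while the combined event is a net in-frame loss of two codons.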
Cancer immunoediting, the process by which the immune system controls tumour outgrowth and shapes tumour immunogenicity, comprises three phases: elimination, equilibrium and escape. Although many immune components that participate in this process are known, its underlying mechanisms remain poorly defined. A central tenet of cancer immunoediting is that T-cell recognition of tumour antigens drives the immunological destruction or sculpting of a developing cancer. However, our current understanding of tumour antigens comes largely from analyses of cancers that develop in immunocompetent hosts and thus may have already been edited. Little is known about the antigens expressed in nascent tumour cells, whether they are sufficient to induce protective antitumour immune responses or whether their expression is modulated by the immune system. Here, using massively parallel sequencing, we characterize expressed mutations in highly immunogenic methylcholanthrene-induced sarcomas derived from immunodeficient Rag2(-/-) mice that phenotypically resemble nascent primary tumour cells. Using MHC class I prediction algorithms, we identify mutant spectrin-β2 as a potential rejection antigen of the d42m1 sarcoma and validate this prediction by conventional antigen expression cloning and detection. We also demonstrate that cancer immunoediting of d42m1 occurs via a T-cell-dependent immunoselection process that promotes outgrowth of pre-existing tumour cell clones lacking highly antigenic mutant spectrin-β2 and other potential strong antigens. These results demonstrate that the strong immunogenicity of an unedited tumour can be ascribed to expression of highly antigenic mutant proteins and show that outgrowth of tumour cells that lack these strong antigens via a T-cell-dependent immunoselection process represents one mechanism of cancer immunoediting.
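MHC class I molecules typically present peptides of about 8 to 11 residues, so a common first step before running a class I binding predictor is to enumerate every window of those lengths that contains the mutant residue. The sketch below covers only that enumeration for a single amino-acid substitution; the function name, the 0-based position convention and the example sequence are hypothetical, and the binding prediction itself is not shown.

```python
from typing import Iterable, List, Tuple


def mutant_peptides(protein: str, pos: int, alt: str,
                    lengths: Iterable[int] = range(8, 12)) -> List[Tuple[int, str]]:
    """Return (start, peptide) for every window of the given lengths covering residue pos (0-based)."""
    if not 0 <= pos < len(protein):
        raise ValueError("pos is outside the protein sequence")
    if len(alt) != 1:
        raise ValueError("alt must be a single amino-acid residue")
    mutant = protein[:pos] + alt + protein[pos + 1:]
    windows = []
    for k in lengths:
        # The window must start at or before pos and end at or after it, inside the sequence.
        for start in range(max(0, pos - k + 1), min(pos, len(mutant) - k) + 1):
            windows.append((start, mutant[start:start + k]))
    return windows


# Hypothetical 20-residue sequence with an M -> W substitution at index 10
peptides = mutant_peptides("ACDEFGHIKLMNPQRSTVWY", pos=10, alt="W")
print(len(peptides))  # 37
```

Each candidate would then be scored for predicted binding to the relevant class I alleles; only peptides that actually carry the substitution can distinguish the tumour from normal tissue.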