COSMIC, the Catalogue Of Somatic Mutations In Cancer (http://cancer.sanger.ac.uk), is the world's largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer. Our latest release (v70; Aug 2014) describes 2 002 811 coding point mutations in over one million tumor samples and across most human genes. To emphasize depth of knowledge on known cancer genes, mutation information is curated manually from the scientific literature, allowing very precise definitions of disease types and patient details. Combination of almost 20,000 published studies gives substantial resolution of how mutations and phenotypes relate in human cancer, providing insights into the stratification of mutations and biomarkers across cancer patient populations. Conversely, our curation of cancer genomes (over 12,000) emphasizes knowledge breadth, driving discovery of unrecognized cancer-driving hotspots and molecular targets. Our high-resolution curation approach is globally unique, giving substantial insight into molecular biomarkers in human oncology. In addition, COSMIC details more than six million noncoding mutations, 10,534 gene fusions, 61,299 genome rearrangements, 695,504 abnormal copy number segments and 60,119,787 abnormal expression variants. All these types of somatic mutation are annotated to both the human genome and each affected coding gene, then correlated across disease and mutation types.
It is recognized that some mutated cancer genes contribute to the development of many cancer types, whereas others are cancer type specific. For genes that are mutated in multiple cancer classes, mutations are usually similar in the different affected cancer types. Here, however, we report exquisite tumor type specificity for different histone H3.3 driver alterations. In 73 of 77 cases of chondroblastoma (95%), we found p.Lys36Met alterations predominantly encoded in H3F3B, which is one of two genes for histone H3.3. In contrast, in 92% (49/53) of giant cell tumors of bone, we found histone H3.3 alterations exclusively in H3F3A, leading to p.Gly34Trp or, in one case, p.Gly34Leu alterations. The mutations were restricted to the stromal cell population and were not detected in osteoclasts or their precursors. In the context of previously reported H3F3A mutations encoding p.Lys27Met and p.Gly34Arg or p.Gly34Val alterations in childhood brain tumors, a remarkable picture of tumor type specificity for histone H3.3 driver alterations emerges, indicating that histone H3.3 residues, mutations and genes have distinct functions.
Myeloproliferative neoplasms, such as polycythemia vera, essential thrombocythemia, and myelofibrosis, are chronic hematologic cancers with varied progression rates. The genomic characterization of patients with myeloproliferative neoplasms offers the potential for personalized diagnosis, risk stratification, and treatment.
We sequenced coding exons from 69 myeloid cancer genes in patients with myeloproliferative neoplasms, comprehensively annotating driver mutations and copy-number changes. We developed a genomic classification for myeloproliferative neoplasms and multistage prognostic models for predicting outcomes in individual patients. Classification and prognostic models were validated in an external cohort.
A total of 2035 patients were included in the analysis. A total of 33 genes had driver mutations in at least 5 patients, with mutations in JAK2, CALR, or MPL being the sole abnormality in 45% of the patients. The numbers of driver mutations increased with age and advanced disease. Driver mutations, germline polymorphisms, and demographic variables independently predicted whether patients received a diagnosis of essential thrombocythemia as compared with polycythemia vera or a diagnosis of chronic-phase disease as compared with myelofibrosis. We defined eight genomic subgroups that showed distinct clinical phenotypes, including blood counts, risk of leukemic transformation, and event-free survival. Integrating 63 clinical and genomic variables, we created prognostic models capable of generating personally tailored predictions of clinical outcomes in patients with chronic-phase myeloproliferative neoplasms and myelofibrosis. The predicted and observed outcomes correlated well in internal cross-validation of a training cohort and in an independent external cohort. Even within individual categories of existing prognostic schemas, our models substantially improved predictive accuracy.
Comprehensive genomic characterization identified distinct genetic subgroups and provided a classification of myeloproliferative neoplasms on the basis of causal biologic mechanisms. Integration of genomic data with clinical variables enabled personalized predictions of patients' outcomes and may support the treatment of patients with myeloproliferative neoplasms. (Funded by the Wellcome Trust and others.)
The Catalogue of Somatic Mutations in Cancer (COSMIC) (http://www.sanger.ac.uk/cosmic/) is the largest public resource for information on somatically acquired mutations in human cancer and is available freely without restrictions. Currently (v43, August 2009), COSMIC contains details of 1.5 million experiments performed through 13 423 genes in almost 370 000 tumours, describing over 90 000 individual mutations. Data are gathered from two sources: publications in the scientific literature (v43 contains 7797 curated articles) and the full output of the genome-wide screens from the Cancer Genome Project (CGP) at the Sanger Institute, UK. Most of the world's literature on point mutations in human cancer has now been curated into COSMIC and, while this is continually updated, a greater emphasis on curating fusion gene mutations is driving the expansion of this information; over 2700 fusion gene mutations are now described. Whole-genome sequencing screens are now identifying large numbers of genomic rearrangements in cancer, and COSMIC now also displays details of these analyses. Examination of COSMIC's data is primarily web-driven, focused on providing mutation range and frequency statistics based upon a choice of gene and/or cancer phenotype. Graphical views provide easily interpretable summaries of large quantities of data, and export functions can provide precise details of user-selected data.
COSMIC (http://www.sanger.ac.uk/cosmic) curates comprehensive information on somatic mutations in human cancer. Release v48 (July 2010) describes over 136,000 coding mutations in almost 542,000 tumour samples; of the 18,490 genes documented, 4803 (26%) have one or more mutations. Full scientific literature curations are available on 83 major cancer genes and 49 fusion gene pairs (19 new cancer genes and 30 new fusion pairs this year) and this number is continually increasing. Key amongst these is TP53, now available through a collaboration with the IARC p53 database. In addition to data from the Cancer Genome Project (CGP) at the Sanger Institute, UK, and The Cancer Genome Atlas project (TCGA), large systematic screens are also now curated. Major website upgrades now make these data much more mineable, with many new selection filters and graphics. A Biomart is now available allowing more automated data mining and integration with other biological databases. Annotation of genomic features has become a significant focus; COSMIC has begun curating full-genome resequencing experiments, developing new web pages, export formats and graphics styles. With all genomic information recently updated to GRCh37, COSMIC integrates many diverse types of mutation information and is making much closer links with Ensembl and other data resources.
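The 26% figure quoted for release v48 follows directly from the two gene counts given in the abstract. A minimal check, using only those two numbers (the variable names are illustrative and not part of COSMIC):

```python
# Gene counts quoted in the abstract for COSMIC release v48 (July 2010).
genes_documented = 18_490
genes_with_mutation = 4_803

# Fraction of documented genes that carry one or more mutations.
fraction_mutated = genes_with_mutation / genes_documented

# Round to a whole percentage, as reported in the abstract.
percent_mutated = round(fraction_mutated * 100)

print(f"{percent_mutated}% of documented genes have one or more mutations")
```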
Recent studies have provided a detailed census of genes that are mutated in acute myeloid leukemia (AML). Our next challenge is to understand how this genetic diversity defines the pathophysiology of AML and informs clinical practice.
We enrolled a total of 1540 patients in three prospective trials of intensive therapy. Combining driver mutations in 111 cancer genes with cytogenetic and clinical data, we defined AML genomic subgroups and their relevance to clinical outcomes.
We identified 5234 driver mutations across 76 genes or genomic regions, with 2 or more drivers identified in 86% of the patients. Patterns of co-mutation compartmentalized the cohort into 11 classes, each with distinct diagnostic features and clinical outcomes. In addition to currently defined AML subgroups, three heterogeneous genomic categories emerged: AML with mutations in genes encoding chromatin, RNA-splicing regulators, or both (in 18% of patients); AML with TP53 mutations, chromosomal aneuploidies, or both (in 13%); and, provisionally, AML with IDH2(R172) mutations (in 1%). Patients with chromatin-spliceosome and TP53-aneuploidy AML had poor outcomes, with the various class-defining mutations contributing independently and additively to the outcome. In addition to class-defining lesions, other co-occurring driver mutations also had a substantial effect on overall survival. The prognostic effects of individual mutations were often significantly altered by the presence or absence of other driver mutations. Such gene-gene interactions were especially pronounced for NPM1-mutated AML, in which patterns of co-mutation identified groups with a favorable or adverse prognosis. These predictions require validation in prospective clinical trials.
The driver landscape in AML reveals distinct molecular subgroups that reflect discrete paths in the evolution of AML, informing disease classification and prognostic stratification. (Funded by the Wellcome Trust and others; ClinicalTrials.gov number, NCT00146120.)
Myelodysplastic syndromes (MDS) are a heterogeneous group of chronic hematological malignancies characterized by dysplasia, ineffective hematopoiesis and a variable risk of progression to acute myeloid leukemia. Sequencing of MDS genomes has identified mutations in genes implicated in RNA splicing, DNA modification, chromatin regulation, and cell signaling. We sequenced 111 genes across 738 patients with MDS or closely related neoplasms (including chronic myelomonocytic leukemia and MDS–myeloproliferative neoplasms) to explore the role of acquired mutations in MDS biology and clinical phenotype. Seventy-eight percent of patients had 1 or more oncogenic mutations. We identify complex patterns of pairwise association between genes, indicative of epistatic interactions involving components of the spliceosome machinery and epigenetic modifiers. Coupled with inferences on subclonal mutations, these data suggest a hypothesis of genetic “predestination,” in which early driver mutations, typically affecting genes involved in RNA splicing, dictate future trajectories of disease evolution with distinct clinical phenotypes. Driver mutations had equivalent prognostic significance, whether clonal or subclonal, and leukemia-free survival deteriorated steadily as numbers of driver mutations increased. Thus, analysis of oncogenic mutations in large, well-characterized cohorts of patients illustrates the interconnections between the cancer genome and disease biology, with considerable potential for clinical application.
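As an illustration of what a pairwise association between two genes can look like numerically, the sketch below scores a 2x2 co-mutation table with an odds ratio and a two-sided Fisher exact test. The abstract does not say which statistic the authors used, so the choice of test is an assumption, the function names are hypothetical, and the patient counts are invented purely for demonstration.

```python
from math import comb


def odds_ratio(both, only_a, only_b, neither):
    """Odds ratio for a 2x2 co-mutation table.

    Adds 0.5 to every cell (Haldane-Anscombe correction) so that
    empty cells do not cause division by zero.
    """
    a, b, c, d = (x + 0.5 for x in (both, only_a, only_b, neither))
    return (a * d) / (b * c)


def fisher_two_sided(both, only_a, only_b, neither):
    """Two-sided Fisher exact p-value for a 2x2 co-mutation table.

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    a_mutated = both + only_a        # patients with gene A mutated
    a_wildtype = only_b + neither    # patients with gene A wild type
    b_mutated = both + only_b        # patients with gene B mutated
    total = a_mutated + a_wildtype

    def prob(k):
        # Probability that k of the gene-A-mutated patients also carry B.
        return (comb(a_mutated, k) * comb(a_wildtype, b_mutated - k)
                / comb(total, b_mutated))

    lowest = max(0, b_mutated - a_wildtype)
    highest = min(a_mutated, b_mutated)
    p_observed = prob(both)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))


# Invented counts for two hypothetical genes, NOT taken from the study.
counts = dict(both=3, only_a=1, only_b=1, neither=3)
print("odds ratio:", odds_ratio(**counts))
print("p-value:", fisher_two_sided(**counts))
```

An odds ratio above 1 points towards co-occurrence and one below 1 towards mutual exclusivity; in a real gene panel the p-values would also need correction for the many gene pairs tested.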
Multiple signatures of somatic mutations have been identified in cancer genomes. Exome sequences of 1,001 human cancer cell lines and 577 xenografts revealed most common mutational signatures, indicating past activity of the underlying processes, usually in appropriate cancer types. To investigate ongoing patterns of mutational-signature generation, cell lines were cultured for extended periods and subsequently DNA sequenced. Signatures of discontinued exposures, including tobacco smoke and ultraviolet light, were not generated in vitro. Signatures of normal and defective DNA repair and replication continued to be generated at roughly stable mutation rates. Signatures of APOBEC cytidine deaminase DNA-editing exhibited substantial fluctuations in mutation rate over time with episodic bursts of mutations. The initiating factors for the bursts are unclear, although retrotransposon mobilization may contribute. The examined cell lines constitute a resource of live experimental models of mutational processes, which potentially retain patterns of activity and regulation operative in primary human cancers.
• Annotation of mutational signatures across 1,001 cancer cell lines and 577 PDXs
• Activities of mutational processes determined over time in cancer cell lines
• APOBEC-associated mutagenesis is often ongoing and can be episodic
• Detection of mutational signatures by single-cell sequencing
An analysis of 1,001 human cancer cell lines and 577 xenografts shows that mutagenesis associated with the cytidine deaminase APOBEC occurs in episodic bursts, in contrast to mutation signatures associated with DNA replication and repair.
The ETV6-RUNX1 fusion gene, found in 25% of childhood acute lymphoblastic leukemia (ALL) cases, is acquired in utero but requires additional somatic mutations for overt leukemia. We used exome and low-coverage whole-genome sequencing to characterize secondary events associated with leukemic transformation. RAG-mediated deletions emerge as the dominant mutational process, characterized by recombination signal sequence motifs near breakpoints, incorporation of non-templated sequence at junctions, ∼30-fold enrichment at promoters and enhancers of genes actively transcribed in B cell development and an unexpectedly high ratio of recurrent to non-recurrent structural variants. Single-cell tracking shows that this mechanism is active throughout leukemic evolution, with evidence of localized clustering and reiterated deletions. Integration of data on point mutations and rearrangements identifies ATF7IP and MGA as two new tumor-suppressor genes in ALL. Thus, a remarkably parsimonious mutational process transforms ETV6-RUNX1-positive lymphoblasts, targeting the promoters, enhancers and first exons of genes that normally regulate B cell differentiation.
As whole-genome sequencing for cancer genome analysis becomes a clinical tool, a full understanding of the variables affecting sequencing analysis output is required. Here, using tumour-normal sample pairs from two different types of cancer, chronic lymphocytic leukaemia and medulloblastoma, we conduct a benchmarking exercise within the context of the International Cancer Genome Consortium. We compare sequencing methods, analysis pipelines and validation methods. We show that using PCR-free methods and increasing sequencing depth to ∼100× shows benefits, as long as the tumour:control coverage ratio remains balanced. We observe widely varying mutation call rates and low concordance among analysis pipelines, reflecting the artefact-prone nature of the raw data and lack of standards for dealing with the artefacts. However, we show that, using the benchmark mutation set we have created, many issues are in fact easy to remedy and have an immediate positive impact on mutation detection accuracy.