Microsatellite instability (MSI) is an important indicator of larger genome instability and has been linked to many genetic diseases, including Lynch syndrome. MSI status is also an independent prognostic factor for favorable survival in multiple cancer types, such as colorectal and endometrial. It also informs the choice of chemotherapeutic agents. However, the current PCR-electrophoresis-based detection procedure is laborious and time-consuming, often requiring visual inspection to categorize samples. We developed MSIsensor, a C++ program for automatically detecting somatic microsatellite changes. It computes length distributions of microsatellites per site in paired tumor and normal sequence data, subsequently using these to statistically compare observed distributions in both samples. Comprehensive testing indicates MSIsensor is an efficient and effective tool for deriving MSI status from standard tumor-normal paired sequence data.
https://github.com/ding-lab/msisensor
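The per-site comparison described in the MSIsensor abstract can be illustrated with a minimal sketch. This is not MSIsensor's code: the abstract does not name the statistical test here, so a Pearson chi-square statistic on a 2 x k table (sample by repeat length) is assumed purely for illustration, and the function name and inputs are hypothetical.

```python
from collections import Counter


def compare_length_distributions(normal_lengths, tumor_lengths):
    """Compare repeat-length distributions at one microsatellite site.

    Each argument is a list of repeat lengths, one entry per read spanning
    the site. Returns the Pearson chi-square statistic and its degrees of
    freedom for the 2 x k table (sample x observed repeat length).
    ASSUMPTION: chi-square is an illustrative choice of test.
    """
    if not normal_lengths or not tumor_lengths:
        raise ValueError("both samples need at least one spanning read")

    # Length distribution per sample: repeat length -> supporting read count.
    normal = Counter(normal_lengths)
    tumor = Counter(tumor_lengths)

    lengths = sorted(set(normal) | set(tumor))
    normal_total = sum(normal.values())
    tumor_total = sum(tumor.values())
    grand_total = normal_total + tumor_total

    statistic = 0.0
    for length in lengths:
        column_total = normal[length] + tumor[length]
        for observed, row_total in (
            (normal[length], normal_total),
            (tumor[length], tumor_total),
        ):
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = len(lengths) - 1
    return statistic, degrees_of_freedom
```

A site whose tumor reads show the same length mix as the normal reads yields a statistic of zero; a shift toward shorter or longer repeats in the tumor increases it. Turning the statistic into a p-value and correcting across sites is left out of this sketch.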
The Cancer Genome Atlas (TCGA) has used the latest sequencing and analysis methods to identify somatic variants across thousands of tumours. Here we present data and analytical results for point mutations and small insertions/deletions from 3,281 tumours across 12 tumour types as part of the TCGA Pan-Cancer effort. We illustrate the distributions of mutation frequencies, types and contexts across tumour types, and establish their links to tissues of origin, environmental/carcinogen influences, and DNA repair defects. Using the integrated data sets, we identified 127 significantly mutated genes from well-known (for example, mitogen-activated protein kinase, phosphatidylinositol-3-OH kinase, Wnt/β-catenin and receptor tyrosine kinase signalling pathways, and cell cycle control) and emerging (for example, histone, histone modification, splicing, metabolism and proteolysis) cellular processes in cancer. The average number of mutations in these significantly mutated genes varies across tumour types; most tumours have two to six, indicating that the number of driver mutations required during oncogenesis is relatively small. Mutations in transcriptional factors/regulators show tissue specificity, whereas histone modifiers are often mutated across several cancer types. Clinical association analysis identifies genes having a significant effect on survival, and investigations of mutations with respect to clonal/subclonal architecture delineate their temporal orders during tumorigenesis. Taken together, these results lay the groundwork for developing new diagnostics and individualizing cancer treatment.
Background: Tumor mutational burden (TMB), defined as the number of somatic mutations per megabase of interrogated genomic sequence, demonstrates predictive biomarker potential for the identification of patients with cancer most likely to respond to immune checkpoint inhibitors. TMB is optimally calculated by whole exome sequencing (WES), but next-generation sequencing targeted panels provide TMB estimates in a time-effective and cost-effective manner. However, differences in panel size and gene coverage, in addition to the underlying bioinformatics pipelines, are known drivers of variability in TMB estimates across laboratories. By directly comparing panel-based TMB estimates from participating laboratories, this study aims to characterize the theoretical variability of panel-based TMB estimates, and provides guidelines on TMB reporting, analytic validation requirements and reference standard alignment in order to maintain consistency of TMB estimation across platforms.
Methods: Eleven laboratories used WES data from The Cancer Genome Atlas Multi-Center Mutation calling in Multiple Cancers (MC3) samples and calculated TMB from the subset of the exome restricted to the genes covered by their targeted panel using their own bioinformatics pipeline (panel TMB). A reference TMB value was calculated from the entire exome using a uniform bioinformatics pipeline all members agreed on (WES TMB). Linear regression analyses were performed to investigate the relationship between WES and panel TMB for all 32 cancer types combined and separately. Variability in panel TMB values at various WES TMB values was also quantified using 95% prediction limits.
Results: Study results demonstrated that variability within and between panel TMB values increases as the WES TMB values increase. For each panel, prediction limits based on linear regression analyses that modeled panel TMB as a function of WES TMB were calculated and found to approximately capture the intended 95% of observed panel TMB values. Certain cancer types, such as uterine, bladder and colon cancers, exhibited greater variability in panel TMB values compared with lung and head and neck cancers.
Conclusions: Increasing uptake of TMB as a predictive biomarker in the clinic creates an urgent need to bring stakeholders together to agree on the harmonization of key aspects of panel-based TMB estimation, such as the standardization of TMB reporting, standardization of analytical validation studies and the alignment of panel-based TMB values with a reference standard. These harmonization efforts should improve consistency and reliability of panel TMB estimates and aid in clinical decision-making.
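The TMB definition and the regression of panel TMB on WES TMB can be sketched as follows. This is an illustrative reading of the abstract, not the study's pipeline: the function names are hypothetical, the prediction limits use the textbook ordinary-least-squares formula with constant error variance, and the critical value of Student's t distribution (n - 2 degrees of freedom) is passed in by the caller rather than computed.

```python
import math


def tmb(num_somatic_mutations, interrogated_bases):
    """Somatic mutations per megabase of interrogated sequence."""
    if interrogated_bases <= 0:
        raise ValueError("interrogated_bases must be positive")
    return num_somatic_mutations / (interrogated_bases / 1_000_000)


def fit_line(wes_tmb, panel_tmb):
    """Least-squares fit: panel_tmb = intercept + slope * wes_tmb."""
    n = len(wes_tmb)
    if n != len(panel_tmb) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    mean_x = sum(wes_tmb) / n
    mean_y = sum(panel_tmb) / n
    sxx = sum((x - mean_x) ** 2 for x in wes_tmb)
    if sxx == 0:
        raise ValueError("WES TMB values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(wes_tmb, panel_tmb))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def prediction_limits(wes_tmb, panel_tmb, new_wes_tmb, t_crit):
    """Lower and upper prediction limits for panel TMB at one WES TMB value.

    t_crit is the two-sided critical value of Student's t with n - 2
    degrees of freedom (for example, the 0.975 quantile for 95% limits).
    ASSUMPTION: homoscedastic errors, as in the standard OLS formula.
    """
    n = len(wes_tmb)
    if n < 3:
        raise ValueError("need at least three samples")
    intercept, slope = fit_line(wes_tmb, panel_tmb)
    mean_x = sum(wes_tmb) / n
    sxx = sum((x - mean_x) ** 2 for x in wes_tmb)
    sse = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(wes_tmb, panel_tmb)
    )
    residual_sd = math.sqrt(sse / (n - 2))
    half_width = t_crit * residual_sd * math.sqrt(
        1 + 1 / n + (new_wes_tmb - mean_x) ** 2 / sxx
    )
    centre = intercept + slope * new_wes_tmb
    return centre - half_width, centre + half_width
```

For example, 30 somatic mutations over 1.5 Mb of interrogated sequence give a TMB of 20 mutations per megabase. Because the abstract reports that variability grows with WES TMB, a constant-variance model like this one is only a first approximation.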
Large-scale cancer sequencing data enable discovery of rare germline cancer susceptibility variants. Here we systematically analyse 4,034 cancer cases from The Cancer Genome Atlas, representing 12 cancer types. We find that the frequency of rare germline truncations in 114 cancer-susceptibility-associated genes varies widely, from 4% (acute myeloid leukaemia (AML)) to 19% (ovarian cancer), with a notably high frequency of 11% in stomach cancer. Burden testing identifies 13 cancer genes with significant enrichment of rare truncations, some associated with specific cancers (for example, RAD51C, PALB2 and MSH6 in AML, stomach and endometrial cancers, respectively). Significant, tumour-specific loss of heterozygosity occurs in nine genes (ATM, BAP1, BRCA1/2, BRIP1, FANCM, PALB2 and RAD51C/D). Moreover, our homology-directed repair assay of 68 BRCA1 rare missense variants supports the utility of allelic enrichment analysis for characterizing variants of unknown significance. The scale of this analysis and the somatic-germline integration enable the detection of rare variants that may affect individual susceptibility to tumour development, a critical step toward precision medicine.
Despite the explosive growth of genomic data, functional annotation of regulatory sequences remains difficult. Here, we introduce “comparative epigenomics,” the interspecies comparison of DNA and histone modifications, as an approach for annotation of the regulatory genome. We measured in human, mouse, and pig pluripotent stem cells the genomic distributions of cytosine methylation, H2A.Z, H3K4me1/2/3, H3K9me3, H3K27me3, H3K27ac, H3K36me3, transcribed RNAs, and P300, TAF1, OCT4, and NANOG binding. We observed that epigenomic conservation was strong in both rapidly evolving and slowly evolving DNA sequences, but not in neutrally evolving sequences. In contrast, evolutionary changes of the epigenome and the transcriptome exhibited a linear correlation. We suggest that the conserved colocalization of different epigenomic marks can be used to discover regulatory sequences. Indeed, seven pairs of epigenomic marks identified exhibited regulatory functions during differentiation of embryonic stem cells into mesendoderm cells. Thus, comparative epigenomics reveals regulatory features of the genome that cannot be discerned from sequence comparisons alone.
► Epigenetic patterns of histone and DNA modification are conserved across species
► Epigenomic conservation occurs in both fast- and slow-evolving DNA sequences
► Changes in the epigenome, transcriptome, and protein-DNA binding patterns are correlated
► The conserved colocalization of different epigenetic marks defines regulatory DNA
Interspecies, comparative analysis of DNA methylation and histone modification patterns can be used to identify regulatory DNA sequences and may explain how changes in transcription and protein-DNA interactions arise during evolution.
We report the first large-scale exome-wide analysis of the combined germline-somatic landscape in ovarian cancer. Here we analyse germline and somatic alterations in 429 ovarian carcinoma cases and 557 controls. We identify 3,635 high confidence, rare truncation and 22,953 missense variants with predicted functional impact. We find germline truncation variants and large deletions across Fanconi pathway genes in 20% of cases. Enrichment of rare truncations is shown in BRCA1, BRCA2 and PALB2. In addition, we observe germline truncation variants in genes not previously associated with ovarian cancer susceptibility (NF1, MAP3K4, CDKN2B and MLL3). Evidence for loss of heterozygosity was found in 100% and 76% of cases with germline BRCA1 and BRCA2 truncations, respectively. Germline-somatic interaction analysis combined with extensive bioinformatics annotation identifies 222 candidate functional germline truncation and missense variants, including two pathogenic BRCA1 and one TP53 deleterious variants. Finally, integrated analyses of germline and somatic variants identify significantly altered pathways, including the Fanconi, MAPK and MLL pathways.
MIRMMR predicts microsatellite instability status in cancer samples using methylation and mutation information, in contrast to existing methods that rely on observed microsatellites. Additionally, MIRMMR highlights those genetic alterations contributing to microsatellite instability.
Source code is freely available at https://github.com/ding-lab/MIRMMR under the MIT license, implemented in R and supported on Unix/OS X operating systems.
Contact: smfoltz@wustl.edu or lding@wustl.edu.
Supplementary data are available at Bioinformatics online.
Recent advancements in sequencing-based DNA methylation profiling methods provide an unprecedented opportunity to map complete DNA methylomes. These include whole-genome bisulfite sequencing (WGBS, MethylC-seq, or BS-seq), reduced-representation bisulfite sequencing (RRBS), and enrichment-based methods such as MeDIP-seq, MBD-seq, and MRE-seq. These methods yield largely comparable results but differ significantly in extent of genomic CpG coverage, resolution, quantitative accuracy, and cost, at least while using current algorithms to interrogate the data. None of these existing methods provides single-CpG resolution, comprehensive genome-wide coverage, and cost feasibility for a typical laboratory. We introduce methylCRF, a novel conditional random fields-based algorithm that integrates methylated DNA immunoprecipitation (MeDIP-seq) and methylation-sensitive restriction enzyme (MRE-seq) sequencing data to predict DNA methylation levels at single-CpG resolution. Our method is a combined computational and experimental strategy to produce DNA methylomes of all 28 million CpGs in the human genome for a fraction (<10%) of the cost of whole-genome bisulfite sequencing methods. methylCRF was benchmarked for accuracy against Infinium arrays, RRBS, WGBS sequencing, and locus-specific bisulfite sequencing performed on the same human embryonic stem cell line. methylCRF transformation of MeDIP-seq/MRE-seq was equivalent to a biological replicate of WGBS in quantification, coverage, and resolution. We used conventional bisulfite conversion, PCR, cloning, and sequencing to validate loci where our predictions do not agree with whole-genome bisulfite data, and in 11 out of 12 cases, methylCRF predictions of methylation level agree better with validated results than does whole-genome bisulfite sequencing. Therefore, methylCRF transformation of MeDIP-seq/MRE-seq data provides an accurate, inexpensive, and widely accessible strategy to create full DNA methylomes.
Complex insertions and deletions (indels) are formed by simultaneously deleting and inserting DNA fragments of different sizes at a common genomic location. Here we present a systematic analysis of somatic complex indels in the coding sequences of samples from over 8,000 cancer cases using Pindel-C. We discovered 285 complex indels in cancer-associated genes (such as PIK3R1, TP53, ARID1A, GATA3 and KMT2D) in approximately 3.5% of cases analyzed; nearly all instances of complex indels were overlooked (81.1%) or misannotated (17.6%) in previous reports of 2,199 samples. In-frame complex indels are enriched in PIK3R1 and EGFR, whereas frameshifts are prevalent in VHL, GATA3, TP53, ARID1A, PTEN and ATRX. Furthermore, complex indels display strong tissue specificity (such as VHL in kidney cancer samples and GATA3 in breast cancer samples). Finally, structural analyses support findings of previously missed, but potentially druggable, mutations in the EGFR, MET and KIT oncogenes. This study indicates the critical importance of improving complex indel discovery and interpretation in medical research.
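The definition of a complex indel given in the abstract can be made concrete with a small sketch that classifies a variant from its reference and alternate alleles. This is not Pindel-C's algorithm: the function names are hypothetical, alleles are assumed to be plain nucleotide strings as in a VCF REF/ALT pair, and the frame call looks only at the net length change, ignoring splice sites and stop codons.

```python
def trim_shared_flanks(ref, alt):
    """Strip bases shared at the right end, then the left end, of both alleles."""
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
    return ref, alt


def classify_variant(ref, alt):
    """Label a REF/ALT pair by what is deleted and what is inserted.

    After trimming shared flanking bases, a complex indel is a variant in
    which a non-empty fragment is deleted AND a non-empty fragment of a
    different size is inserted at the same location.
    """
    deleted, inserted = trim_shared_flanks(ref.upper(), alt.upper())
    if not deleted and not inserted:
        return "no_change"
    if not deleted:
        return "insertion"
    if not inserted:
        return "deletion"
    if len(deleted) == len(inserted):
        return "substitution"
    return "complex_indel"


def coding_frame_effect(ref, alt):
    """In-frame if the net length change is a multiple of three.

    ASSUMPTION: the variant lies entirely within coding sequence.
    """
    net_change = len(alt) - len(ref)
    return "in_frame" if net_change % 3 == 0 else "frameshift"
```

For instance, REF `AGGCT` and ALT `ATT` trim to a deleted `GGC` and an inserted `T`, which this sketch labels a complex indel with a net change of -2 bases, hence a frameshift.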