Mutations generate sequence diversity and provide a substrate for selection. The rate of de novo mutations is therefore of major importance to evolution. Here we conduct a study of genome-wide mutation rates by sequencing the entire genomes of 78 Icelandic parent-offspring trios at high coverage. We show that in our samples, with an average father's age of 29.7, the average de novo mutation rate is 1.20 × 10^-8 per nucleotide per generation. Most notably, the diversity in mutation rate of single nucleotide polymorphisms is dominated by the age of the father at conception of the child. The effect is an increase of about two mutations per year. An exponential model estimates paternal mutations doubling every 16.5 years. After accounting for random Poisson variation, father's age is estimated to explain nearly all of the remaining variation in the de novo mutation counts. These observations shed light on the importance of the father's age on the risk of diseases such as schizophrenia and autism.
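As a rough illustration of what a doubling time of 16.5 years implies, the sketch below converts it into a fold change between two paternal ages and into a yearly growth rate. Only the 16.5-year figure comes from the abstract; the function names and the example ages are hypothetical, and this is not the model-fitting procedure used in the study.

```python
DOUBLING_TIME_YEARS = 16.5  # doubling time reported in the abstract


def paternal_fold_change(age_from, age_to, doubling_time=DOUBLING_TIME_YEARS):
    """Fold change in expected paternal mutations between two paternal ages,
    assuming pure exponential growth with the given doubling time."""
    return 2.0 ** ((age_to - age_from) / doubling_time)


def yearly_growth_rate(doubling_time=DOUBLING_TIME_YEARS):
    """Relative increase per year implied by the doubling time."""
    return 2.0 ** (1.0 / doubling_time) - 1.0


if __name__ == "__main__":
    # Hypothetical example ages, chosen to be exactly one doubling time apart.
    print(paternal_fold_change(20.0, 36.5))
    print(yearly_growth_rate())
```

Under these assumptions the yearly increase is a little over 4%, which describes the exponential fit only and is separate from the linear estimate of about two mutations per year.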
The mutation rate of a specific position in the human genome depends on the sequence context surrounding it. Modeling the mutation rate by estimating a rate for each possible k-mer, however, only works for small values of k since the data becomes too sparse for larger values of k. Here we propose a new method that solves this problem by grouping similar k-mers. We refer to the method as k-mer pattern partition and have implemented it in a software package called kmerPaPa. We use a large set of human de novo mutations to show that this new method leads to improved prediction of mutation rates and makes it possible to create models using wider sequence contexts than previous studies. As the first method of its kind, it predicts rates not only for point mutations but also for insertions and deletions. We have additionally created a software package called Genovo that, given a k-mer pattern partition model, predicts the expected number of synonymous, missense, and other functional mutation types for each gene. Using this software, we show that the created mutation rate models increase the statistical power to detect genes containing disease-causing variants and to identify genes under strong selective constraint.
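The idea of grouping similar k-mers can be sketched as pooling counts over all k-mers that match a shared pattern. The example below assumes patterns are written with IUPAC nucleotide codes and that a partition is simply a list of such patterns. The counts are made-up toy numbers, and none of this is the kmerPaPa API or its search procedure for choosing the partition.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(pattern, kmer):
    """True if every base of the k-mer is allowed by the pattern."""
    return len(pattern) == len(kmer) and all(
        base in IUPAC[code] for code, base in zip(pattern, kmer)
    )


def pooled_rates(partition, mutated, unmutated):
    """Estimate one rate per pattern by pooling counts over matching k-mers.

    mutated / unmutated map each k-mer to the number of sites observed
    with and without a mutation (hypothetical inputs for illustration).
    """
    rates = {}
    for pattern in partition:
        m = sum(c for k, c in mutated.items() if matches(pattern, k))
        u = sum(c for k, c in unmutated.items() if matches(pattern, k))
        rates[pattern] = m / (m + u) if (m + u) > 0 else float("nan")
    return rates


# Toy counts for 3-mers centred on a C (not real data).
mutated = {"ACG": 30, "CCG": 10, "ACA": 1, "TCT": 1}
unmutated = {"ACG": 70, "CCG": 90, "ACA": 99, "TCT": 99}

# Hypothetical two-pattern partition: CpG contexts versus all other contexts.
rates = pooled_rates(["NCG", "NCH"], mutated, unmutated)
```

Pooling trades resolution for sample size: each pattern gets a rate estimated from many more sites than any single k-mer would provide.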
Mutation of the DNA molecule is one of the most fundamental processes in biology. In this study, we use 283 parent-offspring trios to estimate the rate of mutation for both single nucleotide variants (SNVs) and short length variants (indels) in humans and examine the mutation process. We found 17812 SNVs, corresponding to a mutation rate of 1.29 × 10^-8 per position per generation (PPPG) and 1282 indels corresponding to a rate of 9.29 × 10^-10 PPPG. We estimate that around 3% of human de novo SNVs are part of a multi-nucleotide mutation (MNM), with 558 (3.1%) of mutations positioned less than 20kb from another mutation in the same individual (median distance of 525bp). The rate of de novo mutations is greater in late replicating regions (p = 8.29 × 10^-19) and nearer recombination events (p = 0.0038) than elsewhere in the genome.
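The relation between the counts and the per-position rates can be checked with a little arithmetic. The sketch below assumes the rate is the mutation count divided by twice the number of trios times the number of callable positions, the factor of two reflecting the two haploid genome copies in each child. The abstract does not state the number of callable positions, so it is back-calculated here from the SNV figures; that value is an inference for illustration, not a reported number.

```python
def per_site_rate(n_mutations, n_trios, callable_sites):
    """Mutations per position per generation, assuming two haploid
    genome copies of `callable_sites` positions in each child."""
    return n_mutations / (2 * n_trios * callable_sites)


def implied_callable_sites(n_mutations, n_trios, rate):
    """Invert per_site_rate to get the callable positions a rate implies."""
    return n_mutations / (2 * n_trios * rate)


# Figures quoted in the abstract.
N_TRIOS = 283
SNV_COUNT, SNV_RATE = 17812, 1.29e-8
INDEL_COUNT = 1282

# Back-calculated, not reported: callable positions implied by the SNV rate.
callable_sites = implied_callable_sites(SNV_COUNT, N_TRIOS, SNV_RATE)

# Applying the same denominator to the indel count.
indel_rate = per_site_rate(INDEL_COUNT, N_TRIOS, callable_sites)

if __name__ == "__main__":
    print(f"implied callable positions: {callable_sites:.3e}")
    print(f"indel rate from the same denominator: {indel_rate:.3e}")
```

Under this assumption the indel rate computed from the SNV denominator lands within rounding of the reported 9.29 × 10^-10, which suggests the two reported rates share a denominator.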
The germline mutation rate determines the pace of genome evolution and is an evolving parameter itself. However, little is known about what determines its evolution, as most studies of mutation rates have focused on single species with different methodologies. Here we quantify germline mutation rates across vertebrates by sequencing and comparing the high-coverage genomes of 151 parent-offspring trios from 68 species of mammals, fishes, birds and reptiles. We show that the per-generation mutation rate varies among species by a factor of 40, with mutation rates being higher for males than for females in mammals and birds, but not in reptiles and fishes. The generation time, age at maturity and species-level fecundity are the key life-history traits affecting this variation among species. Furthermore, species with higher long-term effective population sizes tend to have lower mutation rates per generation, providing support for the drift barrier hypothesis. The exceptionally high yearly mutation rates of domesticated animals, which have been continually selected on fecundity traits including shorter generation times, further support the importance of generation time in the evolution of mutation rates. Overall, our comparative analysis of pedigree-based mutation rates provides ecological insights into the evolution of mutation rates in vertebrates.
In this paper, we mine full mtDNA sequences from an exome capture data set of 2000 Danes, showing that it is possible to get high-quality full-genome sequences of the mitochondrion from this resource. The sample includes 1000 individuals with type 2 diabetes and 1000 controls. We characterise the variation found in the mtDNA sequence in Danes and relate the variation to diabetes risk as well as to several blood phenotypes of the controls but find no significant associations. We report 2025 polymorphisms, of which 393 have not been reported previously. These 393 variants are both very rare and estimated to be caused by very recent mutations, but individuals with type 2 diabetes do not possess more of them. Population genetics analysis using a Bayesian skyline plot shows a recent history of rapid population growth in the Danish population, in accordance with the fact that >40% of variable sites are observed as singletons.
Mantle cell lymphoma (MCL) is characterized by marked differences in outcome, emphasizing the need for strong prognostic biomarkers. Here, we explore expression patterns and prognostic relevance of circular RNAs (circRNAs), a group of endogenous non-coding RNA molecules, in MCL. We profiled the circRNA expression landscape using RNA-sequencing and explored the prognostic potential of 40 abundant circRNAs in samples from the Nordic MCL2 and MCL3 clinical trials, using NanoString nCounter Technology. We report a circRNA-based signature (circSCORE) developed in the training cohort MCL2 that is highly predictive of time to progression (TTP) and lymphoma-specific survival (LSS). The dismal outcome observed in the large proportion of patients assigned to the circSCORE high-risk group was confirmed in the independent validation cohort MCL3, both in terms of TTP (HR 3.0; P = 0.0004) and LSS (HR 3.6; P = 0.001). In Cox multiple regression analysis incorporating MIPI, Ki67 index, blastoid morphology and presence of TP53 mutations, circSCORE retained prognostic significance for TTP (HR 3.2; P = 0.01) and LSS (HR 4.6; P = 0.01). In conclusion, circRNAs are promising prognostic biomarkers in MCL and circSCORE improves identification of high-risk disease among younger patients treated with cytarabine-containing chemoimmunotherapy and autologous stem cell transplant.
Building a population-specific catalogue of single nucleotide variants (SNVs), indels and structural variants (SVs) with frequencies, termed a national pan-genome, is critical for further advancing clinical and public health genetics in large cohorts. Here we report a Danish pan-genome obtained from sequencing 10 trios to high depth (50×). We report 536k novel SNVs and 283k novel short indels from mapping approaches and develop a population-wide de novo assembly approach to identify 132k novel indels larger than 10 nucleotides with low false discovery rates. We identify a higher proportion of indels and SVs than previous efforts, showing the merits of high coverage and de novo assembly approaches. In addition, we use trio information to identify de novo mutations and use a probabilistic method to provide direct estimates of 1.27e-8 and 1.5e-9 per nucleotide per generation for SNVs and indels, respectively.
Sequencing of cell-free DNA (cfDNA) is currently being used to detect cancer by searching both for mutational and non-mutational alterations. Recent work has shown that the length distribution of cfDNA fragments from a cancer patient can inform tumor load and type. Here, we propose non-negative matrix factorization (NMF) of fragment length distributions as a novel and completely unsupervised method for studying fragment length patterns in cfDNA. Using shallow whole-genome sequencing (sWGS) of cfDNA from a cohort of patients with metastatic castration-resistant prostate cancer (mCRPC), we demonstrate how NMF accurately infers the true tumor fragment length distribution as an NMF component - and that the sample weights of this component correlate with ctDNA levels (r = 0.75). We further demonstrate how using several NMF components enables accurate cancer detection on data from various early stage cancers (AUC = 0.96). Finally, we show that NMF, when applied across genomic regions, can be used to discover fragment length signatures associated with open chromatin.
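To make the factorization concrete, the sketch below decomposes a samples-by-length-bins matrix V into non-negative sample weights W and fragment length signatures H, so that V is approximately W times H. It uses the standard multiplicative update rules for squared error in plain Python; the abstract does not say which NMF algorithm or software was used, and the two "signatures" and the mixing fractions below are invented toy values, not cfDNA data.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def nmf(v, rank, iterations=200, seed=0, eps=1e-12):
    """Factor V (samples x bins) into W (samples x rank) and H (rank x bins)
    using multiplicative updates; all entries stay non-negative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Toy length distributions over four bins (invented for illustration).
background = [0.05, 0.15, 0.60, 0.20]
tumor_like = [0.20, 0.50, 0.25, 0.05]
fractions = [0.0, 0.1, 0.3, 0.6]  # hypothetical tumor fractions per sample
V = [[(1 - f) * b + f * t for b, t in zip(background, tumor_like)]
     for f in fractions]

W, H = nmf(V, rank=2)
```

In this layout each row of H is a candidate fragment length signature and each column of W holds the per-sample weights of one signature, which is the quantity the abstract relates to ctDNA levels.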
Mycosis fungoides (MF) is the most frequent form of cutaneous T-cell lymphoma. The disease often takes an indolent course, but in approximately one-third of the patients, the disease progresses to an aggressive malignancy with a poor prognosis. At the time of diagnosis, it is impossible to predict which patients develop severe disease and are in need of aggressive treatment. Accordingly, we investigated the prognostic potential of microRNAs (miRNAs) at the time of diagnosis in MF. Using a quantitative reverse transcription polymerase chain reaction platform, we analyzed miRNA expression in diagnostic skin biopsies from 154 Danish patients with early-stage MF. The patients were subdivided into a discovery cohort (n = 82) and an independent validation cohort (n = 72). The miRNA classifier was built using a LASSO (least absolute shrinkage and selection operator) Cox regression to predict progression-free survival (PFS). We developed a 3-miRNA classifier, based on miR-106b-5p, miR-148a-3p, and miR-338-3p, which successfully separated patients into high-risk and low-risk groups of disease progression. PFS was significantly different between these groups in both the discovery cohort and the validation cohort. The classifier was stronger than existing clinical prognostic factors and remained a strong independent prognostic tool after stratification and adjustment for these factors. Importantly, patients in the high-risk group had a significantly reduced overall survival. The 3-miRNA classifier is an effective tool to predict disease progression of early-stage MF at the time of diagnosis. The classifier adds significant prognostic value to existing clinical prognostic factors and may facilitate more individualized treatment of these patients.
• A validated 3-miRNA classifier can effectively predict progression from early- to advanced-stage MF and survival at time of diagnosis.
• This classifier outperforms existing clinical prognostic factors and paves the way for implementation of personalized treatment in MF.
Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation rate differs between cancer types, between patients and along the genome depending on the genetic and epigenetic context. Therefore, methods that predict the number of different types of mutations in regions or specific genomic elements must consider local genomic explanatory variables. A major drawback of most methods is the need to average the explanatory variables across the entire region or genomic element. This procedure is particularly problematic if the explanatory variable varies dramatically in the element under consideration.
To take into account the fine scale of the explanatory variables, we model the probabilities of different types of mutations for each position in the genome by multinomial logistic regression. We analyse 505 cancer genomes from 14 different cancer types and compare the performance in predicting mutation rate for both region-based models and site-specific models. We show that for 1000 randomly selected genomic positions, the site-specific model predicts the mutation rate much better than region-based models. We use a forward selection procedure to identify the most important explanatory variables. The procedure identifies site-specific conservation (phyloP), replication timing, and expression level as the best predictors for the mutation rate. Finally, our model confirms and quantifies certain well-known mutational signatures.
We find that our site-specific multinomial regression model outperforms the region-based models. The possibility of including genomic variables on different scales and patient-specific variables makes it a versatile framework for studying different mutational mechanisms. Our model can serve as the neutral null model for the mutational process; regions that deviate from the null model are candidates for elements that drive cancer development.
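For readers unfamiliar with multinomial logistic regression, the sketch below shows how such a model turns the explanatory variables at one position into probabilities for each outcome. The outcome labels, the choice of "no_mutation" as the reference class, and every coefficient value are hypothetical placeholders; the fitted model and its variable coding are not given in the abstract.

```python
import math


def mutation_probabilities(features, coefficients):
    """Softmax over linear scores, with 'no_mutation' as the reference class.

    features: explanatory variables at one genomic position.
    coefficients: outcome -> (intercept, weights), one weight per feature.
    """
    scores = {"no_mutation": 0.0}
    for outcome, (intercept, weights) in coefficients.items():
        scores[outcome] = intercept + sum(w * x for w, x in zip(weights, features))
    top = max(scores.values())  # subtract the maximum for numerical stability
    exp_scores = {k: math.exp(s - top) for k, s in scores.items()}
    total = sum(exp_scores.values())
    return {k: e / total for k, e in exp_scores.items()}


# Hypothetical coefficients for two features:
# (conservation score, replication timing), in that order.
COEFFICIENTS = {
    "transition": (-12.0, [-0.10, 0.50]),
    "transversion": (-13.0, [-0.05, 0.40]),
}

early = mutation_probabilities([1.0, 0.2], COEFFICIENTS)
late = mutation_probabilities([1.0, 0.9], COEFFICIENTS)
```

Because the probabilities are computed per position, each site keeps its own values of the explanatory variables instead of an average over a region.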