Duplex sequencing is the most accurate approach for identifying sequence variants present at very low frequencies. Its power comes from pooling together multiple descendants of both strands of original DNA molecules, which makes it possible to distinguish true nucleotide substitutions from PCR amplification and sequencing artifacts. This strategy comes at a cost: sequencing the same molecule multiple times increases dynamic range but significantly diminishes coverage, making whole-genome duplex sequencing prohibitively expensive. Furthermore, every duplex experiment produces a substantial proportion of singleton reads that cannot be used in the analysis and are thrown away.
In this paper we demonstrate that a significant fraction of these reads contain PCR or sequencing errors within their duplex tags. Correcting such errors allows "reuniting" these reads with their respective families, which increases the output of the method and makes it more cost-effective.
We combine an error correction strategy with a number of algorithmic improvements in a new version of the duplex analysis software, Du Novo 2.0. It is written in Python, C, AWK, and Bash. It is open source and readily available through Galaxy, Bioconda, and GitHub: https://github.com/galaxyproject/dunovo.
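The idea of reuniting singleton reads with their families by correcting tag errors can be illustrated with a minimal sketch. It is not Du Novo's implementation: the function names, the Hamming-distance criterion, the `max_dist` threshold, and the rule of correcting only unambiguous cases are all assumptions made for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def correct_tags(tags, max_dist=1):
    """Map each singleton tag to a nearby family tag (hypothetical rule).

    A "family" here is any tag observed more than once. A singleton is
    reassigned only if exactly one family tag lies within `max_dist`
    mismatches; ambiguous or distant singletons are left unchanged.
    Returns a dict {observed_tag: corrected_tag}.
    """
    counts = Counter(tags)
    families = [t for t, n in counts.items() if n > 1]
    mapping = {}
    for tag, n in counts.items():
        mapping[tag] = tag  # default: keep the tag as observed
        if n > 1:
            continue
        candidates = [
            f for f in families
            if len(f) == len(tag) and hamming(tag, f) <= max_dist
        ]
        if len(candidates) == 1:
            mapping[tag] = candidates[0]
    return mapping


# Toy input: one read whose tag differs from a 3-read family by one base,
# and one unrelated singleton.
tags = ["ACGTACGT"] * 3 + ["ACGTACGA"] + ["TTTTGGGG"] * 2 + ["CCCCAAAA"]
mapping = correct_tags(tags)
family_sizes = Counter(mapping[t] for t in tags)
```

In this toy input the read tagged `ACGTACGA` joins the `ACGTACGT` family, raising its size from 3 to 4, while `CCCCAAAA` remains a singleton because no family tag is within one mismatch.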
The mammary gland undergoes hormonally stimulated cycles of proliferation, lactation, and involution. We hypothesized that these factors increase the mutational burden in glandular tissue and may explain the high cancer incidence rate in the general population, as well as recurrent disease. Hence, we investigated the DNA sequence variants in the normal mammary gland, tumor, and peripheral blood from 52 reportedly sporadic breast cancer patients. Targeted resequencing of 542 cancer-associated genes revealed subclonal somatic pathogenic variants of PIK3CA, TP53, AKT1, MAP3K1, CDH1, RB1, NCOR1, MED12, CBFB, TBX3, and TSHR in the normal mammary gland at considerable allelic frequencies (9 × 10^[…] - 5.2 × 10^[…]), indicating clonal expansion. Further evaluation of the frequently damaged PIK3CA and TP53 genes by ultra-sensitive duplex sequencing demonstrated a diversified picture of multiple low-level subclonal (in 10^[…]-10^[…] alleles) hotspot pathogenic variants. Our results raise a question about the oncogenic potential of non-tumorous mammary gland tissue in breast-conserving surgery patients.
De novo mutations (DNMs) are important players in heritable diseases and evolution. Of particular interest are highly recurrent DNMs associated with congenital disorders that have been described as selfish mutations expanding in the male germline, thus becoming more frequent with age. Here, we have adapted duplex sequencing (DS), an ultradeep sequencing method that renders sequence information on both DNA strands; thus, one mutation can be reliably called in millions of sequenced bases. With DS, we examined ∼4.5 kb of the […] coding region in sperm DNA from older and younger donors. We identified sites with variant allele frequencies (VAFs) of 10^[…] to 10^[…], with an overall mutation frequency of the region of ∼6 × 10^[…]. Some of the substitutions are recurrent and are found at a higher VAF in older donors than in younger ones or are found exclusively in older donors. Also, older donors harbor more mutations associated with congenital disorders. Other mutations are present in both age groups, suggesting that these might result from a different mechanism (e.g., postzygotic mosaicism). We also observe that, independent of age, the frequency and deleteriousness of the mutational spectra are more similar to COSMIC than to gnomAD variants. Our approach is an important strategy to identify mutations that could be associated with a gain of function of the receptor tyrosine kinase activity, with unexplored consequences in a society with delayed fatherhood.
Abstract
Duplex sequencing is currently the most reliable method to identify ultra-low frequency DNA variants by grouping sequence reads derived from the same DNA molecule into families with information on the forward and reverse strand. However, only a small proportion of reads are assembled into duplex consensus sequences (DCS), and reads with potentially valuable information are discarded at different steps of the bioinformatics pipeline, especially reads without a family. We developed a bioinformatics toolset that analyses the tag and family composition in order to understand data loss and implement modifications that maximize the data output for variant calling. Specifically, our tools show that tags contain polymerase chain reaction and sequencing errors that contribute to data loss and lower DCS yields. Our tools also identified chimeras, which likely reflect barcode collisions. Finally, we developed a tool that re-examines variant calls from raw reads and provides summary data that categorize the confidence level of a variant call in a tier-based system. With this tool, we can include reads without a family and check the reliability of the call, which substantially increases the sequencing depth for variant calling, a particularly important advantage for low-input samples or low-coverage regions.
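The grouping of reads into families and their assembly into duplex consensus sequences can be sketched as follows. This is a simplified illustration rather than the toolset described above: the read representation `(tag, strand, sequence)`, the strand labels `"ab"` and `"ba"`, the thresholds `min_reads` and `min_fraction`, and the assumption that reads are already tag-normalized, equal in length, and oriented to the same reference strand are all hypothetical.

```python
from collections import Counter, defaultdict


def consensus(seqs, min_reads=3, min_fraction=0.7):
    """Majority-vote consensus of equal-length reads from one strand.

    Returns None if the family has fewer than `min_reads` reads; positions
    where the top base falls below `min_fraction` are reported as "N".
    """
    if len(seqs) < min_reads:
        return None
    out = []
    for column in zip(*seqs):
        base, n = Counter(column).most_common(1)[0]
        out.append(base if n / len(column) >= min_fraction else "N")
    return "".join(out)


def duplex_consensus(reads, min_reads=3):
    """Build a duplex consensus per tag from (tag, strand, sequence) reads.

    A tag yields a duplex consensus only if both strands ("ab" and "ba")
    produce a single-strand consensus; positions where the two strands
    disagree are masked with "N".
    """
    families = defaultdict(lambda: defaultdict(list))
    for tag, strand, seq in reads:
        families[tag][strand].append(seq)

    result = {}
    for tag, strands in families.items():
        ab = consensus(strands.get("ab", []), min_reads)
        ba = consensus(strands.get("ba", []), min_reads)
        if ab is None or ba is None:
            continue  # singleton or one-sided family: no duplex consensus
        result[tag] = "".join(a if a == b else "N" for a, b in zip(ab, ba))
    return result


# Toy input: family T1 carries a substitution on one strand only
# (as an early PCR error would), and T2 is a read without a family.
reads = (
    [("T1", "ab", "ACTT")] * 3
    + [("T1", "ba", "ACGT")] * 3
    + [("T2", "ab", "GGGG")]
)
dcs = duplex_consensus(reads)
```

Here the strand-specific substitution in T1 is masked in the duplex consensus, and the T2 read is dropped because it has no family, which is the kind of data loss the abstract refers to.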
The immune microenvironment of the brain differs from that of other organs, and the role of tumor-infiltrating lymphocytes (TILs) in brain metastases (BM), one of the most common and devastating complications of cancer, is unclear. We investigated TIL subsets and their prognostic impact in 116 BM specimens using immunohistochemistry for CD3, CD8, CD45RO, FOXP3, PD1 and PD-L1. The Immunoscore was calculated as published previously. Overall, we found TIL infiltration in 115/116 (99.1%) BM specimens. PD-L1 expression was evident in 19/67 (28.4%) BM specimens and showed no correlation with TIL density (p > 0.05). TIL density was not associated with corticosteroid administration (p > 0.05). Infiltration density differed significantly by TIL subtype (p < 0.001; chi-square test); high infiltration was most frequently observed for CD3+ TILs (95/116; 81.9%) and least frequently for PD1+ TILs (18/116; 15.5%; p < 0.001). The highest TIL density was observed in melanoma, followed by renal cell cancer and lung cancer BM (p < 0.001). The density of CD8+ TILs correlated positively with the extent of peritumoral edema seen on pre-operative magnetic resonance imaging (p = 0.031). The density of CD3+ (15 vs. 6 mo; p = 0.015), CD8+ (15 vs. 11 mo; p = 0.030) and CD45RO+ TILs (18 vs. 8 mo; p = 0.006) showed a positive correlation with favorable median OS times. The Immunoscore showed a significant correlation with survival prognosis (27 vs. 10 mo; p < 0.001). The prognostic impact of the Immunoscore was independent of established prognostic parameters in multivariable analysis (HR 0.612, p < 0.001). In conclusion, our data indicate that dense TIL infiltrates are common in BM and correlate with the amount of peritumoral brain edema and with survival prognosis, thus identifying the immune system as a potential biomarker for cancer patients with CNS involvement. Further studies are needed to substantiate our findings.