Genomes assembled de novo from short reads are highly fragmented relative to the finished chromosomes of Homo sapiens and key model organisms generated by the Human Genome Project. To address this problem, we need scalable, cost-effective methods to obtain assemblies with chromosome-scale contiguity. Here we show that genome-wide chromatin interaction data sets, such as those generated by Hi-C, are a rich source of long-range information for assigning, ordering and orienting genomic sequences to chromosomes, including across centromeres. To exploit this finding, we developed an algorithm that uses Hi-C data for ultra-long-range scaffolding of de novo genome assemblies. We demonstrate the approach by combining shotgun fragment and short jump mate-pair sequences with Hi-C data to generate chromosome-scale de novo assemblies of the human, mouse and Drosophila genomes, achieving, for the human genome, 98% accuracy in assigning scaffolds to chromosome groups and 99% accuracy in ordering and orienting scaffolds within chromosome groups. Hi-C data can also be used to validate chromosomal translocations in cancer genomes.
Variants that disrupt mRNA splicing account for a sizable fraction of the pathogenic burden in many genetic disorders, but identifying splice-disruptive variants (SDVs) beyond the essential splice site dinucleotides remains difficult. Computational predictors are often discordant, compounding the challenge of variant interpretation. Because they are primarily validated using clinical variant sets heavily biased toward known canonical splice site mutations, it remains unclear how well their performance generalizes.
We benchmark eight widely used splicing effect prediction algorithms, leveraging massively parallel splicing assays (MPSAs) as a source of experimentally determined ground truth. MPSAs simultaneously assay many variants to nominate candidate SDVs. We compare experimentally measured splicing outcomes with bioinformatic predictions for 3,616 variants in five genes. Algorithms' concordance with MPSA measurements, and with each other, is lower for exonic than intronic variants, underscoring the difficulty of identifying missense or synonymous SDVs. Deep learning-based predictors trained on gene model annotations achieve the best overall performance at distinguishing disruptive and neutral variants, and, controlling for overall call rate genome-wide, SpliceAI and Pangolin have superior sensitivity. Finally, our results highlight two practical considerations when scoring variants genome-wide: finding an optimal score cutoff, and the substantial variability introduced by differences in gene model annotation. We suggest strategies for optimal splice effect prediction in the face of these issues.
SpliceAI and Pangolin show the best overall performance among the predictors tested; however, improvements in splice effect prediction are still needed, especially within exons.
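The idea of comparing predictors at a matched genome-wide call rate, mentioned in the abstract above, can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy scores, and the 20% call rate are all hypothetical, and ties between scores are ignored for simplicity.

```python
def cutoff_for_call_rate(background_scores, call_rate):
    """Pick the score cutoff at which roughly `call_rate` of the
    background (genome-wide) variants would be called disruptive.
    Ties between scores are ignored in this sketch."""
    ranked = sorted(background_scores, reverse=True)
    n_called = int(len(ranked) * call_rate)
    if n_called == 0:
        return float("inf")  # call nothing
    return ranked[n_called - 1]


def sensitivity(sdv_scores, cutoff):
    """Fraction of experimentally supported SDVs scoring at or above the cutoff."""
    if not sdv_scores:
        return 0.0
    return sum(s >= cutoff for s in sdv_scores) / len(sdv_scores)


# Invented scores for one hypothetical predictor (not real data).
background = [0.01, 0.02, 0.05, 0.1, 0.15, 0.2, 0.3, 0.5, 0.7, 0.9]
mpsa_sdvs = [0.8, 0.75, 0.4, 0.95]

cutoff = cutoff_for_call_rate(background, call_rate=0.2)
sens = sensitivity(mpsa_sdvs, cutoff)
print(cutoff, sens)
```

Repeating the same two steps for each predictor, with the call rate held fixed, gives sensitivities that are comparable even when the predictors' raw score scales differ.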
Fragment Length of Circulating Tumor DNA
Underhill, Hunter R; Kitzman, Jacob O; Hellwig, Sabine ...
PLoS Genetics, 07/2016, Volume 12, Issue 7
Journal Article, Peer reviewed, Open access
Malignant tumors shed DNA into the circulation. The short half-life of circulating tumor DNA (ctDNA) may afford the opportunity to diagnose, monitor recurrence, and evaluate response to therapy solely through a non-invasive blood draw. However, detecting ctDNA against the normally occurring background of cell-free DNA derived from healthy cells has proven challenging, particularly in non-metastatic solid tumors. In this study, distinct differences in fragment length between ctDNA and normal cell-free DNA are defined. Human ctDNA in rat plasma derived from human glioblastoma multiforme stem-like cells in the rat brain and human hepatocellular carcinoma in the rat flank was found to have a shorter principal fragment length than the background rat cell-free DNA (134-144 bp vs. 167 bp, respectively). Subsequently, a similar shift in the fragment length of ctDNA in humans with melanoma and lung cancer was identified compared to healthy controls. Comparison of fragment lengths from cell-free DNA between a melanoma patient and healthy controls found that the BRAF V600E mutant allele occurred more commonly at a shorter fragment length than the wild-type allele (132-145 bp vs. 165 bp, respectively). Moreover, size-selecting for shorter cell-free DNA fragment lengths substantially increased the EGFR T790M mutant allele frequency in human lung cancer. These findings provide compelling evidence that experimental or bioinformatic isolation of a specific subset of fragment lengths from cell-free DNA may improve detection of ctDNA.
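The bioinformatic size selection described above can be sketched as a simple filter on fragment length followed by a recount of mutant and wild-type fragments. This is only an illustration under stated assumptions: the `(length, allele)` representation, the 130-150 bp window, and the toy fragments are invented here and are not taken from the study.

```python
def mutant_allele_fraction(fragments, min_len=None, max_len=None):
    """Fraction of fragments carrying the mutant allele, optionally
    restricted to fragments whose length lies in [min_len, max_len]."""
    kept = [
        allele
        for length, allele in fragments
        if (min_len is None or length >= min_len)
        and (max_len is None or length <= max_len)
    ]
    if not kept:
        return 0.0
    return sum(allele == "mut" for allele in kept) / len(kept)


# Invented toy fragments covering one variant site: (length in bp, allele).
fragments = [
    (134, "mut"), (140, "mut"), (144, "wt"), (165, "wt"),
    (166, "mut"), (167, "wt"), (167, "wt"), (170, "wt"),
]

maf_all = mutant_allele_fraction(fragments)
maf_short = mutant_allele_fraction(fragments, min_len=130, max_len=150)
print(maf_all, maf_short)
```

In this toy set, restricting to the shorter window raises the mutant allele fraction because the mutant fragments are concentrated at shorter lengths, which is the effect the abstract reports for size selection.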
The lack of functional evidence for the majority of missense variants limits their clinical interpretability and poses a key barrier to the broad utility of carrier screening. In Lynch syndrome (LS), one of the most highly prevalent cancer syndromes, nearly 90% of clinically observed missense variants are deemed “variants of uncertain significance” (VUS). To systematically resolve their functional status, we performed a massively parallel screen in human cells to identify loss-of-function missense variants in the key DNA mismatch repair factor MSH2. The resulting functional effect map is substantially complete, covering 94% of the 17,746 possible variants, and is highly concordant (96%) with existing functional data and expert clinicians’ interpretations. The large majority (89%) of missense variants were functionally neutral, perhaps unexpectedly in light of its evolutionary conservation. These data provide ready-to-use functional evidence to resolve the ∼1,300 extant missense VUSs in MSH2 and may facilitate the prospective classification of newly discovered variants in the clinic.
Interpreting variants of uncertain significance (VUS) is a central challenge in medical genetics. One approach is to experimentally measure the functional consequences of VUS, but to date this approach has been post hoc and low throughput. Here we use massively parallel assays to measure the effects of nearly 2000 missense substitutions in the RING domain of BRCA1 on its E3 ubiquitin ligase activity and its binding to the BARD1 RING domain. From the resulting scores, we generate a model to predict the capacities of full-length BRCA1 variants to support homology-directed DNA repair, the essential role of BRCA1 in tumor suppression, and show that it outperforms widely used biological-effect prediction algorithms. We envision that massively parallel functional assays may facilitate the prospective interpretation of variants observed in clinical sequencing.
In contrast to RNA viruses, double-stranded DNA viruses have low mutation rates yet must still adapt rapidly in response to changing host defenses. To determine mechanisms of adaptation, we subjected the model poxvirus vaccinia to serial propagation in human cells, where its antihost factor K3L is maladapted against the antiviral protein kinase R (PKR). Viruses rapidly acquired higher fitness via recurrent K3L gene amplifications, incurring up to 7%–10% increases in genome size. These transient gene expansions were necessary and sufficient to counteract human PKR and facilitated the gain of an adaptive amino acid substitution in K3L that also defeats PKR. Subsequent reductions in gene amplifications offset the costs associated with larger genome size while retaining adaptive substitutions. Our discovery of viral “gene-accordions” explains how poxviruses can rapidly adapt to defeat different host defenses despite low mutation rates and reveals how classical Red Queen conflicts can progress through unrecognized intermediates.
► Poxviruses rapidly adapt against host defenses via highly specific gene amplification
► Gains in vaccinia fitness occur within only a few serial passages in human cells
► Gene expansions precede and can facilitate adaptation via point mutation
► Gene “accordions” reveal new mode of virus adaptation in double-stranded DNA viruses
Despite low mutation rates, poxviruses evolve rapidly via gene amplifications, which facilitate the emergence of better-adapted versions of viral factors by increasing the sampling potential for mutations. Gene amplifications are only transient, revealing a gene “accordion” strategy for virus adaptation.
Despite their importance in gene innovation and phenotypic variation, duplicated regions have remained largely intractable owing to difficulties in accurately resolving their structure, copy number and sequence content. We present an algorithm (mrFAST) to comprehensively map next-generation sequence reads, which allows for the prediction of absolute copy-number variation of duplicated segments and genes. We examine three human genomes and experimentally validate genome-wide copy number differences. We estimate that, on average, 73-87 genes vary in copy number between any two individuals and find that these genic differences overwhelmingly correspond to segmental duplications (odds ratio = 135; P < 2.2 × 10^-16). Our method can distinguish between different copies of highly identical genes, providing a more accurate assessment of gene content and insight into functional constraint without the limitations of array-based technology.
We describe a method that exploits contiguity preserving transposase sequencing (CPT-seq) to facilitate the scaffolding of de novo genome assemblies. CPT-seq is an entirely in vitro means of generating libraries comprising 9216 indexed pools, each of which contains thousands of sparsely sequenced long fragments ranging from 5 kilobases to > 1 megabase. These pools are "subhaploid," in that the lengths of the fragments contained in each pool sum to ∼5% to 10% of the full genome. The scaffolding approach described here, termed fragScaff, leverages coincidences between the content of different pools as a source of contiguity information. Specifically, CPT-seq data are mapped to a de novo genome assembly, followed by the identification of pairs of contigs or scaffolds whose ends disproportionately co-occur in the same indexed pools, consistent with true adjacency in the genome. Such candidate "joins" are used to construct a graph, which is then resolved by a minimum spanning tree. As a proof-of-concept, we apply CPT-seq and fragScaff to substantially boost the contiguity of de novo assemblies of the human, mouse, and fly genomes, increasing the scaffold N50 of de novo assemblies by eight- to 57-fold with high accuracy. We also demonstrate that fragScaff is complementary to Hi-C-based contact probability maps, providing midrange contiguity to support robust, accurate chromosome-scale de novo genome assemblies without the need for laborious in vivo cloning steps. Finally, we demonstrate CPT-seq as a means of anchoring unplaced novel human contigs to the reference genome as well as for detecting misassembled sequences.
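The join-and-resolve idea in the abstract above (score pairs of contig ends by shared pool membership, build a graph, resolve it with a spanning tree) can be sketched in a few lines. This is a simplified illustration, not the fragScaff implementation: the Jaccard score, the `min_score` threshold, the data layout, and all function names are assumptions made here, and the tree is built with Kruskal's algorithm taking the strongest joins first, which is equivalent to a minimum spanning tree on negated scores.

```python
from itertools import combinations


def jaccard(a, b):
    """Fraction of indexed pools shared between two pool sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_joins(end_pools, min_score=0.3):
    """Score every pair of contig ends by pool co-occurrence.
    `end_pools` maps (contig, end) -> set of pool indices (hypothetical layout)."""
    joins = []
    for (e1, p1), (e2, p2) in combinations(sorted(end_pools.items()), 2):
        if e1[0] == e2[0]:  # skip the two ends of the same contig
            continue
        score = jaccard(p1, p2)
        if score >= min_score:
            joins.append((score, e1, e2))
    return joins


def spanning_tree(joins):
    """Kruskal's algorithm over contigs, strongest joins first
    (a minimum spanning tree if each edge weight is taken as -score)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    kept = []
    for score, e1, e2 in sorted(joins, reverse=True):
        r1, r2 = find(e1[0]), find(e2[0])
        if r1 != r2:  # join only contigs not already connected
            parent[r1] = r2
            kept.append((e1, e2, score))
    return kept


# Invented toy input: three contigs, each with a left (L) and right (R) end.
end_pools = {
    ("A", "L"): {1, 2},       ("A", "R"): {3, 4, 5},
    ("B", "L"): {3, 4, 5, 6}, ("B", "R"): {7, 8},
    ("C", "L"): {7, 8, 9},    ("C", "R"): {10, 11},
}
tree = spanning_tree(candidate_joins(end_pools))
for e1, e2, score in tree:
    print(e1, "--", e2, round(score, 2))
```

In the toy input the right end of A shares pools with the left end of B, and the right end of B with the left end of C, so the retained joins imply the order A, B, C. A real scaffolder would also need to handle conflicting joins at the same contig end and to test co-occurrence against a background expectation, both of which this sketch omits.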
Lynch syndrome (LS) is a cancer predisposition syndrome affecting more than 1 in every 300 individuals worldwide. Clinical genetic testing for LS can be life-saving but is complicated by the heavy burden of variants of uncertain significance (VUS), especially missense changes.
To address this challenge, we leverage a multiplexed analysis of variant effect (MAVE) map covering >94% of the 17,746 possible missense variants in the key LS gene MSH2. To establish this map's utility in large-scale variant reclassification, we overlay it on clinical databases of >15,000 individuals with LS gene variants uncovered during clinical genetic testing. We validate these functional measurements in a cohort of individuals with paired tumor-normal test results and find that MAVE-based function scores agree with the clinical interpretation for every one of the MSH2 missense variants with an available classification. We use these scores to attempt reclassification for 682 unique missense VUS, among which 34 scored as deleterious by our function map, in line with previously published rates for other cancer predisposition genes. Combining functional data and other evidence, ten missense VUS are reclassified as pathogenic/likely pathogenic, and another 497 could be moved to benign/likely benign. Finally, we apply these functional scores to paired tumor-normal genetic tests and identify a subset of patients with biallelic somatic loss of function, reflecting a sporadic Lynch-like Syndrome with distinct implications for treatment and relatives' risk.
This study demonstrates how high-throughput functional assays can empower scalable VUS resolution and prospectively generate strong evidence for variant classification.
Multicellular organisms adopt various strategies to tailor gene expression to cellular contexts, including the employment of multiple promoters (and the associated transcription start sites (TSSs)) at a single locus that encodes distinct gene isoforms. Schwann cells, the myelinating cells of the peripheral nervous system (PNS), exhibit a specialized gene expression profile directed by the transcription factor SOX10, which is essential for PNS myelination. SOX10 regulates promoter elements associated with unique TSSs and gene isoforms at several target loci, implicating SOX10-mediated, isoform-specific gene expression in Schwann cell function. Here, we report on genome-wide efforts to identify SOX10-regulated promoters and TSSs in Schwann cells to prioritize genes and isoforms for further study.
We performed global TSS analyses and mined previously reported ChIP-seq datasets to assess the activity of SOX10-bound promoters in three models: (i) an adult mammalian nerve; (ii) differentiating primary Schwann cells; and (iii) cultured Schwann cells with ablated SOX10 function. We explored specific characteristics of SOX10-dependent TSSs, which provide confidence in defining them as SOX10 targets. Finally, we performed functional studies to validate our findings at four previously unreported SOX10 target loci: ARPC1A, CHN2, DDR1, and GAS7. These findings suggest roles for the associated SOX10-regulated gene products in PNS myelination.
In sum, we provide comprehensive computational and functional assessments of SOX10-regulated TSS use in Schwann cells. The data presented in this study will stimulate functional studies on the specific mRNA and protein isoforms that SOX10 regulates, which will improve our understanding of myelination in the peripheral nerve.