Identifying structural variation (SV) is essential for genome interpretation but has been historically difficult due to limitations inherent to available genome technologies. Detection methods that use ensemble algorithms and emerging sequencing technologies have enabled the discovery of thousands of SVs, uncovering information about their ubiquity, relationship to disease and possible effects on biological mechanisms. Given the variability in SV type and size, along with unique detection biases of emerging genomic platforms, multiplatform discovery is necessary to resolve the full spectrum of variation. Here, we review modern approaches for investigating SVs and proffer that, moving forwards, studies integrating biological information with detection will be necessary to comprehensively understand the impact of SV in the human genome.
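Multiplatform discovery implies merging call sets produced by different technologies into one catalog. The following is a minimal sketch, assuming each call is reduced to chromosome, start, end, SV type and source platform, and assuming a deliberately simple matching rule (same type, both breakpoints within a hypothetical `max_dist` of 500 bp). Published merging tools use more elaborate criteria, so treat this as an illustration of the idea only.

```python
from dataclasses import dataclass, field


@dataclass
class SVCall:
    chrom: str
    start: int
    end: int
    svtype: str    # e.g. "DEL", "INS", "INV"
    platform: str  # illustrative labels such as "srWGS" or "lrWGS"


@dataclass
class MergedSV:
    chrom: str
    start: int
    end: int
    svtype: str
    platforms: set = field(default_factory=set)


def merge_calls(calls, max_dist=500):
    """Greedily merge calls: a call joins an existing SV when chromosome and type
    match and both breakpoints lie within max_dist bp (an assumed threshold)."""
    merged = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start)):
        for sv in merged:
            if (sv.chrom == call.chrom and sv.svtype == call.svtype
                    and abs(sv.start - call.start) <= max_dist
                    and abs(sv.end - call.end) <= max_dist):
                sv.platforms.add(call.platform)  # record extra platform support
                break
        else:
            # no existing SV matched, so this call starts a new merged record
            merged.append(MergedSV(call.chrom, call.start, call.end,
                                   call.svtype, {call.platform}))
    return merged
```

Keeping the set of supporting platforms per merged SV makes it straightforward to list events that only one technology detects.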
Fragile X-associated tremor ataxia syndrome (FXTAS) results from a CGG repeat expansion in the 5′ UTR of FMR1. This repeat is thought to elicit toxicity as RNA, yet disease brains contain ubiquitin-positive neuronal inclusions, a pathologic hallmark of protein-mediated neurodegeneration. We explain this paradox by demonstrating that CGG repeats trigger repeat-associated non-AUG-initiated (RAN) translation of a cryptic polyglycine-containing protein, FMRpolyG. FMRpolyG accumulates in ubiquitin-positive inclusions in Drosophila, cell culture, mouse disease models, and FXTAS patient brains. CGG RAN translation occurs in at least two of three possible reading frames at repeat sizes ranging from normal (25) to pathogenic (90), but inclusion formation only occurs with expanded repeats. In Drosophila, CGG repeat toxicity is suppressed by eliminating RAN translation and enhanced by increased polyglycine protein production. These studies expand the growing list of nucleotide repeat disorders in which RAN translation occurs and provide evidence that RAN translation contributes to neurodegeneration.
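Why a CGG repeat can yield a polyglycine protein becomes clear when the repeat is read in each of its three frames. This minimal sketch translates a pure repeat with a tiny codon table covering only the three codons that occur in it. It ignores the flanking FMR1 sequence and any real initiation site, so it illustrates reading frames only, not where RAN translation actually starts.

```python
# Standard genetic-code assignments for the only codons found in a pure CGG repeat.
CODONS = {"CGG": "R", "GGC": "G", "GCG": "A"}


def translate_frames(seq):
    """Translate seq in frames 0, 1 and 2, dropping any trailing partial codon.
    Codons missing from the small table are shown as 'X'."""
    frames = {}
    for offset in range(3):
        codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        frames[offset] = "".join(CODONS.get(c, "X") for c in codons)
    return frames


repeat = "CGG" * 5
for offset, peptide in translate_frames(repeat).items():
    print(offset, peptide)
# 0 RRRRR  (CGG: polyarginine)
# 1 GGGG   (GGC: polyglycine, the frame of a polyG product)
# 2 AAAA   (GCG: polyalanine)
```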
• CGG repeats in the 5′ UTR of FMR1 elicit AUG-independent (RAN) translation
• This produces an aggregation-prone polyglycine protein found in patients
• CGG RAN translation explains pathologic differences in FXTAS mice
• CGG RAN translation is critical for CGG repeat toxicity in fly disease models
CGG repeat expansions underlie the neurodegenerative disorder fragile X-associated tremor ataxia syndrome. Todd et al. describe how CGG repeats trigger non-AUG-initiated translation, producing a polyglycine protein that accumulates in FXTAS brains and contributes to toxicity in model systems.
Mobile element insertions (MEIs) represent ∼25% of all structural variants in human genomes. Moreover, when they disrupt genes, MEIs can influence human traits and diseases. Therefore, MEIs should be fully discovered along with other forms of genetic variation in whole genome sequencing (WGS) projects involving population genetics, human diseases, and clinical genomics. Here, we describe the Mobile Element Locator Tool (MELT), which was developed as part of the 1000 Genomes Project to perform MEI discovery on a population scale. Using both Illumina WGS data and simulations, we demonstrate that MELT outperforms existing MEI discovery tools in terms of speed, scalability, specificity, and sensitivity, while also detecting a broader spectrum of MEI-associated features. Several run modes were developed to perform MEI discovery on local and cloud systems. In addition to using MELT to discover MEIs in modern humans as part of the 1000 Genomes Project, we also used it to discover MEIs in chimpanzees and ancient (Neanderthal and Denisovan) hominids. We detected diverse patterns of MEI stratification across these populations that likely were caused by (1) diverse rates of MEI production from source elements, (2) diverse patterns of MEI inheritance, and (3) the introgression of ancient MEIs into modern human genomes. Overall, our study provides the most comprehensive map of MEIs to date spanning chimpanzees, ancient hominids, and modern humans and reveals new aspects of MEI biology in these lineages. We also demonstrate that MELT is a robust platform for MEI discovery and analysis in a variety of experimental settings.
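Population stratification of an MEI shows up as differing insertion-allele frequencies between groups. A minimal sketch, assuming per-sample genotypes are already available as diploid insertion-allele counts (0, 1 or 2, with `None` for a no-call) and that each sample has a population label. The function and the sample names are hypothetical and are not part of MELT's output format.

```python
from collections import defaultdict


def allele_freq_by_population(genotypes, populations):
    """genotypes: sample -> insertion-allele count (0, 1, 2) or None if not called.
    populations: sample -> population label.
    Returns population -> insertion-allele frequency among called samples,
    assuming a diploid locus."""
    alt = defaultdict(int)
    total = defaultdict(int)
    for sample, gt in genotypes.items():
        if gt is None:
            continue  # no-calls contribute to neither numerator nor denominator
        pop = populations[sample]
        alt[pop] += gt
        total[pop] += 2  # two alleles per called diploid sample
    return {pop: alt[pop] / total[pop] for pop in total}
```

For example, two called samples in one population with genotypes 2 and 0 give a frequency of 0.5 for that population.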
The transfer of mitochondrial genetic material into the nuclear genomes of eukaryotes is a well-established phenomenon that has been previously limited to the study of static reference genomes. The recent advancement of high throughput sequencing has enabled an expanded exploration into the diversity of polymorphic nuclear mitochondrial insertions (NumtS) within human populations. We have developed an approach to discover and genotype novel Numt insertions using whole genome, paired-end sequencing data. We have applied this method to a thousand individuals in 20 populations from the 1000 Genomes Project and other datasets and identified 141 new sites of Numt insertions, extending our current knowledge of existing NumtS by almost 20%. We find that recent Numt insertions are derived from throughout the mitochondrial genome, including the D-loop, and have integration biases that differ in some respects from previous studies on older, fixed NumtS in the reference genome. We determined the complete inserted sequence for a subset of these events and have identified a number of nearly full-length mitochondrial genome insertions into nuclear chromosomes. We further define their age and origin of insertion and present an analysis of their potential impact on ongoing studies of mitochondrial heteroplasmy and disease.
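A common signal for a non-reference Numt in paired-end data is a cluster of read pairs in which one mate aligns to the mitochondrial genome and the other to the same nuclear locus. The abstract does not describe its algorithm, so this is a generic discordant-pair sketch, not the authors' method. The `ReadPair` record, the mitochondrial contig names, the 500 bp `window` and the `min_support` of 3 are all assumptions; real pipelines also filter on mapping quality, mate orientation and split reads.

```python
from dataclasses import dataclass

# Mitochondrial contig naming depends on the reference build (assumption).
MITO_NAMES = {"chrM", "MT"}


@dataclass
class ReadPair:
    name: str
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def nuclear_anchor(pair):
    """Return (chrom, pos) of the nuclear mate if exactly one mate maps to the
    mitochondrial genome; otherwise return None."""
    m1 = pair.chrom1 in MITO_NAMES
    m2 = pair.chrom2 in MITO_NAMES
    if m1 == m2:
        return None  # both nuclear or both mitochondrial: not informative here
    return (pair.chrom2, pair.pos2) if m1 else (pair.chrom1, pair.pos1)


def candidate_numt_sites(pairs, window=500, min_support=3):
    """Chain sorted nuclear anchors on the same chromosome that lie within
    `window` bp of the previous anchor; keep clusters with enough pairs."""
    anchors = sorted(a for a in map(nuclear_anchor, pairs) if a is not None)
    clusters = []  # each entry: [chrom, first_pos, last_pos, n_pairs]
    for chrom, pos in anchors:
        last = clusters[-1] if clusters else None
        if last and last[0] == chrom and pos - last[2] <= window:
            last[2] = pos
            last[3] += 1
        else:
            clusters.append([chrom, pos, pos, 1])
    return [tuple(c) for c in clusters if c[3] >= min_support]
```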
Two abundant classes of mobile elements, namely Alu and L1 elements, continue to generate new retrotransposon insertions in human genomes. Estimates suggest that these elements have generated millions of new germline insertions in individual human genomes worldwide. Unfortunately, current technologies are not capable of detecting most of these young insertions, and the true extent of germline mutagenesis by endogenous human retrotransposons has been difficult to examine. Here, we describe technologies for detecting these young retrotransposon insertions and demonstrate that such insertions indeed are abundant in human populations. We also found that new somatic L1 insertions occur at high frequencies in human lung cancer genomes. Genome-wide analysis suggests that altered DNA methylation may be responsible for the high levels of L1 mobilization observed in these tumors. Our data indicate that transposon-mediated mutagenesis is extensive in human genomes and is likely to have a major impact on human biology and diseases.
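Calling an insertion somatic requires showing that it is present in the tumor but absent from matched normal tissue. A minimal sketch of that subtraction step, assuming insertion sites are given as `(chrom, pos)` tuples and that a hypothetical `tolerance` of 100 bp is enough to treat two sites as the same event. A real workflow would also need to confirm adequate coverage in the normal sample before declaring an absence.

```python
import bisect


def tumor_specific(tumor_sites, normal_sites, tolerance=100):
    """Return tumor insertion sites with no matched-normal site on the same
    chromosome within `tolerance` bp."""
    normal_by_chrom = {}
    for chrom, pos in normal_sites:
        normal_by_chrom.setdefault(chrom, []).append(pos)
    for positions in normal_by_chrom.values():
        positions.sort()  # sorted lists allow binary search below

    result = []
    for chrom, pos in tumor_sites:
        positions = normal_by_chrom.get(chrom, [])
        # first normal site at or after pos - tolerance
        i = bisect.bisect_left(positions, pos - tolerance)
        if i < len(positions) and positions[i] <= pos + tolerance:
            continue  # a normal site falls inside the window: not tumor-specific
        result.append((chrom, pos))
    return result
```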
► “Transposon-seq” methods were developed to find mobile element insertions in humans
► New germline retrotransposon insertions were identified in personal human genomes
► Tumor-specific somatic L1 insertions were uncovered in human lung cancer genomes
► Transposon mutagenesis is likely to have a major impact on human traits and diseases
In locally advanced p16+ oropharyngeal squamous cell carcinoma (OPSCC), (i) to investigate the kinetics of human papillomavirus (HPV) circulating tumor DNA (ctDNA) and their association with tumor progression after chemoradiation, and (ii) to compare the predictive value of ctDNA with that of MRI and FDG-PET imaging biomarkers.
Serial blood samples were collected from patients with AJCC8 stage III OPSCC (n = 34) enrolled on a randomized trial: pretreatment; during chemoradiation at weeks 2, 4, and 7; and posttreatment. All patients also underwent dynamic contrast-enhanced and diffusion-weighted MRI, as well as FDG-PET, before chemoradiation and at week 2 of chemoradiation. ctDNA values were analyzed for prediction of freedom from progression (FFP) using Cox proportional hazards models, and for correlation with aggressive tumor subvolumes with low blood volume (TV_BV) and low apparent diffusion coefficient (TV_ADC), and with metabolic tumor volume (MTV), using Spearman rank correlation.
Low pretreatment ctDNA and an early increase in ctDNA at week 2 compared with baseline were significantly associated with superior FFP (P < 0.02 and P < 0.05, respectively). At week 4 or 7, neither ctDNA counts nor clearance was significantly predictive of progression (P = 0.8). Pretreatment ctDNA values were significantly correlated with nodal TV_BV, TV_ADC, and MTV before chemoradiation (P < 0.03), whereas week 2 ctDNA values were correlated with these imaging metrics in the primary tumor. Multivariate analysis showed that ctDNA and the imaging metrics performed comparably in predicting FFP.
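Spearman rank correlation, used here to relate ctDNA to the imaging metrics, is the Pearson correlation computed on ranks, with tied values given their average rank. The sketch below computes only the coefficient (no P value) and uses made-up inputs rather than study data; in practice `scipy.stats.spearmanr` returns both the coefficient and a P value. Both input lists must be non-constant, or the denominator is zero.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```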
Early ctDNA kinetics during definitive chemoradiation may predict therapy response in stage III OPSCC.
Virtually all genome sequencing efforts in national biobanks, complex and Mendelian disease programs, and medical genetic initiatives are reliant upon short-read whole-genome sequencing (srWGS), which presents challenges for the detection of structural variants (SVs) relative to emerging long-read WGS (lrWGS) technologies. Given this ubiquity of srWGS in large-scale genomics initiatives, we sought to establish expectations for routine SV detection from this data type by comparison with lrWGS assembly, as well as to quantify the genomic properties and added value of SVs uniquely accessible to each technology. Analyses from the Human Genome Structural Variation Consortium (HGSVC) of three families captured ~11,000 SVs per genome from srWGS and ~25,000 SVs per genome from lrWGS assembly. Detection power and precision for SV discovery varied dramatically by genomic context and variant class: 9.7% of the current GRCh38 reference is defined by segmental duplication (SD) and simple repeat (SR), yet 91.4% of deletions that were specifically discovered by lrWGS localized to these regions. Across the remaining 90.3% of reference sequence, we observed extremely high (93.8%) concordance between technologies for deletions in these datasets. In contrast, lrWGS was superior for detection of insertions across all genomic contexts. Given that non-SD/SR sequences encompass 95.9% of currently annotated disease-associated exons, improved sensitivity from lrWGS to discover novel pathogenic deletions in these currently interpretable genomic regions is likely to be incremental. However, these analyses highlight the considerable added value of assembly-based lrWGS to create new catalogs of insertions and transposable elements, as well as disease-associated repeat expansions in genomic sequences that were previously recalcitrant to routine assessment.
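Concordance between two deletion call sets is usually defined by an overlap rule. The abstract does not state the HGSVC matching criteria, so this sketch assumes a widely used convention, 50% reciprocal overlap, with calls given as `(chrom, start, end)` half-open intervals. Reciprocal overlap equals the shared length divided by the longer of the two intervals, which is the smaller of the two per-interval overlap fractions.

```python
def reciprocal_overlap(a, b):
    """Reciprocal overlap of half-open intervals a=(start, end) and b=(start, end):
    shared length divided by the length of the longer interval."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return overlap / max(a[1] - a[0], b[1] - b[0])


def concordance(calls_a, calls_b, min_ro=0.5):
    """Fraction of deletions in calls_a matched by at least one call in calls_b on
    the same chromosome with reciprocal overlap >= min_ro (assumed threshold)."""
    if not calls_a:
        return 0.0
    matched = 0
    for chrom, start, end in calls_a:
        if any(c == chrom and reciprocal_overlap((start, end), (s2, e2)) >= min_ro
               for c, s2, e2 in calls_b):
            matched += 1
    return matched / len(calls_a)
```

Stratifying by genomic context, as in the SD/SR versus non-SD/SR comparison above, amounts to running `concordance` separately on the calls that fall inside and outside the annotated regions.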
We present SquiggleNet, the first deep-learning model that can classify nanopore reads directly from their electrical signals. SquiggleNet operates faster than DNA passes through the pore, allowing real-time classification and read ejection. Using 1 s of sequencing data, the classifier achieves significantly higher accuracy than base calling followed by sequence alignment. Our approach is also faster and requires an order of magnitude less memory than alignment-based approaches. SquiggleNet distinguished human from bacterial DNA with over 90% accuracy, generalized to unseen bacterial species in a human respiratory metagenome sample, and accurately classified sequences containing human long interspersed repeat elements.
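Classifying from the first second of raw signal means slicing a fixed-length window of current samples and normalizing it before the model sees it. This sketch shows that preprocessing and the resulting keep-or-eject decision. The 4000 Hz sampling rate, median/MAD normalization and the 0.5 threshold are assumptions, and `classify` is a hypothetical stand-in for a trained model (such as SquiggleNet) that returns a probability for the target class; it is not SquiggleNet's API.

```python
import statistics

SAMPLE_RATE_HZ = 4000  # assumed sampling rate; check the run's metadata


def first_second_normalized(signal, seconds=1.0, sample_rate=SAMPLE_RATE_HZ):
    """Take the first `seconds` of raw current samples and apply median/MAD
    normalization: (x - median) / median(|x - median|)."""
    window = signal[: int(seconds * sample_rate)]
    med = statistics.median(window)
    mad = statistics.median(abs(x - med) for x in window)
    if mad == 0:
        raise ValueError("constant signal: MAD is zero")
    return [(x - med) / mad for x in window]


def read_until_decision(signal, classify, threshold=0.5):
    """Return 'continue' if the hypothetical classifier scores the normalized
    window at or above threshold for the target class, else 'eject'."""
    x = first_second_normalized(signal)
    return "continue" if classify(x) >= threshold else "eject"
```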
Mobile element insertions (MEIs) are repetitive genomic sequences that contribute to genetic variation and can lead to genetic disorders. Targeted and whole-genome approaches using short-read sequencing have been developed to identify reference and non-reference MEIs; however, the short read length hampers detection of these elements in complex genomic regions. Here, we pair Cas9-targeted nanopore sequencing with computational methodologies to capture active MEIs in human genomes. We demonstrate parallel enrichment for distinct classes of MEIs, with an average of 44% of reads on target and a 13.4–54× enrichment over whole-genome approaches. We show that a single flow cell can recover most MEIs (97% L1Hs, 93% AluYb, 51% AluYa, 99% SVA_F, and 65% SVA_E). We identify seventeen non-reference MEIs in GM12878 overlooked by modern long-read analysis pipelines, primarily in repetitive genomic regions. This work introduces the utility of nanopore sequencing for MEI enrichment and lays the foundation for rapid discovery of elusive, repetitive genetic elements.
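Fold enrichment compares how concentrated on-target reads are in a targeted run versus a whole-genome baseline. The abstract does not say exactly how its 13.4–54× figure was computed; this sketch assumes the ratio of on-target read fractions, and the read counts in the example are hypothetical.

```python
def on_target_fraction(on_target_reads, total_reads):
    """Fraction of sequenced reads that overlap a target region."""
    return on_target_reads / total_reads


def fold_enrichment(targeted_on, targeted_total, wgs_on, wgs_total):
    """Ratio of on-target read fractions: targeted run vs. whole-genome baseline
    (an assumed definition; coverage-based definitions are also used)."""
    return (on_target_fraction(targeted_on, targeted_total)
            / on_target_fraction(wgs_on, wgs_total))


# Hypothetical counts: 440 of 1,000 reads on target after Cas9 enrichment,
# versus 100 of 10,000 reads in a whole-genome run, gives roughly 44-fold.
print(fold_enrichment(440, 1000, 100, 10000))
```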
Long interspersed element-1 (LINE-1 or L1) amplifies via retrotransposition. Active L1s encode 2 proteins (ORF1p and ORF2p) that bind their encoding transcript to promote retrotransposition in cis. The L1-encoded proteins also promote the retrotransposition of small interspersed element RNAs, noncoding RNAs, and messenger RNAs in trans. Some L1-mediated retrotransposition events consist of a copy of U6 RNA conjoined to a variably 5′-truncated L1, but how U6/L1 chimeras are formed requires elucidation. Here, we report the following: The RNA ligase RtcB can join U6 RNAs ending in a 2′,3′-cyclic phosphate to L1 RNAs containing a 5′-OH in vitro; depletion of endogenous RtcB in HeLa cell extracts reduces U6/L1 RNA ligation efficiency; retrotransposition of U6/L1 RNAs leads to U6/L1 pseudogene formation; and a unique cohort of U6/L1 chimeric RNAs are present in multiple human cell lines. Thus, these data suggest that U6 small nuclear RNA (snRNA) and RtcB participate in the formation of chimeric RNAs and that retrotransposition of chimeric RNA contributes to interindividual genetic variation.