Mobile elements and repetitive genomic regions are sources of lineage-specific genomic innovation and uniquely fingerprint individual genomes. Comprehensive analyses of such repeat elements, including those found in more complex regions of the genome, require a complete, linear genome assembly. We present a de novo repeat discovery and annotation of the T2T-CHM13 human reference genome. We identified previously unknown satellite arrays, expanded the catalog of variants and families for repeats and mobile elements, characterized classes of complex composite repeats, and located retroelement transduction events. We detected nascent transcription and delineated CpG methylation profiles to define the structure of transcriptionally active retroelements in humans, including those in centromeres. These data expand our insight into the diversity, distribution, and evolution of repetitive regions that have shaped the human genome.
Novel sequencing technologies are rapidly expanding the size of data sets that can be applied to phylogenetic studies. Currently the most commonly used phylogenomic approaches involve some form of genome reduction. While these approaches make assembling phylogenomic data sets more economical for organisms with large genomes, they reduce the genomic coverage and thereby the long-term utility of the data. Currently, for organisms with moderate to small genomes (<1000 Mbp) it is feasible to sequence the entire genome at modest coverage (10–30×). Computational challenges for handling these large data sets can be alleviated by assembling targeted reads, rather than assembling the entire genome, to produce a phylogenomic data matrix. Here we demonstrate the use of automated Target Restricted Assembly Method (aTRAM) to assemble 1107 single-copy ortholog genes from whole genome sequencing of sucking lice (Anoplura) and out-groups. We developed a pipeline to extract exon sequences from the aTRAM assemblies by annotating them with respect to the original target protein. We aligned these protein sequences with the inferred amino acids and then performed phylogenetic analyses on both the concatenated matrix of genes and on each gene separately in a coalescent analysis. Finally, we tested the limits of successful assembly in aTRAM by assembling 100 genes from close- to distantly related taxa at high to low levels of coverage. Both the concatenated analysis and the coalescent-based analysis produced the same tree topology, which was consistent with previously published results and resolved weakly supported nodes. These results demonstrate that this approach is successful at developing phylogenomic data sets from raw genome sequencing reads. Further, we found that with coverages above 5–10×, aTRAM was successful at assembling 80–90% of the contigs for both close and distantly related taxa. As sequencing costs continue to decline, we expect full genome sequencing will become more feasible for a wider array of organisms, and aTRAM will enable mining of these genomic data sets for an extensive variety of applications, including phylogenomics.
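The coverage figures in this abstract (10–30× for genomes under 1000 Mbp) follow from the standard relation: mean coverage = number of reads × read length / genome size. The sketch below is only an illustration of that arithmetic. The function names and the 150 bp read length are assumptions for the example and are not taken from the study.

```python
import math


def reads_needed(genome_size_bp: int, coverage: float, read_length_bp: int) -> int:
    """Reads required to reach a target mean coverage (depth)."""
    if genome_size_bp <= 0 or read_length_bp <= 0 or coverage < 0:
        raise ValueError("genome size and read length must be positive, coverage non-negative")
    # coverage = n_reads * read_length / genome_size, solved for n_reads
    return math.ceil(coverage * genome_size_bp / read_length_bp)


def mean_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Mean coverage obtained from a given number of reads."""
    return n_reads * read_length_bp / genome_size_bp


# Hypothetical example: a 1000 Mbp genome sequenced to 10x with 150 bp reads.
n = reads_needed(1_000_000_000, 10, 150)
depth = mean_coverage(n, 150, 1_000_000_000)
```

Under these assumed values, a 1000 Mbp genome at 10× needs on the order of 67 million 150 bp reads.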
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.
Abstract
Motivation
Non-canonical (or non-B) DNA are genomic regions whose three-dimensional conformation deviates from the canonical double helix. Non-B DNA play an important role in basic cellular processes and are associated with genomic instability, gene regulation, and oncogenesis. Experimental methods are low-throughput and can detect only a limited set of non-B DNA structures, while computational methods rely on non-B DNA base motifs, which are necessary but not sufficient indicators of non-B structures. Oxford Nanopore sequencing is an efficient and low-cost platform, but it is currently unknown whether nanopore reads can be used for identifying non-B structures.
Results
We build the first computational pipeline to predict non-B DNA structures from nanopore sequencing. We formalize non-B detection as a novelty detection problem and develop the GoFAE-DND, an autoencoder that uses goodness-of-fit (GoF) tests as a regularizer. A discriminative loss encourages non-B DNA to be poorly reconstructed and optimizing Gaussian GoF tests allows for the computation of P-values that indicate non-B structures. Based on whole genome nanopore sequencing of NA12878, we show that there exist significant differences between the timing of DNA translocation for non-B DNA bases compared with B-DNA. We demonstrate the efficacy of our approach through comparisons with novelty detection methods using experimental data and data synthesized from a new translocation time simulator. Experimental validations suggest that reliable detection of non-B DNA from nanopore sequencing is achievable.
Availability and implementation
Source code is available at https://github.com/bayesomicslab/ONT-nonb-GoFAE-DND.
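To make the idea of novelty detection with Gaussian P-values concrete, here is a deliberately simplified sketch. It is not GoFAE-DND and involves no autoencoder. It only fits a Gaussian to reference (B-DNA-like) values and returns a two-sided P-value for each query value, so that small P-values flag candidate novelties. The function name and the toy numbers are hypothetical.

```python
import math
import statistics


def gaussian_novelty_pvalues(reference, queries):
    """Two-sided P-values of query values under a Gaussian fitted to the reference.

    Stand-in illustration only: the reference values play the role of
    translocation-time features from B-DNA, and a small P-value marks a
    query that is unlikely under that reference distribution.
    """
    mu = statistics.fmean(reference)
    sigma = statistics.stdev(reference)  # sample standard deviation
    if sigma == 0:
        raise ValueError("reference values have zero variance")
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return [math.erfc(abs(x - mu) / (sigma * math.sqrt(2))) for x in queries]


# Toy, made-up numbers purely for illustration.
reference_times = [1.0, 1.1, 0.9, 1.05, 0.95]
pvals = gaussian_novelty_pvalues(reference_times, [1.0, 2.0])
flagged = [p < 0.01 for p in pvals]
```

In this toy case the query near the reference mean is not flagged, while the distant one is.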
A data set comprising DNA sequences from 388 loci and >99,000 aligned nucleotide positions, generated using anchored hybrid enrichment, was used to estimate relationships among 138 leafhoppers and treehoppers representative of all major lineages of Membracoidea, the most diverse superfamily of hemipteran insects. Phylogenetic analysis of the concatenated nucleotide sequence data set using maximum likelihood produced a tree with most branches receiving high support. A separate coalescent gene tree analysis of the same data generally recovered the same strongly supported clades but was less well resolved overall. Several nodes pertaining to relationships among leafhopper subfamilies currently recognized based on morphological criteria were separated by short internodes and received low support. Although various higher taxa were corroborated with improved branch support, relationships among some major lineages of Membracoidea are only somewhat more resolved than previously published phylogenies based on single gene regions or morphology. In agreement with previous studies, the present results indicate that leafhoppers (Cicadellidae) are paraphyletic with respect to the three recognized families of treehoppers (Aetalionidae, Melizoderidae, and Membracidae). Divergence time estimates indicate that most of the poorly resolved divergence events among major leafhopper lineages occurred during the lower to middle Cretaceous and that most modern leafhopper subfamilies, as well as the lineage comprising the three recognized families of treehoppers, also arose during the Cretaceous.
DNA methylation is critical to the regulation of transposable elements and gene expression and can play an important role in the adaptation of stress response mechanisms in plants. Traditional methods of methylation quantification rely on bisulfite conversion that can compromise accuracy. Recent advances in long‐read sequencing technologies allow for methylation detection in real time. The associated algorithms that interpret these modifications have evolved from strictly statistical approaches to Hidden Markov Models and, recently, deep learning approaches. Much of the existing software focuses on methylation in the CG context, but methylation in other contexts is important to quantify, as it is extensively leveraged in plants. Here, we present methylation profiles for two maple species across the full range of 5mC sequence contexts using Oxford Nanopore Technologies (ONT) long‐reads. Hybrid and reference‐guided assemblies were generated for two new Acer accessions: Acer negundo (box elder; 65× ONT and 111× Illumina) and Acer saccharum (sugar maple; 93× ONT and 148× Illumina). The ONT reads generated for these assemblies were re‐basecalled, and methylation detection was conducted in a custom pipeline with the published Acer references (PacBio assemblies) and hybrid assemblies reported herein to generate four epigenomes. Examination of the transposable element landscape revealed the dominance of LTR Copia elements and patterns of methylation associated with different classes of TEs. Methylation distributions were examined at high resolution across gene and repeat density and described within the broader angiosperm context, and more narrowly in the context of gene family dynamics and candidate nutrient stress genes.
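The "full range of 5mC sequence contexts" in plants is conventionally divided into CG, CHG, and CHH, where H stands for A, C, or T. The sketch below shows how a cytosine on the forward strand can be assigned to one of these contexts from the two bases that follow it. It is an illustrative helper with hypothetical names, not part of the custom pipeline described in the abstract. Reverse-strand cytosines could be handled by applying the same function to the reverse complement.

```python
from collections import Counter
from typing import Optional


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Classify the cytosine at index i of a forward-strand sequence.

    Returns "CG", "CHG", "CHH", or None when the base is not C or the
    context cannot be determined (sequence end or ambiguous base).
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    nxt = seq[i + 1:i + 3]  # up to two downstream bases
    if len(nxt) >= 1 and nxt[0] == "G":
        return "CG"
    if len(nxt) < 2 or any(b not in "ACGT" for b in nxt):
        return None
    # Here nxt[0] is A, C, or T (that is, H).
    return "CHG" if nxt[1] == "G" else "CHH"


def count_contexts(seq: str) -> Counter:
    """Count forward-strand cytosines per context, skipping undetermined ones."""
    contexts = (cytosine_context(seq, i) for i in range(len(seq)))
    return Counter(c for c in contexts if c is not None)


counts = count_contexts("CGCAGCTTC")
```

In this example string, the first cytosine is in a CG context, the second in CHG, the third in CHH, and the final cytosine is skipped because it has no downstream bases.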
Abstract
Translocation programmes are increasingly being informed by genetic data to monitor and enhance conservation outcomes for both natural and established populations. These data provide a window into contemporary patterns of genetic diversity, structure and relatedness that can guide managers in how to best source animals for their translocation programmes. The inclusion of historical samples, where possible, strengthens monitoring by allowing assessment of changes in genetic diversity over time and by providing a benchmark for future improvements in diversity via management practices. Here, we used reduced representation sequencing (ddRADseq) data to report on the current genetic health of three remnant and seven translocated boodie (Bettongia lesueur) populations, now extinct on the Australian mainland. In addition, we used exon capture data from seven historical mainland specimens and a subset of contemporary samples to compare pre‐decline and current diversity. Both data sets showed the significant impact of population founder source (whether multiple or single) on the genetic diversity of translocated populations. Populations founded by animals from multiple sources showed significantly higher genetic diversity than the natural remnant and single‐source translocation populations, and we show that by mixing the most divergent populations, exon capture heterozygosity was restored to levels close to that observed in pre‐decline mainland samples. Relatedness estimates were surprisingly low across all contemporary populations and there was limited evidence of inbreeding. Our results show that a strategy of genetic mixing has led to successful conservation outcomes for the species in terms of increasing genetic diversity and provides strong rationale for mixing as a management strategy.
Abstract
Background
To expand its public-sector treatment capacity, Baltimore City made buprenorphine treatment (BT) accessible to low-income, largely African American residents. This study compares the characteristics of patients entering methadone treatment vs. buprenorphine treatment to determine whether BT was attracting different types of patients.
Methods
Participants consisted of two samples of adult heroin-dependent African Americans. The first sample was newly admitted to a health center or a mental health center providing buprenorphine (N = 200), and the second sample was newly admitted to one of two hospital-based methadone programs (N = 178). The Addiction Severity Index (ASI) and the Friends Supplemental Questionnaire were administered at treatment entry and data were analyzed with logistic regression.
Results
BT participants were more likely to be female (p = .017) and less likely to inject (p = .001). Participants with only prior buprenorphine treatment experience were nearly five times more likely to enter buprenorphine than methadone treatment (p < .001). Those with experience with both treatments were more than twice as likely to enter BT (OR = 2.7, 95% CI = 1.11–6.62; p = .028). In the 30 days prior to treatment entry, BT participants reported more days of medical problems (p = .002) and depression (p = .044), and were more likely to endorse a lifetime history of depression (p < .001).
Conclusion
Methadone and buprenorphine treatment provided in the public sector may attract different patient subpopulations. Providing buprenorphine treatment through drug treatment programs co-located with a health and mental health center may have accounted for BT participants' higher rates of medical and psychiatric problems and appears to be useful in attracting a diverse group of patients into public-sector funded treatment.
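The reported odds ratio, confidence interval, and P-value (OR = 2.7, 95% CI = 1.11–6.62; p = .028) can be checked for internal consistency. The sketch below assumes the interval is a Wald interval on the log-odds scale, which is common for logistic regression but is not stated in the abstract. Under that assumption, the standard error and an approximate two-sided P-value can be recovered from the published numbers alone. The function name is hypothetical.

```python
import math


def wald_from_ci(odds_ratio: float, ci_low: float, ci_high: float,
                 z_crit: float = 1.959964):
    """Recover SE, z, and two-sided P-value from an OR and its 95% Wald CI.

    Assumes CI = exp(log(OR) +/- z_crit * SE), i.e. symmetry on the log scale.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


se, z, p = wald_from_ci(2.7, 1.11, 6.62)
# Midpoint of the interval on the log scale, back-transformed to an odds ratio.
implied_or = math.exp((math.log(1.11) + math.log(6.62)) / 2)
```

With the rounded published values, the recovered P-value comes out near 0.03 and the implied odds ratio near 2.7, which is in line with the reported figures given rounding.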
Osteoporosis is a frequent problem in disorders characterized by iron overload, such as the thalassemias and hereditary hemochromatosis. The exact role of iron in the development of osteoporosis in these disorders is not established. To define the effect of iron excess in bone, we generated an iron-overloaded mouse by injecting iron dextran at 2 doses into C57/BL6 mice for 2 months. Compared with the placebo group, iron-overloaded mice exhibited dose-dependent increased tissue iron content, changes in bone composition, and trabecular and cortical thinning of bone accompanied by increased bone resorption. Iron-overloaded mice had increased reactive oxygen species and elevated serum tumor necrosis factor-α and interleukin-6 concentrations that correlated with severity of iron overload. Treatment of iron-overloaded mice with the antioxidant N-acetyl-L-cysteine prevented the development of trabecular but not cortical bone abnormalities. This is the first study to demonstrate that iron overload in mice results in increased bone resorption and oxidative stress, leading to changes in bone microarchitecture and material properties and thus bone loss.