Pooling designs have been widely used in various aspects of DNA sequencing. In biological applications, the well-studied mathematical problem called “group testing” shifts its focus to nonadaptive algorithms while the focus of traditional group testing is on sequential algorithms. Biological applications also bring forth new models not previously considered, such as the error-tolerant model, the complex model, and the inhibitor model. This book is the first attempt to collect all the significant research on pooling designs in one convenient place.
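To make the nonadaptive setting concrete, here is a minimal sketch in which all pools are fixed in advance and decoded in a single pass. The 3x3 grid design, the function names `pool_outcomes` and `decode_by_elimination`, and the error-free test outcomes are illustrative assumptions and are not taken from the book.

```python
from typing import List, Set


def pool_outcomes(design: List[Set[int]], positives: Set[int]) -> List[bool]:
    """Simulate error-free tests: a pool is positive if it holds any positive item."""
    return [bool(pool & positives) for pool in design]


def decode_by_elimination(design: List[Set[int]], outcomes: List[bool],
                          n_items: int) -> Set[int]:
    """Clear every item that appears in at least one negative pool.

    The items that remain are candidates: a superset of the true positives.
    """
    candidates = set(range(n_items))
    for pool, is_positive in zip(design, outcomes):
        if not is_positive:
            candidates -= pool
    return candidates


# Hypothetical design: 9 clones laid out on a 3x3 grid, pooled by row and by column.
GRID_DESIGN = [
    {0, 1, 2}, {3, 4, 5}, {6, 7, 8},   # row pools
    {0, 3, 6}, {1, 4, 7}, {2, 5, 8},   # column pools
]

single = decode_by_elimination(GRID_DESIGN, pool_outcomes(GRID_DESIGN, {4}), 9)
double = decode_by_elimination(GRID_DESIGN, pool_outcomes(GRID_DESIGN, {0, 4}), 9)
print(single, double)
```

With one positive clone the grid identifies it exactly. With two positives the candidate set also contains uninfected clones, which is the kind of ambiguity that stronger designs are meant to rule out.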
Beyond the massive amounts of DNA and genes transferred from the protoorganelle genome to the nucleus during the endosymbiotic event that gave rise to the plastids, stretches of plastid DNA of varying size are still being copied and relocated to the nuclear genome in a process that is ongoing and does not result in the concomitant shrinking of the plastid genome. As a result, plant nuclear genomes feature a small, but variable, fraction of sequence of plastid origin, the so-called nuclear plastid DNA sequences (NUPTs). However, the mechanisms underlying the origin and fixation of NUPTs are not yet fully elucidated and research on the topic has been mostly focused on a limited number of species and of plastid DNA. Here, we leveraged a chromosome-scale version of the genome of the orphan crop Moringa oleifera, which features the largest fraction of plastid DNA in any plant nuclear genome known so far, to gain insights into the mechanisms of origin of NUPTs. For this purpose, we examined the chromosomal distribution and arrangement of NUPTs, explicitly modeled and tested the correlation between their age and size distribution, characterized their sites of origin in the chloroplast genome and their sites of insertion in the nuclear one, and investigated their arrangement in clusters. We found a bimodal distribution of NUPT relative ages, which implies NUPTs in moringa were formed through two separate events. Furthermore, NUPTs from each event showed markedly distinctive features, suggesting they originated through distinct mechanisms. Our results reveal an unanticipated complexity of the mechanisms at the origin of NUPTs and of the evolutionary forces behind their fixation and highlight moringa species as an exceptional model to assess the impact of plastid DNA in the evolution of the architecture and function of plant nuclear genomes.
The complete sequence of a human genome Nurk, Sergey; Koren, Sergey; Rhie, Arang ...
Science (American Association for the Advancement of Science), 04/2022, Volume 376, Issue 6588
Journal Article
Peer reviewed
Open access
Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.
Satellite DNA are long tandemly repeating sequences in a genome and may be organized as high-order repeats (HORs). They are enriched in centromeres and are challenging to assemble. Existing algorithms for identifying satellite repeats either require the complete assembly of satellites or only work for simple repeat structures without HORs. Here we describe Satellite Repeat Finder (SRF), a new algorithm for reconstructing satellite repeat units and HORs from accurate reads or assemblies without prior knowledge on repeat structures. Applying SRF to real sequence data, we show that SRF could reconstruct known satellites in human and well-studied model organisms. We also find satellite repeats are pervasive in various other species, accounting for up to 12% of their genome contents but are often underrepresented in assemblies. With the rapid progress in genome sequencing, SRF will help the annotation of new genomes and the study of satellite DNA evolution even if such repeats are not fully assembled.
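As a toy illustration of what a "repeat unit" of a tandem array is, the sketch below finds the shortest unit of an exactly repeating sequence by brute force. This is not the SRF algorithm, which the abstract does not describe in detail. The function name `smallest_repeat_unit` is hypothetical, and the approach assumes an error-free array with no higher-order structure.

```python
def smallest_repeat_unit(seq: str) -> str:
    """Return the shortest prefix whose tandem repetition reproduces `seq`.

    A trailing partial copy of the unit is allowed. If the sequence is not
    periodic at all, the whole sequence is returned as its own unit.
    """
    n = len(seq)
    for period in range(1, n + 1):
        # `period` is valid when every base equals the base `period` positions later.
        if all(seq[i] == seq[i + period] for i in range(n - period)):
            return seq[:period]
    return seq  # only reached for the empty string


print(smallest_repeat_unit("ACGTACGTACGT"))  # three full copies of a 4-base unit
print(smallest_repeat_unit("ACGTACGTAC"))    # same unit with a partial final copy
```

Real satellite arrays accumulate mutations and nest monomers inside HORs, so an exact-match scan like this one would fail on them. It is only meant to pin down the terminology.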
TopHat is a popular spliced aligner for RNA-sequence (RNA-seq) experiments. In this paper, we describe TopHat2, which incorporates many significant enhancements to TopHat. TopHat2 can align reads of various lengths produced by the latest sequencing technologies, while allowing for variable-length indels with respect to the reference genome. In addition to de novo spliced alignment, TopHat2 can align reads across fusion breaks, which can occur after genomic translocations. TopHat2 combines the ability to identify novel splice sites with direct mapping to known transcripts, producing sensitive and accurate alignments, even for highly repetitive genomes or in the presence of pseudogenes. TopHat2 is available at http://ccb.jhu.edu/software/tophat .
Publication of the complete genome sequence of Arabidopsis thaliana, the first plant reference genome, in December 2000 heralded the beginning of the plant genome era. Over the past 20 years reference genomes have been generated for hundreds of plant species, spanning non-vascular to flowering plants. Releasing these plant genomes has dramatically advanced studies in all disciplines of plant biology. Importantly, multiple reference-level genomes have been generated for the major crops and their progenitors, enabling the creation of pan-genomes and exploration of domestication history and natural variations that can be adopted by modern crop breeding. We summarize the progress of plant genome sequencing and the challenges of sequencing more complex plant genomes and generating pan-genomes.
Over the past 20 years the sequences of over 1000 plant genomes have been published, representing 788 different species with a high level of diversity. Long-read sequencing with novel scaffolding strategies has further revolutionized genome sequencing, enabling access to more chromosome-scale assemblies of plant species with increasing genome complexity and size. Citation trees for the first genome papers for Arabidopsis and rice illustrate substantial developments in plant genomics and a plant genome-enabled renaissance in all disciplines of plant biology over the past 20 years. Constructing near-complete genomes, assembling complex genomes, and building reference pan-genomes are some of the main challenges in future sequencing of plant genomes.
Abstract
Antimicrobial resistance (AMR) is a significant public health threat. With the rise of affordable whole genome sequencing, in silico approaches to assessing AMR gene content can be used to detect known resistance mechanisms and potentially identify novel mechanisms. To enable accurate assessment of AMR gene content, as part of a multi-agency collaboration, NCBI developed a comprehensive AMR gene database, the Bacterial Antimicrobial Resistance Reference Gene Database, and the AMR gene detection tool AMRFinder. Here, we describe the expansion of the Reference Gene Database, now called the Reference Gene Catalog, to include putative acid, biocide, metal, and stress resistance genes, in addition to virulence genes and species-specific point mutations. Genes and point mutations are classified by broad functions, as well as more detailed functions. As we have expanded both the functional repertoire of identified genes and functionality, NCBI released a new version of AMRFinder, known as AMRFinderPlus. This new tool gives users the option to use only the core set of AMR elements, or to include stress response and virulence genes as well. AMRFinderPlus can detect acquired genes and point mutations in both protein and nucleotide sequence. In addition, the evidence used to identify the gene has been expanded to include whether nucleotide or protein sequence was used, its location in the contig, and presence of an internal stop codon. These database improvements and functional expansions will enable increased precision in identifying AMR genes, linking AMR genotypes and phenotypes, and determining possible relationships between AMR, virulence, and stress response.
Butterfly wing patterns derive from a deeply conserved developmental ground plan yet are diverse and evolve rapidly. It is poorly understood how gene regulatory architectures can accommodate both deep homology and adaptive change. To address this, we characterized the cis-regulatory evolution of the color pattern gene WntA in nymphalid butterflies. Comparative assay for transposase-accessible chromatin using sequencing (ATAC-seq) and in vivo deletions spanning 46 cis-regulatory elements across five species revealed deep homology of ground plan–determining sequences, except in monarch butterflies. Furthermore, noncoding deletions displayed both positive and negative regulatory effects that were often broad in nature. Our results provide little support for models predicting rapid enhancer turnover and suggest that deeply ancestral, multifunctional noncoding elements can underlie rapidly evolving trait systems.
The butterfly’s grand ground plan
In the 1920s, biologists proposed that butterfly wing pattern diversity evolved as variations of a ground plan of pattern elements that vary in color, shape, and position between different species. Mazo-Vargas et al. found that major aspects of this ground plan are determined by an ancient array of deeply conserved noncoding DNA sequences (see the Perspective by Espeland and Podsiadlowski). These regulatory sequences can have both positive and negative effects, and nuanced interactions between noncoding regions sculpt wing patterns. Deep homology of complex, rapidly evolving traits can thus be reflected in noncoding genomic sequences. —LMZ and DJ
Ancient multifunctional regulatory elements underlie the evolution of butterfly wing color patterns.
Haplotype-resolved de novo assembly is the ultimate solution to the study of sequence variations in a genome. However, existing algorithms either collapse heterozygous alleles into one consensus copy or fail to cleanly separate the haplotypes to produce high-quality phased assemblies. Here we describe hifiasm, a de novo assembler that takes advantage of long high-fidelity sequence reads to faithfully represent the haplotype information in a phased assembly graph. Unlike other graph-based assemblers that only aim to maintain the contiguity of one haplotype, hifiasm strives to preserve the contiguity of all haplotypes. This feature enables the development of a graph trio binning algorithm that greatly advances over standard trio binning. On three human and five nonhuman datasets, including California redwood with a ~30-Gb hexaploid genome, we show that hifiasm frequently delivers better assemblies than existing tools and consistently outperforms others on haplotype-resolved assembly.