Introduces readers to core algorithmic techniques for next-generation sequencing (NGS) data analysis and discusses a wide range of computational techniques and applications. This book provides an in-depth survey of some of the recent developments in NGS and discusses mathematical and computational challenges in various application areas of NGS technologies. The 18 chapters featured in this book have been authored by bioinformatics experts and represent the latest work in leading labs actively contributing to the fast-growing field of NGS. The book is divided into four parts: Part I focuses on computing and experimental infrastructure for NGS analysis, including chapters on cloud computing, modular pipelines for metabolic pathway reconstruction, pooling strategies for massive viral sequencing, and high-fidelity sequencing protocols. Part II concentrates on analysis of DNA sequencing data, covering the classic scaffolding problem, detection of genomic variants, including insertions and deletions, and analysis of DNA methylation sequencing data. Part III is devoted to analysis of RNA-seq data. This part discusses algorithms and compares software tools for transcriptome assembly along with methods for detection of alternative splicing and tools for transcriptome quantification and differential expression analysis. Part IV explores computational tools for NGS applications in microbiomics, including a discussion on error correction of NGS reads from viral populations, methods for viral quasispecies reconstruction, and a survey of state-of-the-art methods and future trends in microbiome analysis.
Computational Methods for Next Generation Sequencing Data Analysis: reviews computational techniques such as new combinatorial optimization methods, data structures, high performance computing, machine learning, and inference algorithms; discusses the mathematical and computational challenges in NGS technologies; and covers NGS error correction, de novo genome and transcriptome assembly, variant detection from NGS reads, and more. This text is a reference for biomedical professionals interested in expanding their knowledge of computational techniques for NGS data analysis. The book is also useful for graduate and post-graduate students in bioinformatics.
Pooling designs have been widely used in various aspects of DNA sequencing. In biological applications, the well-studied mathematical problem called “group testing” shifts its focus to nonadaptive algorithms, whereas the focus of traditional group testing is on sequential algorithms. Biological applications also bring forth new models not previously considered, such as the error-tolerant model, the complex model, and the inhibitor model. This book is the first attempt to collect all the significant research on pooling designs in one convenient place.
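To make the nonadaptive setting concrete, here is a minimal Python sketch in which all pools are fixed before any test is run and the outcomes are decoded afterwards. It is an illustration only, not taken from the book: the 3x3 row-and-column design, the function names `run_pools` and `decode_nonadaptive`, and the error-free test model are all assumptions.

```python
from typing import List, Set


def run_pools(pools: List[Set[int]], positives: Set[int]) -> List[bool]:
    """Simulate error-free tests: a pool is positive if it holds any positive item."""
    return [bool(pool & positives) for pool in pools]


def decode_nonadaptive(n_items: int, pools: List[Set[int]],
                       outcomes: List[bool]) -> Set[int]:
    """Clear every item that appears in a negative pool; return the remaining candidates."""
    cleared: Set[int] = set()
    for pool, is_positive in zip(pools, outcomes):
        if not is_positive:
            cleared |= pool
    return set(range(n_items)) - cleared


# Hypothetical design: 9 items on a 3x3 grid, one pool per row and one per column.
N_ITEMS = 9
POOLS = [
    {0, 1, 2}, {3, 4, 5}, {6, 7, 8},   # row pools
    {0, 3, 6}, {1, 4, 7}, {2, 5, 8},   # column pools
]

single = decode_nonadaptive(N_ITEMS, POOLS, run_pools(POOLS, {4}))
double = decode_nonadaptive(N_ITEMS, POOLS, run_pools(POOLS, {0, 8}))
print(sorted(single))
print(sorted(double))
```

With one positive item this design identifies it exactly. With two positives the candidate set still contains both true positives but may also include items that cannot be ruled out, which is why the choice of design matters in the nonadaptive setting.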
Beyond the massive amounts of DNA and genes transferred from the protoorganelle genome to the nucleus during the endosymbiotic event that gave rise to the plastids, stretches of plastid DNA of varying size are still being copied and relocated to the nuclear genome in a process that is ongoing and does not result in the concomitant shrinking of the plastid genome. As a result, plant nuclear genomes feature a small but variable fraction of DNA of plastid origin, the so-called nuclear plastid DNA sequences (NUPTs). However, the mechanisms underlying the origin and fixation of NUPTs are not yet fully elucidated, and research on the topic has been mostly focused on a limited number of species and of plastid DNA. Here, we leveraged a chromosome-scale version of the genome of the orphan crop Moringa oleifera, which features the largest fraction of plastid DNA in any plant nuclear genome known so far, to gain insights into the mechanisms of origin of NUPTs. For this purpose, we examined the chromosomal distribution and arrangement of NUPTs, explicitly modeled and tested the correlation between their age and size distribution, characterized their sites of origin in the chloroplast genome and their sites of insertion in the nuclear one, and investigated their arrangement in clusters. We found a bimodal distribution of NUPT relative ages, which implies that NUPTs in moringa were formed through two separate events. Furthermore, NUPTs from each event showed markedly distinctive features, suggesting they originated through distinct mechanisms. Our results reveal an unanticipated complexity of the mechanisms at the origin of NUPTs and of the evolutionary forces behind their fixation and highlight moringa species as an exceptional model to assess the impact of plastid DNA in the evolution of the architecture and function of plant nuclear genomes.
The complete sequence of a human genome. Nurk, Sergey; Koren, Sergey; Rhie, Arang ...
Science (American Association for the Advancement of Science), 04/2022, Volume 376, Issue 6588
Journal Article, peer reviewed, open access
Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.
Publication of the complete genome sequence of Arabidopsis thaliana, the first plant reference genome, in December 2000 heralded the beginning of the plant genome era. Over the past 20 years, reference genomes have been generated for hundreds of plant species, spanning non-vascular to flowering plants. Releasing these plant genomes has dramatically advanced studies in all disciplines of plant biology. Importantly, multiple reference-level genomes have been generated for the major crops and their progenitors, enabling the creation of pan-genomes and exploration of domestication history and natural variations that can be adopted by modern crop breeding. We summarize the progress of plant genome sequencing and the challenges of sequencing more complex plant genomes and generating pan-genomes.
Over the past 20 years, the sequences of over 1000 plant genomes have been published, representing 788 different species with a high level of diversity. Long-read sequencing with novel scaffolding strategies has further revolutionized genome sequencing, enabling access to more chromosome-scale assemblies of plant species with increasing genome complexity and size. Citation trees for the first genome papers for Arabidopsis and rice illustrate substantial developments in plant genomics and a plant genome-enabled renaissance in all disciplines of plant biology over the past 20 years. Constructing near-complete genomes, assembling complex genomes, and building reference pan-genomes are some of the main challenges in future sequencing of plant genomes.
TopHat is a popular spliced aligner for RNA-sequence (RNA-seq) experiments. In this paper, we describe TopHat2, which incorporates many significant enhancements to TopHat. TopHat2 can align reads of various lengths produced by the latest sequencing technologies, while allowing for variable-length indels with respect to the reference genome. In addition to de novo spliced alignment, TopHat2 can align reads across fusion breaks, which can occur after genomic translocations. TopHat2 combines the ability to identify novel splice sites with direct mapping to known transcripts, producing sensitive and accurate alignments, even for highly repetitive genomes or in the presence of pseudogenes. TopHat2 is available at http://ccb.jhu.edu/software/tophat.