Towards population-scale long-read sequencing
De Coster, Wouter; Weissensteiner, Matthias H; Sedlazeck, Fritz J
Nature reviews. Genetics, 09/2021, Volume 22, Issue 9
Journal Article
Peer reviewed
Open access
Long-read sequencing technologies have now reached a level of accuracy and yield that allows their application to variant detection at a scale of tens to thousands of samples. Concomitant with the development of new computational tools, the first population-scale studies involving long-read sequencing have emerged over the past 2 years and, given the continuous advancement of the field, many more are likely to follow. In this Review, we survey recent developments in population-scale long-read sequencing, highlight potential challenges of a scaled-up approach and provide guidance regarding experimental design. We provide an overview of current long-read sequencing platforms, variant calling methodologies and approaches for de novo assemblies and reference-based mapping approaches. Furthermore, we summarize strategies for variant validation, genotyping and predicting functional impact and emphasize challenges remaining in achieving long-read sequencing at a population scale.
Recent research into structural variants (SVs) has established their importance to medicine and molecular biology, elucidating their role in various diseases, regulation of gene expression, ethnic diversity, and large-scale chromosome evolution, giving rise to the differences within populations and among species. Nevertheless, characterizing SVs and determining the optimal approach for a given experimental design remains a computational and scientific challenge. Multiple approaches have emerged to target various SV classes, zygosities, and size ranges. Here, we review these approaches with respect to their ability to infer SVs across the full spectrum of large, complex variations and present computational methods for each approach.
When choosing a read mapper, one faces a trade-off between speed and the ability to map reads in highly polymorphic regions. Here, we report NextGenMap, a fast and accurate read mapper, which reduces this dilemma. NextGenMap aligns reads reliably to a reference genome even when the sequence difference between target and reference genome is large, i.e. in a highly polymorphic genome. At the same time, NextGenMap outperforms current mapping methods with respect to runtime and to the number of correctly mapped reads. NextGenMap efficiently uses the available hardware by exploiting multi-core CPUs as well as graphics cards (GPUs), if available. In addition, NextGenMap automatically handles any read data independent of read length and sequencing technology.
NextGenMap source code and documentation are available at: http://cibiv.github.io/NextGenMap/.
fritz.sedlazeck@univie.ac.at.
Supplementary data are available at Bioinformatics online.
GenomeScope is an open-source web tool to rapidly estimate the overall characteristics of a genome, including genome size, heterozygosity rate and repeat content from unprocessed short reads. These features are essential for studying genome evolution, and help to choose parameters for downstream analysis. We demonstrate its accuracy on 324 simulated and 16 real datasets with a wide range in genome sizes, heterozygosity levels and error rates.
http://genomescope.org, https://github.com/schatzlab/genomescope.git.
mschatz@jhu.edu.
Supplementary data are available at Bioinformatics online.
Several new genomics technologies have become available that offer long-read sequencing or long-range mapping with higher throughput and higher resolution analysis than ever before. These long-range technologies are rapidly advancing the field with improved reference genomes, more comprehensive variant identification and more complete views of transcriptomes and epigenomes. However, they also require new bioinformatics approaches to take full advantage of their unique characteristics while overcoming their complex errors and modalities. Here, we discuss several of the most important applications of the new technologies, focusing on both the currently available bioinformatics tools and opportunities for future research.
Structural variations are the greatest source of genetic variation, but they remain poorly understood because of technological limitations. Single-molecule long-read sequencing has the potential to dramatically advance the field, although high error rates are a challenge with existing methods. Addressing this need, we introduce open-source methods for long-read alignment (NGMLR; https://github.com/philres/ngmlr) and structural variant identification (Sniffles; https://github.com/fritzsedlazeck/Sniffles) that provide unprecedented sensitivity and precision for variant detection, even in repeat-rich regions and for complex nested events that can have substantial effects on human health. In several long-read datasets, including healthy and cancerous human genomes, we discovered thousands of novel variants and categorized systematic errors in short-read approaches. NGMLR and Sniffles can automatically filter false events and operate on low-coverage data, thereby reducing the high costs that have hindered the application of long reads in clinical and research settings.
We present RaGOO, a reference-guided contig ordering and orienting tool that leverages the speed and sensitivity of Minimap2 to accurately achieve chromosome-scale assemblies in minutes. After the pseudomolecules are constructed, RaGOO identifies structural variants, including those spanning sequencing gaps. We show that RaGOO accurately orders and orients 3 de novo tomato genome assemblies, including the widely used M82 reference cultivar. We then demonstrate the scalability and utility of RaGOO with a pan-genome analysis of 103 Arabidopsis thaliana accessions by examining the structural variants detected in the newly assembled pseudomolecules. RaGOO is available open source at https://github.com/malonge/RaGOO.
The cholinergic basal forebrain (CBF), comprising different groups of cortically projecting cholinergic neurons, plays a crucial role in higher cognitive processes and has been implicated in diverse neuropsychiatric disorders. A distinct corticotopic organization of CBF projections has been revealed in animal studies, but little is known about their organization in the human brain. We explored regional differences in functional connectivity (FC) profiles within the human CBF by applying a clustering approach to resting‐state functional magnetic resonance imaging (rs‐fMRI) data of healthy adult individuals (N = 85; 19–85 years). We further examined effects of age on FC of the identified CBF clusters and assessed the reproducibility of cluster‐specific FC profiles in independent data from healthy older individuals (N = 25; 65–89 years). Results showed that the human CBF is functionally organized into distinct anterior‐medial and posterior‐lateral subdivisions that largely follow anatomically defined boundaries of the medial septum/diagonal band and nucleus basalis Meynert. The anterior‐medial CBF subdivision was characterized by connectivity with the hippocampus and interconnected nodes of an extended medial cortical memory network, whereas the posterior‐lateral subdivision was specifically connected to anterior insula and dorsal anterior cingulate components of a salience/attention network. FC of both CBF subdivisions declined with increasing age, but the overall topography of subregion‐specific FC profiles was reproduced in independent rs‐fMRI data of healthy older individuals acquired in a typical clinical setting. Rs‐fMRI‐based assessments of subregion‐specific CBF function may complement established volumetric approaches for the in vivo study of CBF involvement in neuropsychiatric disorders.
Large structural variations (SVs) within genomes are more challenging to identify than smaller genetic variants but may substantially contribute to phenotypic diversity and evolution. We analyse the effects of SVs on gene expression, quantitative traits and intrinsic reproductive isolation in the yeast Schizosaccharomyces pombe. We establish a high-quality curated catalogue of SVs in the genomes of a worldwide library of S. pombe strains, including duplications, deletions, inversions and translocations. We show that copy number variants (CNVs) show a variety of genetic signals consistent with rapid turnover. These transient CNVs produce stoichiometric effects on gene expression both within and outside the duplicated regions. CNVs make substantial contributions to quantitative traits, most notably intracellular amino acid concentrations, growth under stress and sugar utilization in winemaking, whereas rearrangements are strongly associated with reproductive isolation. Collectively, these findings have broad implications for evolution and for our understanding of quantitative traits including complex human diseases.