Three benchtop high-throughput sequencing instruments are now available. The 454 GS Junior (Roche), MiSeq (Illumina) and Ion Torrent PGM (Life Technologies) are laser-printer sized and offer modest set-up and running costs. Each instrument can generate data required for a draft bacterial genome sequence in days, making them attractive for identifying and characterizing pathogens in the clinical setting. We compared the performance of these instruments by sequencing an isolate of Escherichia coli O104:H4, which caused an outbreak of food poisoning in Germany in 2011. The MiSeq had the highest throughput per run (1.6 Gb/run, 60 Mb/h) and lowest error rates. The 454 GS Junior generated the longest reads (up to 600 bases) and most contiguous assemblies but had the lowest throughput (70 Mb/run, 9 Mb/h). Run in 100-bp mode, the Ion Torrent PGM had the highest throughput (80–100 Mb/h). Unlike the MiSeq, the Ion Torrent PGM and 454 GS Junior both produced homopolymer-associated indel errors (1.5 and 0.38 errors per 100 bases, respectively).
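The per-run and hourly throughput figures quoted above imply an approximate run duration for each instrument. The following is a minimal sketch of that arithmetic, assuming the per-run and per-hour values describe the same run and taking 1.6 Gb as 1600 Mb; the function name `implied_run_hours` is illustrative and does not come from the study.

```python
def implied_run_hours(mb_per_run: float, mb_per_hour: float) -> float:
    """Approximate run duration implied by per-run and per-hour throughput."""
    if mb_per_hour <= 0:
        raise ValueError("mb_per_hour must be positive")
    return mb_per_run / mb_per_hour


# Figures quoted in the abstract (assumption: 1.6 Gb == 1600 Mb).
miseq_hours = implied_run_hours(1600, 60)
gs_junior_hours = implied_run_hours(70, 9)

print(f"MiSeq: ~{miseq_hours:.0f} h, 454 GS Junior: ~{gs_junior_hours:.0f} h")
```

Under these assumptions the MiSeq run takes a little over a day and the 454 GS Junior run roughly a working day, which is consistent with the statement that each instrument can deliver a draft bacterial genome in days. No per-run figure is quoted for the Ion Torrent PGM, so it is left out.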
Twenty years ago, the publication of the first bacterial genome sequence, from Haemophilus influenzae, shook the world of bacteriology. In this Timeline, we review the first two decades of bacterial genome sequencing, which have been marked by three revolutions: whole-genome shotgun sequencing, high-throughput sequencing and single-molecule long-read sequencing. We summarize the social history of sequencing and its impact on our understanding of the biology, diversity and evolution of bacteria, while also highlighting spin-offs and translational impact in the clinic. We look forward to a 'sequencing singularity', where sequencing becomes the method of choice for as-yet unthinkable applications in bacteriology and beyond.
Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while simultaneously reducing runtime by an order of magnitude on large genomes versus Celera Assembler 8.2. These advances result from new overlapping and assembly algorithms, including an adaptive overlapping strategy based on tf-idf weighted MinHash and a sparse assembly graph construction that avoids collapsing diverged repeats and haplotypes. We demonstrate that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either Pacific Biosciences (PacBio) or Oxford Nanopore technologies and achieves a contig NG50 of >21 Mbp on both human and Drosophila melanogaster PacBio data sets. For assembly structures that cannot be linearly represented, Canu provides graph-based assembly outputs in graphical fragment assembly (GFA) format for analysis or integration with complementary phasing and scaffolding techniques. The combination of such highly resolved assembly graphs with long-range scaffolding information promises the complete and automated assembly of complex genomes.
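The contig NG50 reported above is a standard continuity metric: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the estimated genome size. The following is a minimal sketch of that calculation, not Canu's own code; the function name `ng50` and the toy contig lengths are illustrative assumptions.

```python
from typing import Iterable, Optional


def ng50(contig_lengths: Iterable[int], genome_size: int) -> Optional[int]:
    """Return the NG50 of an assembly, or None if the contigs
    cover less than half of the estimated genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    half = genome_size / 2
    covered = 0
    # Walk contigs from longest to shortest until half the genome is covered.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= half:
            return length
    return None


# Toy example (hypothetical lengths, arbitrary units).
toy_contigs = [50, 40, 30, 20, 10]
toy_ng50 = ng50(toy_contigs, genome_size=200)
print(toy_ng50)
```

Unlike N50, which uses the total assembly length as the denominator, NG50 uses the estimated genome size, so assemblies of the same genome can be compared even when their total lengths differ.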
The assembly of long reads from Pacific Biosciences and Oxford Nanopore Technologies typically requires resource-intensive error-correction and consensus-generation steps to obtain high-quality assemblies. We show that the error-correction step can be omitted and that high-quality consensus sequences can be generated efficiently with a SIMD-accelerated, partial-order alignment-based, stand-alone consensus module called Racon. Based on tests with PacBio and Oxford Nanopore data sets, we show that Racon coupled with miniasm enables consensus genomes with similar or better quality than state-of-the-art methods while being an order of magnitude faster.
The 2015 ACMG/AMP sequence variant interpretation guideline provided a framework for classifying variants based on several benign and pathogenic evidence criteria, including a pathogenic criterion (PVS1) for predicted loss of function variants. However, the guideline did not elaborate on specific considerations for the different types of loss of function variants, nor did it provide decision‐making pathways assimilating information about variant type, its location, or any additional evidence for the likelihood of a true null effect. Furthermore, this guideline did not take into account the relative strengths for each evidence type and the final outcome of their combinations with respect to PVS1 strength. Finally, criteria specifying the genes for which PVS1 can be applied are still missing. Here, as part of the ClinGen Sequence Variant Interpretation (SVI) Workgroup's goal of refining ACMG/AMP criteria, we provide recommendations for applying the PVS1 criterion using detailed guidance addressing the above‐mentioned gaps. Evaluation of the refined criterion by seven disease‐specific groups using heterogeneous types of loss of function variants (n = 56) showed 89% agreement with the new recommendation, while discrepancies in six variants (11%) were appropriately due to disease‐specific refinements. Our recommendations will facilitate consistent and accurate interpretation of predicted loss of function variants.
We provide guidance for PVS1 usage that takes into consideration all aspects of putative loss of function (LoF) variants, including type, location, and annotation, and the disease mechanism of the genes they affect. We demonstrate how the combination of these variant and gene attributes can lead to varied PVS1 strength levels. Finally, we evaluate the refined criterion using > 50 LoF variants in several genes and diseases.
With mutations continually occurring in each protein-coding gene (at a rate of ~1 × 10⁻⁵ per gene per generation for nonsynonymous variants) [36-39] and fitness losses of less than 1% for most novel nonsynonymous mutations [29-31,34], almost every gene is expected to harbor functionally important variants that can be tested through sequencing, even if these variants are rare. ...the strong interest in exome sequencing stems from three factors: the potential to identify many genes underlying complex traits, straightforward functional annotation of coding variation and a substantially lower cost (approximately five times lower) than that of whole-genome sequencing. ...as sample size increases, the number of observed variants grows much faster than is predicted by the neutral model with constant population size [41,42] (Fig. 1).
The decade since the Human Genome Project ended has witnessed a remarkable sequencing technology explosion that has permitted a multitude of questions about the genome to be asked and answered, at unprecedented speed and resolution. Here I present examples of how the resulting information has both enhanced our knowledge and expanded the impact of the genome on biomedical research. New sequencing technologies have also introduced exciting new areas of biological endeavour. The continuing upward trajectory of sequencing technology development is enabling clinical applications that are aimed at improving medical diagnosis and treatment.
The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.
Information about the identity of an analyte is contained within the mean duration of the current blockades, the amplitude of the blockades, and additional characteristics of the blockades, such as an increase in current noise while the analyte is bound. Because signals from related analytes (e.g., various divalent metal ions or structurally related organic molecules) differ, the engineered binding site does not have to be absolutely selective for an analyte, which is often very difficult to achieve. ...it should not be forgotten that nanopore sensing is a platform technology, readily adapted for the detection of virtually any water-soluble analyte (5).
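The two main features named above, mean blockade duration and blockade amplitude, can be extracted from a sampled current trace with simple thresholding. The following is a minimal sketch under stated assumptions, not the method used in the source: the function name `blockade_events`, the fixed baseline and threshold, the sampling interval, and the toy trace are all hypothetical, and real recordings would need filtering and baseline tracking.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def blockade_events(
    current: Sequence[float], baseline: float, threshold: float, dt: float
) -> List[Tuple[float, float]]:
    """Return (duration, amplitude) for each run of samples below `threshold`.

    duration  = number of samples in the run * dt
    amplitude = baseline - mean current during the run
    """
    events = []
    run = []
    for sample in current:
        if sample < threshold:
            run.append(sample)  # still inside a blockade
        elif run:
            events.append((len(run) * dt, baseline - mean(run)))
            run = []
    if run:  # trace ended while the pore was still blocked
        events.append((len(run) * dt, baseline - mean(run)))
    return events


# Toy trace (hypothetical units: pA, sampled every 1 ms).
trace = [100, 100, 60, 60, 60, 100, 100, 40, 40, 100]
events = blockade_events(trace, baseline=100, threshold=80, dt=0.001)

mean_duration = mean(d for d, _ in events)
mean_amplitude = mean(a for _, a in events)
print(len(events), mean_duration, mean_amplitude)
```

Analytes that bind the same site could then be told apart by where their events fall in the duration and amplitude plane, which is why the binding site need not be absolutely selective. Current noise during the bound state, the third characteristic mentioned, is not modeled here.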
Since the days of Sanger sequencing, next-generation sequencing technologies have significantly evolved to provide increased data output, efficiencies, and applications. These next generations of technologies can be categorized based on read length. This review provides an overview of these technologies as two paradigms: short-read, or “second-generation,” technologies, and long-read, or “third-generation,” technologies. Herein, short-read sequencing approaches are represented by the most prevalent technologies, Illumina and Ion Torrent, and long-read sequencing approaches are represented by Pacific Biosciences and Oxford Nanopore technologies. All technologies are reviewed along with reported advantages and disadvantages. Until recently, short-read sequencing was thought to provide high accuracy limited by read-length, while long-read technologies afforded much longer read-lengths at the expense of accuracy. Emerging developments for third-generation technologies hold promise for the next wave of sequencing evolution, with the co-existence of longer read lengths and high accuracy.