The rapid evolution of 454 GS-FLX sequencing technology has not been accompanied by a reassessment of the quality and accuracy of the sequences obtained. Current strategies for decision-making and ...error-correction are based on an initial analysis by Huse et al. in 2007, for the older GS20 system based on experimental sequences. We analyze here the quality of 454 sequencing data and identify factors playing a role in sequencing error, through the use of an extensive dataset for Roche control DNA fragments.
We obtained a mean error rate for 454 sequences of 1.07%. More importantly, the error rate is not randomly distributed; it occasionally rose to more than 50% in certain positions, and its distribution was linked to several experimental variables. The main factors related to error are the presence of homopolymers, position in the sequence, size of the sequence and spatial localization in PT plates for insertion and deletion errors. These factors can be described by considering seven variables. No single variable can account for the error rate distribution, but most of the variation is explained by the combination of all seven variables.
The pattern identified here calls for the use of internal controls and error-correcting base callers, to correct for errors, when available (e.g. when sequencing amplicons). For shotgun libraries, the use of both sequencing primers and deep coverage, combined with the use of random sequencing primer sites should partly compensate for even high error rates, although it may prove more difficult than previous thought to distinguish between low-frequency alleles and errors.
Demand has never been greater for revolutionary technologies that deliver fast, inexpensive and accurate genome information. This challenge has catalysed the development of next-generation sequencing ...(NGS) technologies. The inexpensive production of large volumes of sequence data is the primary advantage over conventional methods. Here, I present a technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments. I also outline the broad range of applications for NGS technologies, in addition to providing guidelines for platform selection to address biological questions of interest.
This review commemorates the 40th anniversary of DNA sequencing, a period in which we have already witnessed multiple technological revolutions and a growth in scale from a few kilobases to the first ...human genome, and now to millions of human and a myriad of other genomes. DNA sequencing has been extensively and creatively repurposed, including as a 'counter' for a vast range of molecular phenomena. We predict that in the long view of history, the impact of DNA sequencing will be on a par with that of the microscope.
High-throughput sequencing technologies promise to transform the fields of genetics and comparative biology by delivering tens of thousands of genomes in the near future. Although it is feasible to ...construct de novo genome assemblies in a few months, there has been relatively little attention to what is lost by sole application of short sequence reads. We compared the recent de novo assemblies using the short oligonucleotide analysis package (SOAP), generated from the genomes of a Han Chinese individual and a Yoruban individual, to experimentally validated genomic features. We found that de novo assemblies were 16.2% shorter than the reference genome and that 420.2 megabase pairs of common repeats and 99.1% of validated duplicated sequences were missing from the genome. Consequently, over 2,377 coding exons were completely missing. We conclude that high-quality sequencing approaches must be considered in conjunction with high-throughput sequencing for comparative genomics analyses and studies of genome evolution.
Genome sequencing of large numbers of individuals promises to advance the understanding, treatment, and prevention of human diseases, among other applications. We describe a genome sequencing ...platform that achieves efficient imaging and low reagent consumption with combinatorial probe anchor ligation chemistry to independently assay each base from patterned nanoarrays of self-assembling DNA nanoballs. We sequenced three human genomes with this platform, generating an average of 45-to 87-fold coverage per genome and identifying 3.2 to 4.5 million sequence variants per genome. Validation of one genome data set demonstrates a sequence accuracy of about 1 false variant per 100 kilobases. The high accuracy, affordable cost of $4400 for sequencing consumables, and scalability of this platform enable complete human genome sequencing for the detection of rare variants in large-scale genetic studies.
Nanopore analysis is an emerging technique that involves using a voltage to drive molecules through a nanoscale pore in a membrane between two electrolytes, and monitoring how the ionic current ...through the nanopore changes as single molecules pass through it. This approach allows charged polymers (including single-stranded DNA, double-stranded DNA and RNA) to be analysed with subnanometre resolution and without the need for labels or amplification. Recent advances suggest that nanopore-based sensors could be competitive with other third-generation DNA sequencing technologies, and may be able to rapidly and reliably sequence the human genome for under $1,000. In this article we review the use of nanopore technology in DNA sequencing, genetics and medical diagnostics.
Next-generation sequencing technologies have been and continue to be deployed in clinical laboratories, enabling rapid transformations in genomic medicine. These technologies have reduced the cost of ...large-scale sequencing by several orders of magnitude, and continuous advances are being made. It is now feasible to analyze an individual’s near-complete exome or genome to assist in the diagnosis of a wide array of clinical scenarios. Next-generation sequencing technologies are also facilitating further advances in therapeutic decision making and disease prediction for at-risk patients. However, with rapid advances come additional challenges involving the clinical validation and use of these constantly evolving technologies and platforms in clinical laboratories. To assist clinical laboratories with the validation of next-generation sequencing methods and platforms, the ongoing monitoring of next-generation sequencing testing to ensure quality results, and the interpretation and reporting of variants found using these technologies, the American College of Medical Genetics and Genomics has developed the following professional standards and guidelines.
Genet Med15 9, 733–747.
Three benchtop high-throughput sequencing instruments are now available. The 454 GS Junior (Roche), MiSeq (Illumina) and Ion Torrent PGM (Life Technologies) are laser-printer sized and offer modest ...set-up and running costs. Each instrument can generate data required for a draft bacterial genome sequence in days, making them attractive for identifying and characterizing pathogens in the clinical setting. We compared the performance of these instruments by sequencing an isolate of Escherichia coli O104:H4, which caused an outbreak of food poisoning in Germany in 2011. The MiSeq had the highest throughput per run (1.6 Gb/run, 60 Mb/h) and lowest error rates. The 454 GS Junior generated the longest reads (up to 600 bases) and most contiguous assemblies but had the lowest throughput (70 Mb/run, 9 Mb/h). Run in 100-bp mode, the Ion Torrent PGM had the highest throughput (80–100 Mb/h). Unlike the MiSeq, the Ion Torrent PGM and 454 GS Junior both produced homopolymer-associated indel errors (1.5 and 0.38 errors per 100 bases, respectively).
Unique Molecular Identifiers (UMIs) are random oligonucleotide barcodes that are increasingly used in high-throughput sequencing experiments. Through a UMI, identical copies arising from distinct ...molecules can be distinguished from those arising through PCR amplification of the same molecule. However, bioinformatic methods to leverage the information from UMIs have yet to be formalized. In particular, sequencing errors in the UMI sequence are often ignored or else resolved in an ad hoc manner. We show that errors in the UMI sequence are common and introduce network-based methods to account for these errors when identifying PCR duplicates. Using these methods, we demonstrate improved quantification accuracy both under simulated conditions and real iCLIP and single-cell RNA-seq data sets. Reproducibility between iCLIP replicates and single-cell RNA-seq clustering are both improved using our proposed network-based method, demonstrating the value of properly accounting for errors in UMIs. These methods are implemented in the open source UMI-tools software package.
Twenty years ago, the publication of the first bacterial genome sequence, from Haemophilus influenzae, shook the world of bacteriology. In this Timeline, we review the first two decades of bacterial ...genome sequencing, which have been marked by three revolutions: whole-genome shotgun sequencing, high-throughput sequencing and single-molecule long-read sequencing. We summarize the social history of sequencing and its impact on our understanding of the biology, diversity and evolution of bacteria, while also highlighting spin-offs and translational impact in the clinic. We look forward to a 'sequencing singularity', where sequencing becomes the method of choice for as-yet unthinkable applications in bacteriology and beyond.