Forty years ago, the advent of Sanger sequencing was revolutionary, as it allowed complete genome sequences to be deciphered for the first time. A second revolution came with next-generation sequencing (NGS) technologies, which made genome sequencing much cheaper and faster. However, NGS methods have several drawbacks and pitfalls, most notably their short reads. Recently, third-generation (long-read) methods have appeared that can produce genome assemblies of unprecedented quality. Moreover, these technologies can directly detect epigenetic modifications on native DNA and allow whole-transcript sequencing without the need for assembly. This marks the third revolution in sequencing technology. Here we review and compare the various long-read methods. We discuss their applications and their respective strengths and weaknesses, and we provide future perspectives.
Long-read/third-generation sequencing technologies are causing a new revolution in genomics as they provide a way to study genomes, transcriptomes, and metagenomes at an unprecedented resolution.
SMRT (single-molecule real-time) and nanopore sequencing allow, for the first time, the direct study of different types of DNA base modifications.
Moreover, nanopore technology can directly sequence RNA and identify RNA base modifications.
Owing to the portability of the MinION and the availability of extremely simple library preparation methods, nanopore technology makes it possible, for the first time, to perform high-throughput sequencing in the field and at remote sites. This is of tremendous importance for outbreak surveillance in developing countries.
Clinical applications of precision oncology require accurate tests that can distinguish true cancer-specific mutations from errors introduced at each step of next-generation sequencing (NGS). To date, no bulk sequencing study has addressed the effects of cross-site reproducibility, nor the biological, technical and computational factors that influence variant identification. Here we report a systematic interrogation of somatic mutations in paired tumor-normal cell lines to identify factors affecting detection reproducibility and accuracy at six different centers. Using whole-genome sequencing (WGS) and whole-exome sequencing (WES), we evaluated reproducibility across different sample types, varying input amounts and tumor purities, and multiple library construction protocols, followed by processing with nine bioinformatics pipelines. We found that read coverage and variant callers affected both WGS and WES reproducibility, but WES performance was also influenced by insert fragment size, genomic copy content and the global imbalance score (GIV; G > T/C > A). Finally, considering library preparation protocol, tumor content, read coverage and bioinformatics processes together, we recommend actionable practices to improve the reproducibility and accuracy of NGS experiments for cancer mutation detection.
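The abstract lists tumor purity and read coverage among the factors that affect detection. As a rough illustration of why (not the study's own analysis), the sketch below assumes a simplified model: a clonal somatic SNV in a tumor of purity p, admixed with diploid normal cells, has an expected variant allele fraction VAF = p·m / (p·C + 2(1 − p)), where m is the number of mutant copies and C the tumor copy number, and the count of variant-supporting reads at a given depth is binomial. Sequencing errors, mapping artifacts and subclonality are ignored. The function names and the `min_alt_reads` threshold are hypothetical.

```python
import math


def expected_vaf(purity: float, tumor_cn: int = 2, mutant_copies: int = 1) -> float:
    """Expected VAF of a clonal somatic SNV, assuming the normal admixture is diploid."""
    total_copies = purity * tumor_cn + 2 * (1 - purity)
    return purity * mutant_copies / total_copies


def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(at least min_alt_reads variant reads) under Binomial(depth, vaf).

    Simplifying assumption: every read is error-free and independently sampled.
    """
    p_below = sum(
        math.comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    # Hypothetical caller threshold of 5 supporting reads.
    for purity in (1.0, 0.5, 0.2):
        vaf = expected_vaf(purity)
        for depth in (30, 100):
            p = detection_probability(depth, vaf, min_alt_reads=5)
            print(f"purity={purity:.0%} VAF={vaf:.2f} depth={depth}x P(detect)={p:.3f}")
```

Under these assumptions, lowering purity from 100% to 20% drops the expected VAF in a copy-neutral region from 0.5 to 0.1, so much deeper coverage is needed to reach the same detection probability.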
Antimicrobial resistance exacts a heavy toll in morbidity, mortality and economic costs each year by rendering bacteria immune to antibiotics. Identifying and understanding antimicrobial resistance are imperative for clinical practice, to treat resistant infections, and for public health efforts to limit the spread of resistance. Technologies such as next-generation sequencing are expanding our abilities to detect and study antimicrobial resistance. This Review provides a detailed overview of antimicrobial resistance identification and characterization methods, from traditional antimicrobial susceptibility testing to recent deep-learning methods. We focus on sequencing-based resistance discovery and discuss tools and databases used in antimicrobial resistance studies.
While long-read sequencing allows for the complete assembly of bacterial genomes, long-read assemblies contain a variety of errors. Here, we present Trycycler, a tool that produces a consensus assembly from multiple input assemblies of the same genome. Benchmarking showed that Trycycler assemblies contained fewer errors than assemblies constructed with a single tool. Post-assembly polishing further reduced errors, and Trycycler+polishing assemblies were the most accurate genomes in our study. Because Trycycler requires manual intervention, its output is not deterministic. However, we demonstrated that multiple users converge on similar assemblies that are consistently more accurate than those produced by automated assembly tools.
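To make the idea of a consensus of several assemblies concrete, here is a deliberately simplified sketch: a per-column majority vote over sequences that are assumed to be already aligned to equal length, with `-` marking gaps. This is a generic illustration, not Trycycler's algorithm, which is considerably more involved and is not reproduced here; the function name is hypothetical.

```python
from collections import Counter


def column_consensus(aligned: list[str], gap: str = "-") -> str:
    """Majority-vote consensus over equal-length, pre-aligned sequences.

    Ties go to the symbol seen first in the column (Counter preserves
    first-encountered order for equal counts). Columns whose majority
    symbol is a gap are dropped from the output.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    consensus = []
    for column in zip(*aligned):
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


if __name__ == "__main__":
    # Three toy "assemblies", each carrying a different isolated error.
    assemblies = ["ACGTTGCA", "ACGATGCA", "ACGTTGC-"]
    print(column_consensus(assemblies))
```

The point of the toy is that errors made independently by different assemblers tend to be outvoted, which is one intuition for why combining assemblies can reduce error counts.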
Although next-generation sequencing (NGS) revolutionized sequencing, offering tremendous capacity with groundbreaking depth and accuracy, it still has serious limitations. In the early 2010s, a new set of sequencing methodologies, introduced by two platforms, Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), gave rise to third-generation sequencing (TGS). These long-read technologies make genome sequencing easier to handle by greatly reducing the average time of library construction workflows and by simplifying de novo genome assembly through the generation of long reads. Long reads produced by both TGS methodologies have already facilitated transcriptional profiling, since they enable the identification of full-length transcripts without the need for assembly or the use of sophisticated bioinformatics tools. Long-read technologies have also provided new insights into epitranscriptomics by allowing the direct detection of RNA modifications on native RNA molecules. This review highlights the advantageous features of the newly introduced TGS technologies, discusses their limitations and provides an in-depth comparison of their scientific background and available protocols, as well as their potential utility in research and clinical applications.
There is no effective way to detect structural variations (SVs) and extrachromosomal circular DNAs (ecDNAs) at the single-cell whole-genome level. Here, we develop a novel single-cell whole-genome sequencing (scWGS) method based on a third-generation sequencing platform, named SMOOTH-seq (single-molecule real-time sequencing of long fragments amplified through transposon insertion). We evaluate the method for detecting copy-number variations (CNVs), SVs, and single-nucleotide variations (SNVs) in human cancer cell lines and a colorectal cancer sample, and we show that SMOOTH-seq reliably and effectively detects SVs and ecDNAs in individual cells but has relatively limited accuracy in detecting CNVs and SNVs. SMOOTH-seq opens a new chapter in scWGS, as it generates high-fidelity reads kilobases in length.
The second Newborn Sequencing in Genomic Medicine and Public Health study was a randomized, controlled trial of the effectiveness of rapid whole-genome or whole-exome sequencing (rWGS or rWES, respectively) in seriously ill infants with diseases of unknown etiology. Here we report comparisons of analytic and diagnostic performance. Of 1,248 ill inpatient infants, 578 (46%) had diseases of unknown etiology. Of these, 213 infants (37% of those eligible) were enrolled within 96 h of admission. Twenty-four infants (11%) were very ill and received ultra-rapid whole-genome sequencing (urWGS). The remaining infants were randomized, 95 to rWES and 94 to rWGS. The analytic performance of rWGS was superior to that of rWES, including for variants likely to affect protein function and for ClinVar pathogenic/likely pathogenic variants (p < 0.0001). The diagnostic performance of rWGS and rWES was similar (18 diagnoses in 94 infants [19%] versus 19 diagnoses in 95 infants [20%], respectively), as was time to result (median 11.0 versus 11.2 days, respectively). However, the proportion diagnosed by urWGS (11 of 24 [46%]) was higher than with rWES/rWGS (p = 0.004), and time to result was shorter (median 4.6 days, p < 0.0001). The incremental diagnostic yield of reflexing to trio after negative proband analysis was 0.7% (1 of 147). In conclusion, rapid genomic sequencing can be performed as a first-tier diagnostic test in inpatient infants. urWGS had the shortest time to result, which was important in unstable infants and in those in whom a genetic diagnosis was likely to affect immediate management. Further comparison of urWGS and rWES is warranted because genomic technologies and knowledge of variant pathogenicity are evolving rapidly.
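The abstract reports several counts alongside rounded percentages. The short sketch below simply recomputes those proportions from the stated counts as a consistency check; it does not attempt to reproduce the reported p-values, since the statistical tests used are not described here. The helper name and labels are illustrative only.

```python
def percent(numerator: int, denominator: int) -> float:
    """Proportion expressed as a percentage."""
    return 100.0 * numerator / denominator


# (numerator, denominator) pairs taken from the counts stated in the abstract.
FIGURES = {
    "unknown etiology among ill inpatients": (578, 1248),
    "enrolled among those eligible": (213, 578),
    "received urWGS among enrolled": (24, 213),
    "diagnosed by rWGS": (18, 94),
    "diagnosed by rWES": (19, 95),
    "diagnosed by urWGS": (11, 24),
    "incremental yield of trio reflex": (1, 147),
}

if __name__ == "__main__":
    # The urWGS group plus the two randomized arms should account for all enrollees.
    print("arms sum to enrolled:", 24 + 95 + 94 == 213)
    for label, (num, den) in FIGURES.items():
        print(f"{label}: {num}/{den} = {percent(num, den):.1f}%")
```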
De novo assembly of a human genome using nanopore long-read sequences has been reported, but it used more than 150,000 CPU hours and weeks of wall-clock time. To enable rapid human genome assembly, we present Shasta, a de novo long-read assembler, and polishing algorithms named MarginPolish and HELEN. Using a single PromethION nanopore sequencer and our toolkit, we assembled 11 highly contiguous human genomes de novo in 9 days. We achieved roughly 63× coverage, 42-kb read N50 values and 6.5× coverage in reads >100 kb using three flow cells per sample. Shasta produced a complete haploid human genome assembly in under 6 h on a single commercial compute node. MarginPolish and HELEN polished haploid assemblies to more than 99.9% identity (Phred quality score QV = 30) using nanopore reads alone. Adding proximity-ligation sequencing enabled near chromosome-level scaffolds for all 11 genomes. We compare our assembly performance with that of existing methods for diploid, haploid and trio-binned human samples and report superior accuracy and speed.
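The summary statistics quoted above (read N50, coverage from reads above a length cutoff, and the correspondence between 99.9% identity and QV 30) follow standard definitions. The sketch below shows those definitions under stated assumptions: coverage is approximated as total bases divided by an assumed genome size, and the function names are hypothetical rather than taken from Shasta, MarginPolish or HELEN.

```python
import math


def read_n50(lengths: list[int]) -> int:
    """Largest length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no reads")


def coverage(lengths: list[int], genome_size: int, min_length: int = 0) -> float:
    """Approximate depth: bases in reads longer than min_length divided by genome size."""
    return sum(n for n in lengths if n > min_length) / genome_size


def identity_to_qv(identity: float) -> float:
    """Phred-scaled quality: QV = -10 * log10(error rate)."""
    return -10 * math.log10(1 - identity)


def qv_to_identity(qv: float) -> float:
    """Inverse of identity_to_qv."""
    return 1 - 10 ** (-qv / 10)


if __name__ == "__main__":
    reads = [12_000, 42_000, 55_000, 110_000, 8_000]  # toy read lengths
    print("N50:", read_n50(reads))
    print("QV at 99.9% identity:", round(identity_to_qv(0.999), 6))
```

With an assumed genome size G, "6.5× coverage in reads >100 kb" corresponds to `coverage(lengths, G, min_length=100_000)` evaluating to about 6.5.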
Nearly all infectious agents contain DNA or RNA genomes, making sequencing an attractive approach for pathogen detection. The cost of high-throughput or next-generation sequencing has been reduced by several orders of magnitude since its advent in 2004, and it has emerged as an enabling technological platform for the detection and taxonomic characterization of microorganisms in clinical samples from patients. This review focuses on the application of untargeted metagenomic next-generation sequencing to the clinical diagnosis of infectious diseases, particularly in areas in which conventional diagnostic approaches have limitations. The review covers (a) next-generation sequencing technologies and common platforms, (b) next-generation sequencing assay workflows in the clinical microbiology laboratory, (c) bioinformatics analysis of metagenomic next-generation sequencing data, (d) validation and use of metagenomic next-generation sequencing for diagnosing infectious diseases, and (e) significant case reports and studies in this area. Next-generation sequencing is a new technology that promises to enhance our ability to diagnose, interrogate, and track infectious diseases.
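One step in the bioinformatics analysis of metagenomic data is assigning reads to taxa. The toy sketch below illustrates that idea with exact k-mer matching and a simple vote: each read goes to the reference taxon sharing the most k-mers with it. It is a generic teaching example, not the method of any particular pipeline; production classifiers use large curated databases, resolve k-mers shared between taxa, and typically remove host reads first. All names and the toy references are hypothetical.

```python
from collections import Counter
from typing import Optional


def kmers(seq: str, k: int) -> set[str]:
    """Distinct k-mers of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of taxa whose reference contains it."""
    index: dict[str, set[str]] = {}
    for taxon, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(taxon)
    return index


def classify_read(read: str, index: dict[str, set[str]], k: int, min_hits: int = 1) -> Optional[str]:
    """Return the taxon with the most shared k-mers, or None if too few hits."""
    votes: Counter = Counter()
    for kmer in kmers(read, k):
        for taxon in index.get(kmer, ()):
            votes[taxon] += 1
    if not votes:
        return None
    taxon, hits = votes.most_common(1)[0]
    return taxon if hits >= min_hits else None


if __name__ == "__main__":
    refs = {"taxon_A": "ACGTACGTGGCC", "taxon_B": "TTGACCAGTTAA"}  # toy references
    index = build_index(refs, k=4)
    for read in ("ACGTGGC", "GACCAGT", "CCCCCCC"):
        print(read, "->", classify_read(read, index, k=4))
```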
Long-read technologies are overcoming early limitations in accuracy and throughput, broadening their application domains in genomics. Dedicated analysis tools that take into account the characteristics of long-read data are therefore required, but the fast pace of their development can be overwhelming. To assist in the design and analysis of long-read sequencing projects, we review the current landscape of available tools and present an online interactive database, long-read-tools.org, to facilitate their browsing. We further focus on the principles of error correction, base modification detection, and long-read transcriptomics analysis and highlight the challenges that remain.
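As one simple illustration of an error-correction principle (not a description of any specific long-read tool), the sketch below uses a k-mer spectrum: a sequencing error creates k-mers that are rare across the read set, so read positions covered only by rare ("weak") k-mers are flagged as likely errors. This works best when most k-mers are error-free, for example with accurate short reads used in hybrid correction; error-prone long reads are usually self-corrected by overlap- and consensus-based approaches instead. The function names and the `min_count` threshold are hypothetical.

```python
from collections import Counter


def kmer_counts(reads: list[str], k: int) -> Counter:
    """Count every k-mer occurrence across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_positions(read: str, counts: Counter, k: int, min_count: int) -> list[int]:
    """Positions covered only by k-mers seen fewer than min_count times."""
    n = len(read) - k + 1  # number of k-mers in the read
    solid = [counts[read[i:i + k]] >= min_count for i in range(n)]
    weak = []
    for pos in range(len(read)):
        # k-mers starting at i cover positions i .. i + k - 1
        covering = range(max(0, pos - k + 1), min(pos, n - 1) + 1)
        if covering and not any(solid[i] for i in covering):
            weak.append(pos)
    return weak


if __name__ == "__main__":
    # Five correct copies plus one read with an A->T substitution at position 4.
    reads = ["ACGTACGT"] * 5 + ["ACGTTCGT"]
    counts = kmer_counts(reads, k=3)
    print(weak_positions("ACGTTCGT", counts, k=3, min_count=2))
```

A corrector built on this idea would then try substitutions at the flagged positions that turn weak k-mers into solid ones.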