Forty years ago the advent of Sanger sequencing was revolutionary as it allowed complete genome sequences to be deciphered for the first time. A second revolution came when next-generation sequencing ...(NGS) technologies appeared, which made genome sequencing much cheaper and faster. However, NGS methods have several drawbacks and pitfalls, most notably their short reads. Recently, third-generation/long-read methods appeared, which can produce genome assemblies of unprecedented quality. Moreover, these technologies can directly detect epigenetic modifications on native DNA and allow whole-transcript sequencing without the need for assembly. This marks the third revolution in sequencing technology. Here we review and compare the various long-read methods. We discuss their applications and their respective strengths and weaknesses and provide future perspectives.
Long-read/third-generation sequencing technologies are causing a new revolution in genomics as they provide a way to study genomes, transcriptomes, and metagenomes at an unprecedented resolution.
SMRT and nanopore sequencing allow for the first time the direct study of different types of DNA base modifications.
Moreover, nanopore technology can sequence directly RNA and identify RNA base modifications.
Owing to the portability of the MinION and the existence of extremely simple library preparation methods, nanopore technology allows the performance of high-throughput sequencing for the first time in the field and at remote places. This is of tremendous importance for the survey of outbreaks in developing countries.
Clinical applications of precision oncology require accurate tests that can distinguish true cancer-specific mutations from errors introduced at each step of next-generation sequencing (NGS). To ...date, no bulk sequencing study has addressed the effects of cross-site reproducibility, nor the biological, technical and computational factors that influence variant identification. Here we report a systematic interrogation of somatic mutations in paired tumor-normal cell lines to identify factors affecting detection reproducibility and accuracy at six different centers. Using whole-genome sequencing (WGS) and whole-exome sequencing (WES), we evaluated the reproducibility of different sample types with varying input amount and tumor purity, and multiple library construction protocols, followed by processing with nine bioinformatics pipelines. We found that read coverage and callers affected both WGS and WES reproducibility, but WES performance was influenced by insert fragment size, genomic copy content and the global imbalance score (GIV; G > T/C > A). Finally, taking into account library preparation protocol, tumor content, read coverage and bioinformatics processes concomitantly, we recommend actionable practices to improve the reproducibility and accuracy of NGS experiments for cancer mutation detection.
Antimicrobial resistance extracts high morbidity, mortality and economic costs yearly by rendering bacteria immune to antibiotics. Identifying and understanding antimicrobial resistance are ...imperative for clinical practice to treat resistant infections and for public health efforts to limit the spread of resistance. Technologies such as next-generation sequencing are expanding our abilities to detect and study antimicrobial resistance. This Review provides a detailed overview of antimicrobial resistance identification and characterization methods, from traditional antimicrobial susceptibility testing to recent deep-learning methods. We focus on sequencing-based resistance discovery and discuss tools and databases used in antimicrobial resistance studies.
While long-read sequencing allows for the complete assembly of bacterial genomes, long-read assemblies contain a variety of errors. Here, we present Trycycler, a tool which produces a consensus ...assembly from multiple input assemblies of the same genome. Benchmarking showed that Trycycler assemblies contained fewer errors than assemblies constructed with a single tool. Post-assembly polishing further reduced errors and Trycycler+polishing assemblies were the most accurate genomes in our study. As Trycycler requires manual intervention, its output is not deterministic. However, we demonstrated that multiple users converge on similar assemblies that are consistently more accurate than those produced by automated assembly tools.
The increasing burden of dengue virus on public health due to more explosive and frequent outbreaks highlights the need for improved surveillance and control. Genomic surveillance of dengue virus not ...only provides important insights into the emergence and spread of genetically diverse serotypes and genotypes, but it is also critical to monitor the effectiveness of newly implemented control strategies. Here, we present DengueSeq, an amplicon sequencing protocol, which enables whole-genome sequencing of all four dengue virus serotypes.
We developed primer schemes for the four dengue virus serotypes, which can be combined into a pan-serotype approach. We validated both approaches using genetically diverse virus stocks and clinical specimens that contained a range of virus copies. High genome coverage (>95%) was achieved for all genotypes, except DENV2 (genotype VI) and DENV 4 (genotype IV) sylvatics, with similar performance of the serotype-specific and pan-serotype approaches. The limit of detection to reach 70% coverage was 10-100 RNA copies/μL for all four serotypes, which is similar to other commonly used primer schemes. DengueSeq facilitates the sequencing of samples without known serotypes, allows the detection of multiple serotypes in the same sample, and can be used with a variety of library prep kits and sequencing instruments.
DengueSeq was systematically evaluated with virus stocks and clinical specimens spanning the genetic diversity within each of the four dengue virus serotypes. The primer schemes can be plugged into existing amplicon sequencing workflows to facilitate the global need for expanded dengue virus genomic surveillance.
Abstract Herein, the mycobiota was characterized in fecal samples from sick patients and healthy subjects, collected from different geographical locations and using both culturomics and ...amplicon-based metagenomics approaches. Using the culturomics approach, a total of 17,800 fungal colonies were isolated from 14 fecal samples, and resulted in the isolation of 41 fungal species, of which 10 species had not been previously reported in the human gut. Deep sequencing of fungal-directed ITS1 and ITS2 amplicons led to the detection of a total of 142 OTUs and 173 OTUs from the ITS1 and ITS2 regions, respectively. Ascomycota composed the largest fraction of the total OTUs analyzed (78.9% and 68.2% of the OTUs from the ITS1 and ITS2 regions, respectively), followed by Basidiomycota (16.9% and 30.1% of the OTUs from the ITS1 and ITS2 regions, respectively). Interestingly, the results demonstrate that the ITS1/ITS2 amplicon sequencing provides different information about gut fungal communities compared to culturomics, though both approaches complete each other in assessing fungal diversity in fecal samples. We also report higher fungal diversity and abundance in patients compared to healthy subjects. In conclusion, combining both culturomic and amplicon-based metagenomic approaches may be a novel strategy towards analyzing fungal compositions in the human gut.
Although next-generation sequencing (NGS) technology revolutionized sequencing, offering a tremendous sequencing capacity with groundbreaking depth and accuracy, it continues to demonstrate serious ...limitations. In the early 2010s, the introduction of a novel set of sequencing methodologies, presented by two platforms, Pacific Biosciences (PacBio) and Oxford Nanopore Sequencing (ONT), gave birth to third-generation sequencing (TGS). The innovative long-read technologies turn genome sequencing into an ease-of-handle procedure by greatly reducing the average time of library construction workflows and simplifying the process of de novo genome assembly due to the generation of long reads. Long sequencing reads produced by both TGS methodologies have already facilitated the decipherment of transcriptional profiling since they enable the identification of full-length transcripts without the need for assembly or the use of sophisticated bioinformatics tools. Long-read technologies have also provided new insights into the field of epitranscriptomics, by allowing the direct detection of RNA modifications on native RNA molecules. This review highlights the advantageous features of the newly introduced TGS technologies, discusses their limitations and provides an in-depth comparison regarding their scientific background and available protocols as well as their potential utility in research and clinical applications.
There is no effective way to detect structure variations (SVs) and extra-chromosomal circular DNAs (ecDNAs) at single-cell whole-genome level. Here, we develop a novel third-generation sequencing ...platform-based single-cell whole-genome sequencing (scWGS) method named SMOOTH-seq (single-molecule real-time sequencing of long fragments amplified through transposon insertion). We evaluate the method for detecting CNVs, SVs, and SNVs in human cancer cell lines and a colorectal cancer sample and show that SMOOTH-seq reliably and effectively detects SVs and ecDNAs in individual cells, but shows relatively limited accuracy in detection of CNVs and SNVs. SMOOTH-seq opens a new chapter in scWGS as it generates high fidelity reads of kilobases long.
The second Newborn Sequencing in Genomic Medicine and Public Health study was a randomized, controlled trial of the effectiveness of rapid whole-genome or -exome sequencing (rWGS or rWES, ...respectively) in seriously ill infants with diseases of unknown etiology. Here we report comparisons of analytic and diagnostic performance. Of 1,248 ill inpatient infants, 578 (46%) had diseases of unknown etiology. 213 infants (37% of those eligible) were enrolled within 96 h of admission. 24 infants (11%) were very ill and received ultra-rapid whole-genome sequencing (urWGS). The remaining infants were randomized, 95 to rWES and 94 to rWGS. The analytic performance of rWGS was superior to rWES, including variants likely to affect protein function, and ClinVar pathogenic/likely pathogenic variants (p < 0.0001). The diagnostic performance of rWGS and rWES were similar (18 diagnoses in 94 infants 19% versus 19 diagnoses in 95 infants 20%, respectively), as was time to result (median 11.0 versus 11.2 days, respectively). However, the proportion diagnosed by urWGS (11 of 24 46%) was higher than rWES/rWGS (p = 0.004) and time to result was less (median 4.6 days, p < 0.0001). The incremental diagnostic yield of reflexing to trio after negative proband analysis was 0.7% (1 of 147). In conclusion, rapid genomic sequencing can be performed as a first-tier diagnostic test in inpatient infants. urWGS had the shortest time to result, which was important in unstable infants, and those in whom a genetic diagnosis was likely to impact immediate management. Further comparison of urWGS and rWES is warranted because genomic technologies and knowledge of variant pathogenicity are evolving rapidly.
Here we present an in-depth characterization of the mechanism of sequencer-induced sample contamination due to the phenomenon of index swapping that impacts Illumina sequencers employing patterned ...flow cells with Exclusion Amplification (ExAmp) chemistry (HiSeqX, HiSeq4000, and NovaSeq). We also present a remediation method that minimizes the impact of such swaps.
Leveraging data collected over a two-year period, we demonstrate the widespread prevalence of index swapping in patterned flow cell data. We calculate mean swap rates across multiple sample preparation methods and sequencer models, demonstrating that different library methods can have vastly different swapping rates and that even non-ExAmp chemistry instruments display trace levels of index swapping. We provide methods for eliminating sample data cross contamination by utilizing non-redundant dual indexing for complete filtering of index swapped reads, and share the sequences for 96 non-combinatorial dual indexes we have validated across various library preparation methods and sequencer models. Finally, using computational methods we provide a greater insight into the mechanism of index swapping.
Index swapping in pooled libraries is a prevalent phenomenon that we observe at a rate of 0.2 to 6% in all sequencing runs on HiSeqX, HiSeq 4000/3000, and NovaSeq. Utilizing non-redundant dual indexing allows for the removal (flagging/filtering) of these swapped reads and eliminates swapping induced sample contamination, which is critical for sensitive applications such as RNA-seq, single cell, blood biopsy using circulating tumor DNA, or clinical sequencing.