Peer-reviewed, Open access
  • Wenger, Aaron M; Peluso, Paul; Rowell, William J; Chang, Pi-Chuan; Hall, Richard J; Concepcion, Gregory T; Ebler, Jana; Fungtammasan, Arkarachai; Kolesnikov, Alexey; Olson, Nathan D; Töpfer, Armin; Alonge, Michael; Mahmoud, Medhat; Qian, Yufeng; Chin, Chen-Shan; Phillippy, Adam M; Schatz, Michael C; Myers, Gene; DePristo, Mark A; Ruan, Jue; Marschall, Tobias; Sedlazeck, Fritz J; Zook, Justin M; Li, Heng; Koren, Sergey; Carroll, Andrew; Rank, David R; Hunkapiller, Michael W

    Nature Biotechnology, 10/2019, Volume: 37, Issue: 10
    Journal Article

    The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the 'genome in a bottle' (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
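
    The contig N50 quoted above is a standard contiguity statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The sketch below shows one common way to compute it. The function name `contig_n50` and the toy lengths are assumptions made for illustration only; they are not taken from the article or its assembly.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (in bases).

    Hypothetical helper for illustration; not code from the article.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating length
    # until at least half of the total assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy contig lengths (made up, not data from the article).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(contig_n50(toy_lengths))  # prints 8
```

    For the toy input the total length is 54, and the three longest contigs (10, 9, 8) sum to 27, exactly half, so the N50 is 8. Integer arithmetic (`2 * running >= total`) avoids rounding issues with odd totals.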