Peer reviewed Open access
  • A robust benchmark for dete...
    Zook, Justin M; Hansen, Nancy F; Olson, Nathan D; Chapman, Lesley; Mullikin, James C; Xiao, Chunlin; Sherry, Stephen; Koren, Sergey; Phillippy, Adam M; Boutros, Paul C; Sahraeian, Sayed Mohammad E; Huang, Vincent; Rouette, Alexandre; Alexander, Noah; Mason, Christopher E; Hajirasouliha, Iman; Ricketts, Camir; Lee, Joyce; Tearle, Rick; Fiddes, Ian T; Barrio, Alvaro Martinez; Wala, Jeremiah; Carroll, Andrew; Ghaffari, Noushin; Rodriguez, Oscar L; Bashir, Ali; Jackman, Shaun; Farrell, John J; Wenger, Aaron M; Alkan, Can; Soylev, Arda; Schatz, Michael C; Garg, Shilpa; Church, George; Marschall, Tobias; Chen, Ken; Fan, Xian; English, Adam C; Rosenfeld, Jeffrey A; Zhou, Weichen; Mills, Ryan E; Sage, Jay M; Davis, Jennifer R; Kaiser, Michael D; Oliver, John S; Catalano, Anthony P; Chaisson, Mark J P; Spies, Noah; Sedlazeck, Fritz J; Salit, Marc

    Nature Biotechnology, 11/2020, Volume: 38, Issue: 11
    Journal Article

    New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12,745 isolated, sequence-resolved insertion (7,281) and deletion (5,464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and contain 5,262 insertions and 4,095 deletions supported by ≥1 diploid assembly. We demonstrate that the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked- and long-read sequencing and optical mapping.
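    The abstract's distinction between false negatives (benchmark calls a callset misses) and false positives (extra calls inside the Tier 1 regions) can be illustrated with a simplified comparison. This is a minimal sketch, not the consortium's benchmarking procedure: the `SV` record, the greedy one-to-one matching, the 500 bp position tolerance and the 0.7 size-similarity threshold are hypothetical choices for illustration, and a call counts as "in" a region only by its start coordinate. Only the ≥50 bp size cutoff and the rule that unmatched calls inside Tier 1 regions are putative false positives come from the abstract.

```python
from dataclasses import dataclass

MIN_SV_LEN = 50  # the benchmark covers insertions/deletions >= 50 bp


@dataclass(frozen=True)
class SV:
    chrom: str
    pos: int      # start coordinate (hypothetical convention: 0-based)
    svtype: str   # "INS" or "DEL"
    length: int   # absolute variant size in bp


def in_regions(sv, regions):
    """True if the SV start falls inside any half-open (chrom, start, end) interval."""
    return any(c == sv.chrom and s <= sv.pos < e for c, s, e in regions)


def matches(a, b, max_dist=500, min_size_ratio=0.7):
    """Hypothetical matching rule: same type, nearby start, similar size."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    if abs(a.pos - b.pos) > max_dist:
        return False
    small, large = sorted((a.length, b.length))
    return small / large >= min_size_ratio


def compare(truth, calls, regions):
    """Return (tp, fp, fn) counts restricted to benchmark regions and SVs >= 50 bp."""
    truth = [t for t in truth if t.length >= MIN_SV_LEN and in_regions(t, regions)]
    calls = [c for c in calls if c.length >= MIN_SV_LEN and in_regions(c, regions)]
    matched = set()  # indices of benchmark calls already paired with a call
    tp = fp = 0
    for c in calls:
        hit = next((i for i, t in enumerate(truth)
                    if i not in matched and matches(t, c)), None)
        if hit is None:
            fp += 1  # extra call inside the regions: putative false positive
        else:
            matched.add(hit)
            tp += 1
    fn = len(truth) - len(matched)  # benchmark calls with no match: false negatives
    return tp, fp, fn


def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    regions = [("chr1", 0, 10_000)]
    truth = [SV("chr1", 1000, "DEL", 300), SV("chr1", 5000, "INS", 120)]
    calls = [SV("chr1", 1010, "DEL", 290),   # matches the benchmark deletion
             SV("chr1", 8000, "INS", 60),    # no benchmark match: false positive
             SV("chr1", 20000, "DEL", 500)]  # outside the regions: ignored
    counts = compare(truth, calls, regions)
    print(counts)                     # → (1, 1, 1)
    print(precision_recall(*counts))  # → (0.5, 0.5)
```

    Calls outside the benchmark regions are neither credited nor penalized here, which mirrors why restricting evaluation to confidently characterized regions lets extra calls be treated as putative errors rather than possible missed truth.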