Peer reviewed, Open access
  • GBC: a parallel toolkit bas...
    Zhang, Liubin; Yuan, Yangyang; Peng, Wenjie; Tang, Bin; Li, Mulin Jun; Gui, Hongsheng; Wang, Qiang; Li, Miaoxin

    Genome Biology, 04/2023, Volume: 24, Issue: 1
    Journal Article

    Whole-genome sequencing projects covering millions of subjects produce enormous numbers of genotypes, which impose a heavy memory burden and long computation times. Here, we present GBC, a toolkit for rapidly compressing large-scale genotypes into highly addressable byte-encoding blocks under an optimized parallel framework. We demonstrate that GBC is up to 1000 times faster than state-of-the-art methods at accessing and managing compressed large-scale genotypes while maintaining a competitive compression ratio. We also show that conventional analyses would be substantially sped up if built on GBC to access the genotypes of a large population. GBC's data structure and algorithms are valuable for accelerating large-scale genomic research.