Peer-reviewed Open access
  • Implementation of human who...
    Panda, Abhishek; Subramanian, Krithika; Kahali, Bratati

    Informatics in Medicine Unlocked, 2021, 2021-01-01, Volume: 25
    Journal Article

    Whole Genome Sequencing (WGS) provides information for each base of the entire 3.2 billion base pairs of the diploid human genome. Therefore, WGS plays an important role in identifying genetic variations in populations and in understanding disease signatures in cohort studies or in cases with rare genetic disorders. Nonetheless, discoveries from high-throughput WGS depend on efficient processing, analysis, and storage of this enormous amount of genomic sequencing data, often on the scale of petabytes. Although genome sequencing costs have fallen significantly in recent years, high-performance computation costs have not decreased proportionally. The objective of the present work is to develop a Docker-based container method for human whole genome sequencing data processing and analysis for detecting genetic variations from paired-end WGS short reads. Our method provides an approach to process multiple genomes simultaneously within a single compute system while guaranteeing sustained and stable handling of the memory requirements for the genomic data processing and ensuring no unwanted termination of the currently running parallel jobs. This method also achieves a 40% reduction in execution time. To encourage widespread adoption and ease of WGS analysis, our containerized pipeline will be made publicly available. We have tested this approach on human genome data from Illumina WGS platforms and report the benchmark metrics in two different workstation environments in this communication. Compared to truth sets, our approach calls variants with 99% precision and recall.