Peer-reviewed
  • CHARR efficiently estimates...
    Lu, Wenhan; Gauthier, Laura D.; Poterba, Timothy; Giacopuzzi, Edoardo; Goodrich, Julia K.; Stevens, Christine R.; King, Daniel; Daly, Mark J.; Neale, Benjamin M.; Karczewski, Konrad J.

    American Journal of Human Genetics, 12/2023, Volume: 110, Issue: 12
    Journal Article

    DNA sample contamination is a major issue in clinical and research applications of whole-genome and -exome sequencing. Even modest levels of contamination can substantially degrade the overall quality of variant calls and lead to widespread genotyping errors. Popular tools for estimating contamination currently rely on short-read alignment data (BAM/CRAM files), which are expensive to store and manipulate and are often not retained or widely shared. We propose CHARR (contamination from homozygous alternate reference reads), a metric that estimates DNA sample contamination from variant-level whole-genome and -exome sequence data by leveraging the infiltration of reference reads into homozygous alternate variant calls. Because CHARR uses only a small proportion of variant-level genotype information, it can be computed from single-sample gVCFs, from callsets in VCF or BCF format, and from variant calls stored efficiently in the Hail VariantDataset format. Our results demonstrate that CHARR accurately recapitulates the results of existing tools at substantially reduced cost, improving the accuracy and efficiency of downstream analyses of ultra-large whole-genome and exome sequencing datasets.

    Lu et al. develop CHARR, a method for estimating DNA sample contamination from variant call data that leverages the infiltration of reference reads within homozygous alternate variant calls. CHARR accurately recapitulates results from existing tools with substantially reduced cost and increased efficiency, facilitating downstream analyses of ultra-large genomic data.
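    The abstract describes the CHARR signal only qualitatively. The sketch below illustrates one way such a metric could be computed from VCF-style genotype fields (GT and AD). It is not the authors' implementation. It assumes biallelic sites and per-site population allele frequencies (AF) taken from an external reference panel. It also assumes that the estimate is the mean reference-read fraction at homozygous-alternate calls, each scaled by 1 / (1 - AF), on the reasoning that a contaminant contributes a reference read with probability of roughly 1 - AF. The names `Call`, `parse_sample`, and `charr_like`, as well as the filter thresholds, are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Call:
    """Hypothetical per-sample genotype record at a biallelic site."""
    gt: str        # genotype, e.g. "1/1" or "1|1"
    ad_ref: int    # AD[0]: reads supporting the reference allele
    ad_alt: int    # AD[1]: reads supporting the alternate allele
    pop_af: float  # ASSUMPTION: alt-allele frequency from an external reference panel


def parse_sample(format_field: str, sample_field: str, pop_af: float) -> Optional[Call]:
    """Parse one VCF sample column (e.g. FORMAT 'GT:AD:DP', sample '1/1:2,38:40')."""
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    if "GT" not in values or "AD" not in values:
        return None
    ad = values["AD"].split(",")
    if len(ad) != 2 or "." in ad:  # skip multi-allelic sites or missing depths
        return None
    return Call(values["GT"], int(ad[0]), int(ad[1]), pop_af)


def is_hom_alt(gt: str) -> bool:
    """True for a biallelic homozygous-alternate genotype (phased or unphased)."""
    return gt.replace("|", "/").split("/") == ["1", "1"]


def charr_like(
    calls: Iterable[Optional[Call]],
    min_af: float = 0.05,  # hypothetical filter thresholds, not the paper's
    max_af: float = 0.95,
    min_dp: int = 10,
    max_dp: int = 100,
) -> Optional[float]:
    """Mean reference-read fraction at hom-alt calls, each scaled by 1 / (1 - AF).

    ASSUMPTION: a contaminant contributes a reference read with probability
    of about (1 - AF), so ref_fraction / (1 - AF) approximates contamination.
    """
    ratios = []
    for c in calls:
        if c is None or not is_hom_alt(c.gt):
            continue  # only hom-alt calls carry the "unexpected reference reads" signal
        if not (min_af <= c.pop_af <= max_af) or c.pop_af >= 1.0:
            continue  # avoid unstable or undefined 1 / (1 - AF) scaling
        dp = c.ad_ref + c.ad_alt  # depth taken as the sum of allelic depths
        if dp == 0 or not (min_dp <= dp <= max_dp):
            continue
        ratios.append((c.ad_ref / dp) / (1.0 - c.pop_af))
    return sum(ratios) / len(ratios) if ratios else None


if __name__ == "__main__":
    calls = [
        parse_sample("GT:AD:DP", "1/1:2,38:40", pop_af=0.5),
        parse_sample("GT:AD:DP", "1/1:0,40:40", pop_af=0.2),
        parse_sample("GT:AD:DP", "0/1:15,15:30", pop_af=0.5),  # het: ignored
    ]
    print(charr_like(calls))  # approximately 0.05
```

    Restricting the calculation to homozygous-alternate calls means that any reference reads at those sites are unexpected in a clean sample, which is the signal the abstract describes. The allele-frequency and depth filters in this sketch are illustrative choices that keep the per-site ratios from becoming unstable.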