ADSP Whole Genome Sequencing (WGS) Release 4 Data Update from Genome Center for Alzheimer’s Disease


Peer reviewed, Open access

Leung, Yuk Yee; Lee, Wan‐Ping; Kuzma, Amanda B; Gangadharan, Prabhakaran; Nicaretta, Heather Issen; Qu, Liming; Ren, Youli; Cantwell, Laura B; Valladares, Otto; Zhao, Yi; Iqbal, Taha; Schmidt, Michael A.; Mena, Pedro R.; Vardarajan, Badri N; Dalgard, Clifton L.; Kunkle, Brian W.; Bush, William S.; Martin, Eden R.; Naj, Adam C.; Haines, Jonathan L.; Pericak‐Vance, Margaret A.; Wang, Li‐San; Schellenberg, Gerald D.

Alzheimer's & Dementia, December 2023, Volume 19, Issue S12

Journal Article

Background: The Genome Center for Alzheimer's Disease (GCAD) coordinates the integration of all available Alzheimer's disease (AD) relevant whole genome sequencing (WGS) data, with the goal of identifying genetic variants that confer AD risk or protection and, eventually, therapeutic targets. The WGS datasets are generated through collaboration between investigators from the Alzheimer's Disease Sequencing Project (ADSP) and GCAD. To minimize the data heterogeneity introduced by different sequencing protocols and assays, GCAD processes all samples using standardized pipelines and performs quality control (QC) and quality assurance (QA) checks.

Methods: Raw sequencing data (FASTQs or BAMs) were aligned to GRCh38/hg38 with BWA. Variant calling and joint genotyping of single nucleotide variants (SNVs) and insertions and deletions (indels) were performed with GATK. Structural variants (SVs) were called per sample using the Smoove, Manta, and Strelka packages. Preliminary QA checks, including sex check, contamination, and genotype concordance, were performed, followed by QC per ADSP protocol to evaluate the quality of samples and variants. To facilitate access to and use of the massive joint-genotyped VCF files, a compact version storing only variant information and sample genotypes was released first.

Results: We dropped 275 (0.7%) samples with poor coverage (<20×) and flagged 219 (0.6%) samples of borderline quality. As a result, the dataset (ADSP Release 4, 2022) includes 36,361 genomes from 40 diverse cohorts with 4 major ancestries: 16,573 Non-Hispanic Whites, 11,358 Hispanics, 5,422 African Americans, and 2,802 Asians. The data are deeply sequenced (average genome coverage: 40×). All samples' CRAMs and gVCFs from GATK were deposited into the NIAGADS Data Sharing Service (DSS) (https://dss.niagads.org/) for public distribution. The joint-genotyped VCFs are undergoing a full QC/annotation process and will be made available. The joint-genotyped VCF contains >362M bi-allelic variants and >58M multi-allelic variants, with 95% of variants remaining after QC. SV calling is ongoing, and the data will be ready prior to the conference.

Conclusion: The ADSP and GCAD generate high-quality SNV, indel, and SV calls. GCAD is currently preparing the next release of ∼60,000 more ancestrally diverse WGS samples, sequenced primarily through the ADSP Follow-Up Study, which we anticipate will be released in 2023 to greatly benefit the AD genetics community.
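
The sample-level coverage filter reported in the Results can be illustrated with a minimal Python sketch. It is not GCAD's pipeline code. Only the 20× drop threshold comes from the abstract. The abstract does not say how "borderline quality" was defined, so the `BORDERLINE_COVERAGE` cutoff, the `Sample` record, and the `classify` and `partition` helpers are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

# From the abstract: samples with coverage below 20x were dropped.
MIN_COVERAGE = 20.0
# Hypothetical: the abstract does not define "borderline quality";
# a coverage-based cutoff is assumed here purely for illustration.
BORDERLINE_COVERAGE = 25.0


@dataclass(frozen=True)
class Sample:
    """Hypothetical per-sample record holding mean genome coverage."""
    sample_id: str
    mean_coverage: float


def classify(sample: Sample) -> str:
    """Return 'drop', 'flag', or 'keep' based on mean coverage."""
    if sample.mean_coverage < MIN_COVERAGE:
        return "drop"
    if sample.mean_coverage < BORDERLINE_COVERAGE:
        return "flag"
    return "keep"


def partition(samples: list[Sample]) -> dict[str, list[str]]:
    """Group sample IDs by their classification."""
    groups: dict[str, list[str]] = {"drop": [], "flag": [], "keep": []}
    for sample in samples:
        groups[classify(sample)].append(sample.sample_id)
    return groups
```

Under these assumptions, `partition` would be applied to per-sample coverage summaries before joint genotyping; dropped samples are excluded, while flagged samples stay in the release with an annotation.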
