Mitochondrial DNA (mtDNA) profiles can be classified into phylogenetic clusters (haplogroups), which is of great relevance for evolutionary, forensic and medical genetics. With the extensive growth of the underlying phylogenetic tree summarizing the published mtDNA sequences, the manual process of haplogroup classification would be too time-consuming. The previously published classification tool HaploGrep provided an automatic way to address this issue. Here, we present the completely updated version HaploGrep 2 offering several advanced features, including a generic rule-based system for immediate quality control (QC). This allows detecting artificial recombinants and missing variants as well as annotating rare and phantom mutations. Furthermore, the handling of high-throughput data in the form of VCF files is now directly supported. For data output, several graphical reports are generated in real time, such as a multiple sequence alignment format, a VCF format and extended haplogroup QC reports, all viewable directly within the application. In addition, HaploGrep 2 generates a publication-ready phylogenetic tree of all input samples encoded relative to the revised Cambridge Reference Sequence. Finally, new distance measures and optimizations of the algorithm increase accuracy and speed up the application. HaploGrep 2 can be accessed freely and without any registration at http://haplogrep.uibk.ac.at.
Haplotype phasing is a fundamental problem in medical and population genetics. Phasing is generally performed via statistical phasing in a genotyped cohort, an approach that can yield high accuracy in very large cohorts but attains lower accuracy in smaller cohorts. Here we instead explore the paradigm of reference-based phasing. We introduce a new phasing algorithm, Eagle2, that attains high accuracy across a broad range of cohort sizes by efficiently leveraging information from large external reference panels (such as the Haplotype Reference Consortium; HRC) using a new data structure based on the positional Burrows-Wheeler transform. We demonstrate that Eagle2 attains a ∼20× speedup and ∼10% increase in accuracy compared to reference-based phasing using SHAPEIT2. On European-ancestry samples, Eagle2 with the HRC panel achieves >2× the accuracy of 1000 Genomes-based phasing. Eagle2 is open source and freely available for HRC-based phasing via the Sanger Imputation Service and the Michigan Imputation Server.
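For readers unfamiliar with the positional Burrows-Wheeler transform (PBWT) mentioned above, the following is a minimal sketch of its core step: after each site, haplotypes are kept in an order sorted by their reversed prefixes, so haplotypes sharing a long recent match end up adjacent. This illustrates the basic PBWT prefix-array update only; it is not Eagle2's data structure or code, and the function name and the toy 0/1 haplotype panel are assumptions made for illustration.

```python
def pbwt_prefix_arrays(haplotypes):
    """Return the PBWT ordering of haplotype indices after each site.

    `haplotypes` is a list of equal-length lists of 0/1 alleles
    (hypothetical toy input). The ordering after site k sorts the
    haplotypes by their reversed prefix ending at site k.
    """
    n_hap = len(haplotypes)
    n_sites = len(haplotypes[0]) if n_hap else 0
    order = list(range(n_hap))       # initial order: input order
    arrays = [order[:]]
    for k in range(n_sites):
        # Stable partition: carriers of allele 0 first, then allele 1.
        # Stability preserves the ordering from earlier sites.
        zeros = [i for i in order if haplotypes[i][k] == 0]
        ones = [i for i in order if haplotypes[i][k] == 1]
        order = zeros + ones
        arrays.append(order[:])
    return arrays


panel = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
final_order = pbwt_prefix_arrays(panel)[-1]
print(final_order)
```

In this toy panel, haplotypes 0 and 3 are identical and therefore end up next to each other in the final ordering, which is the property that makes long shared segments cheap to find.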
The kidneys integrate information from continuous systemic processes related to the absorption, distribution, metabolism and excretion (ADME) of metabolites. To identify underlying molecular mechanisms, we performed genome-wide association studies of the urinary concentrations of 1,172 metabolites among 1,627 patients with reduced kidney function. The 240 unique metabolite-locus associations (metabolite quantitative trait loci, mQTLs) that were identified and replicated highlight novel candidate substrates for transport proteins. The identified genes are enriched in ADME-relevant tissues and cell types, and they reveal novel candidates for biotransformation and detoxification reactions. Fine mapping of mQTLs and integration with single-cell gene expression permitted the prioritization of causal genes, functional variants and target cell types. The combination of mQTLs with genetic and health information from 450,000 UK Biobank participants illuminated metabolic mediators, and hence, novel urinary biomarkers of disease risk. This comprehensive resource of genetic targets and their substrates is informative for ADME processes in humans and is relevant to basic science, clinical medicine and pharmaceutical research.
The availability of polygenic scores for type 2 diabetes (T2D) raises the question of whether assessing family history might become redundant. However, family history involves not only shared genetics but also shared environment. This study aimed to assess the independent and combined effects of a family risk score (FamRS) and a polygenic score (PGS) on prevalent and incident T2D risk in a population-based study from Germany (n = 3071). The study was conducted in 2004/2005 with up to 12 years of follow-up. The FamRS takes into account not only the number of affected first-degree relatives, but also the age at onset of the relatives and the age of the participants. A total of 256 prevalent and an additional 163 incident T2D cases were recorded. Prevalence of T2D increased sharply for those within the top quantile of the PGS distribution, resulting in an OR of 19.16 (p < 2 × 10
) for the top 20% compared to the remainder of the population, independent of age, sex, BMI, physical activity and FamRS. On the other hand, having a very strong family risk compared to average was still associated with an OR of 2.78 (p = 0.001), independent of the aforementioned factors and the PGS. The PGS and FamRS were only slightly correlated (r
= 0.018). The combined contribution of both factors varied across age groups, however, with the influence of the PGS decreasing with increasing age. To conclude, both genetic information and family history are relevant for the prediction of T2D risk and might be used to identify high-risk groups in order to personalize prevention measures.
Background: Variable number tandem repeats (VNTRs) are highly polymorphic DNA regions harboring many potentially disease-causing variants. However, VNTRs often appear unresolved (“dark”) in variation databases due to their repetitive nature. One particularly complex and medically relevant VNTR is the KIV-2 VNTR located in the cardiovascular disease gene LPA, which encompasses up to 70% of the coding sequence. Results: Using the highly complex LPA gene as a model, we develop a computational approach to resolve intra-repeat variation in VNTRs from widely available short-read sequencing data. We apply the approach to six protein-coding VNTRs in 2504 samples from the 1000 Genomes Project and develop an optimized method for the LPA KIV-2 VNTR that discriminates the confounding KIV-2 subtypes upfront. This results in an F1-score improvement of up to 2.1-fold compared to previously published strategies. Finally, we analyze the LPA VNTR in >199,000 UK Biobank samples, detecting >700 KIV-2 mutations. This approach successfully reveals new strong Lp(a)-lowering effects for KIV-2 variants, with a protective effect against coronary artery disease, and also validates previous findings based on tagging SNPs. Conclusions: Our approach paves the way for reliable variant detection in VNTRs at scale, and we show that it is transferable to other dark regions, which will help unlock medical information hidden in VNTRs.
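As a reminder of what the F1-score reported above measures, the sketch below computes it for a set of called variants against a truth set. It is a generic illustration of the metric, not the evaluation code of the study; the function name and the toy variant tuples are hypothetical.

```python
def f1_score(called, truth):
    """F1 = 2*TP / (2*TP + FP + FN) for two sets of variant identifiers."""
    true_pos = len(called & truth)    # variants called and present in truth
    false_pos = len(called - truth)   # variants called but absent from truth
    false_neg = len(truth - called)   # truth variants that were missed
    denominator = 2 * true_pos + false_pos + false_neg
    if denominator == 0:
        return 0.0                    # convention for two empty sets
    return 2 * true_pos / denominator


# Hypothetical variants, written as (position, alternate allele) pairs.
called_variants = {(101, "A"), (250, "T"), (333, "G"), (410, "C")}
truth_variants = {(333, "G"), (410, "C"), (512, "A"), (600, "T")}
example_f1 = f1_score(called_variants, truth_variants)
print(example_f1)
```

Because F1 is the harmonic mean of precision and recall, a caller cannot score well by maximizing only one of the two.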
Lipoprotein(a) [Lp(a)] concentrations are among the strongest genetic risk factors for cardiovascular disease and present pronounced interethnic and interindividual differences. Approximately 90% of Lp(a) variance is controlled by the LPA gene, which contains a 5.6-kb-large copy number variation kringle IV type 2 (KIV-2) repeat that generates >40 protein isoforms. Variants within the KIV-2 region are not called in common sequencing projects, leaving up to 70% of the LPA coding region currently unaddressed. To completely assess the variability in LPA, we developed a sequencing strategy for this region and report here the first map of genetic variation in the KIV-2 region, a comprehensively evaluated ultradeep sequencing protocol, and an easy-to-use variant analysis pipeline. We sequenced 123 Central-European individuals and reanalyzed public data of 2,504 individuals from 26 populations. We found 14 different loss-of-function and splice-site mutations, as well as >100 missense variants, some of them common. Some coding variants were frequent in one population but absent in others. This provides novel candidates to explain the large ethnic and individual differences in Lp(a) concentrations. Importantly, our approach and pipeline are also applicable to other similar copy number variable regions, allowing access to regions that are not captured by common genome sequencing.
Lipoprotein(a) [Lp(a)] concentrations are regulated by the LPA gene mainly via the large kringle IV-type 2 (KIV-2) copy number variation and multiple causal variants. Early studies suggested an effect of long pentanucleotide repeat (PNR) alleles (10 and 11 repeats, PNR10 and PNR11) in the LPA promoter on gene transcription and found an association with lower Lp(a). Subsequent in vitro studies showed no effects on mRNA transcription, but the association with strongly decreased Lp(a) remained consistent. We investigated the isolated and combined effect of PNR10, PNR11, and the frequent splice site variant KIV-2 4925G>A on Lp(a) concentrations in the Cooperative Health Research in the Region of Augsburg F4 study by multiple quantile regression in single-SNP and joint models. Data on Lp(a), apolipoprotein(a) Western blot isoforms, and variant genotypes were available for 2,858 individuals. We found a considerable linkage disequilibrium between KIV-2 4925G>A and the alleles PNR10 and PNR11. In single-variant analysis adjusted for age, sex, and the shorter apo(a) isoform, we determined that both PNR alleles were associated with a highly significant Lp(a) decrease (PNR10: β = −14.43 mg/dl, 95% CI: −15.84, −13.02, P = 3.33e-84; PNR11: β = −17.21 mg/dl, 95% CI: −20.19, −14.23, P = 4.01e-29). However, a joint model, adjusting the PNR alleles additionally for 4925G>A, abolished the effect on Lp(a) (PNR10: β = +0.44 mg/dl, 95% CI: −1.73, 2.60, P = 0.69; PNR11: β = −1.52 mg/dl, 95% CI: −6.05, 3.00, P = 0.51). Collectively, we conclude that the previously reported Lp(a) decrease observed in pentanucleotide alleles PNR10 or PNR11 carriers results from a linkage disequilibrium with the frequent splicing mutation KIV-2 4925G>A.
Mitochondrial DNA copy number (mtDNA-CN) is a biomarker for mitochondrial dysfunction associated with several diseases. Previous genome-wide association studies (GWAS) have been performed to unravel underlying mechanisms of mtDNA-CN regulation. However, the identified gene regions explain only a small fraction of mtDNA-CN variability. Most of these data have been estimated from microarrays based on various pipelines. In the present study we aimed to (1) identify genetic loci for qPCR-measured mtDNA-CN from three studies (16,130 participants) using GWAS, (2) identify potential systematic differences between our qPCR-derived mtDNA-CN measurements and the published microarray intensity-based estimates, and (3) disentangle the nuclear from the mitochondrial regulation of the mtDNA-CN phenotype. We identified two genome-wide significant autosomal loci associated with qPCR-measured mtDNA-CN: at HBS1L (rs4895440, p = 3.39 × 10
) and GSDMA (rs56030650, p = 4.85 × 10
) genes. Moreover, 113/115 of the previously published SNPs identified by microarray-based analyses were significantly equivalent with our findings. In our study, the mitochondrial genome itself contributed only marginally to mtDNA-CN regulation as we only detected a single rare mitochondrial variant associated with mtDNA-CN. Furthermore, we incorporated mitochondrial haplogroups into our analyses to explore their potential impact on mtDNA-CN. However, our findings indicate that they do not exert any significant influence on our results.