Abstract
Summary
PhenoScanner is a curated database of publicly available results from large-scale genetic association studies in humans. This online tool facilitates ‘phenome scans’, where genetic variants are cross-referenced for association with many phenotypes of different types. Here we present a major update of PhenoScanner (‘PhenoScanner V2’), including over 150 million genetic variants and more than 65 billion associations (compared to 350 million associations in PhenoScanner V1) with diseases and traits, gene expression, metabolite and protein levels, and epigenetic markers. The query options have been extended to include searches by genes, genomic regions and phenotypes, as well as for genetic variants. All variants are positionally annotated using the Variant Effect Predictor and the phenotypes are mapped to Experimental Factor Ontology terms. Linkage disequilibrium statistics from the 1000 Genomes project can be used to search for phenotype associations with proxy variants.
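To illustrate what a proxy search involves, the sketch below picks proxies for a query variant by a linkage disequilibrium threshold. It is not PhenoScanner's code: it assumes a small in-memory panel of allele dosages (0/1/2) keyed by hypothetical variant IDs and uses the squared Pearson correlation of dosages as a genotype-based stand-in for the haplotype-based r² usually reported from phased panels such as 1000 Genomes. The cut-off of 0.8 is an illustrative default.

```python
import numpy as np

def ld_r2(dosage_a, dosage_b):
    """Squared Pearson correlation between two allele-dosage vectors (0/1/2).

    A genotype-based approximation of the haplotype-based r^2 that is
    usually computed from phased reference panels.
    """
    a = np.asarray(dosage_a, dtype=float)
    b = np.asarray(dosage_b, dtype=float)
    r = np.corrcoef(a, b)[0, 1]
    return float(r ** 2)

def find_proxies(query_id, dosages, r2_threshold=0.8):
    """Return (variant_id, r2) pairs whose r^2 with the query meets the threshold."""
    query = dosages[query_id]
    proxies = []
    for vid, d in dosages.items():
        if vid == query_id:
            continue
        r2 = ld_r2(query, d)
        if r2 >= r2_threshold:
            proxies.append((vid, r2))
    # Strongest proxies first.
    return sorted(proxies, key=lambda p: -p[1])

# Hypothetical dosages for six individuals at four variants.
panel = {
    "rsA": [0, 1, 2, 1, 0, 2],
    "rsB": [0, 1, 2, 1, 0, 2],
    "rsC": [2, 1, 0, 1, 2, 0],
    "rsD": [0, 2, 0, 2, 1, 1],
}
print(find_proxies("rsA", panel))
```

Because r² ignores the sign of the correlation, a proxy such as rsC can carry the query's signal on the opposite allele, so proxy results still need allele alignment before they are reported.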
Availability and implementation
PhenoScanner V2 is available at www.phenoscanner.medschl.cam.ac.uk.
PhenoScanner is a curated database of publicly available results from large-scale genetic association studies. This tool aims to facilitate 'phenome scans', the cross-referencing of genetic variants with many phenotypes, to aid understanding of disease pathways and biology. The database currently contains over 350 million association results and over 10 million unique genetic variants, mostly single nucleotide polymorphisms. It is accompanied by a web-based tool that queries the database for associations with user-specified variants, reporting results aligned to the same effect and non-effect alleles for each input variant. The tool provides the option of searching for trait associations with proxies of the input variants, calculated using the European samples from 1000 Genomes and HapMap.
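Reporting every result "according to the same effect and non-effect alleles" means re-expressing each study's estimate relative to one reference allele. The following is a minimal sketch of that general idea, not PhenoScanner's actual alignment logic: the `Association` class and `harmonize` function are hypothetical, only single-nucleotide alleles are handled, and palindromic A/T or C/G variants are compared only on the strand reported.

```python
from dataclasses import dataclass

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

@dataclass
class Association:
    effect_allele: str
    other_allele: str
    beta: float  # effect per copy of effect_allele

def harmonize(assoc, target_effect, target_other):
    """Re-express an association relative to (target_effect, target_other).

    Returns a new Association, or None if the alleles cannot be matched.
    """
    ea, oa = assoc.effect_allele.upper(), assoc.other_allele.upper()
    te, to = target_effect.upper(), target_other.upper()
    candidates = [(ea, oa)]
    palindromic = COMPLEMENT.get(ea) == oa
    if not palindromic and ea in COMPLEMENT and oa in COMPLEMENT:
        # The study may have reported alleles on the opposite DNA strand.
        candidates.append((COMPLEMENT[ea], COMPLEMENT[oa]))
    for e, o in candidates:
        if (e, o) == (te, to):
            return Association(te, to, assoc.beta)
        if (e, o) == (to, te):
            # Same variant reported for the other allele: flip the sign.
            return Association(te, to, -assoc.beta)
    return None

print(harmonize(Association("A", "G", 0.1), "G", "A"))
```

Sign flipping is valid for per-allele effects on an additive scale (log odds ratios, linear regression coefficients); ratio-scale estimates would need to be inverted instead.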
PhenoScanner is available at www.phenoscanner.medschl.cam.ac.uk.
Contact: jrs95@medschl.cam.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
Abstract
Introduction
Genetic associations for variants identified through genome-wide association studies (GWASs) tend to be overestimated in the original discovery data set because, had the association been underestimated, the variant might not have been detected. This bias, known as winner’s curse, can affect Mendelian randomization estimates, but its severity and potential impact are unclear.
Methods
We performed an empirical investigation to assess the potential bias from winner’s curse in practice. We considered Mendelian randomization estimates for the effect of body mass index (BMI) on coronary artery disease risk. We randomly divided a UK Biobank data set 100 times into three equally sized subsets. The first subset was treated as the ‘discovery GWAS’. We compared genetic associations estimated in the discovery GWAS to those estimated in the other subsets for each of the 100 iterations.
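The mechanism being tested can be reproduced in a few lines with summary statistics. This simplified simulation is not the study's individual-level UK Biobank analysis: it assumes hypothetical normally distributed true per-allele effects, a common standard error, and independent estimates in a discovery and a replication subset, then compares the two estimates for variants that pass P < 5 × 10⁻⁸ in the discovery subset.

```python
import numpy as np
from statistics import NormalDist

def simulate_winners_curse(n_variants=20000, se=0.01, alpha=5e-8, seed=1):
    """Return (number selected, mean discovery estimate, mean replication estimate)."""
    rng = np.random.default_rng(seed)
    # Hypothetical true per-allele effects drawn from a normal distribution.
    true_beta = rng.normal(0.0, 0.015, n_variants)
    # Independent noisy estimates in two non-overlapping subsets.
    disc = true_beta + rng.normal(0.0, se, n_variants)
    repl = true_beta + rng.normal(0.0, se, n_variants)
    # Two-sided genome-wide significance threshold on the z-statistic.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    selected = np.abs(disc / se) > z_crit
    # Orient each selected variant so that its discovery estimate is positive.
    sign = np.sign(disc[selected])
    mean_disc = float(np.mean(disc[selected] * sign))
    mean_repl = float(np.mean(repl[selected] * sign))
    return int(selected.sum()), mean_disc, mean_repl

n_sel, mean_disc, mean_repl = simulate_winners_curse()
print(n_sel, mean_disc / mean_repl)  # ratio > 1 indicates inflation in discovery
```

Selection on the discovery estimate inflates it, while the replication estimate for the same variants is unbiased for their true effects, which is why the study compares discovery-subset associations with those from the other subsets.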
Results
For variants associated with BMI at P < 5 × 10⁻⁸ in at least one iteration, genetic associations with BMI were up to 5-fold greater in iterations in which the variant was associated with BMI at P < 5 × 10⁻⁸ compared with its mean association across all iterations. If the minimum P-value for association with BMI was P = 10⁻¹³ or lower, then this inflation was <25%. Mendelian randomization estimates were affected by winner’s curse bias. However, bias did not materially affect results; all analyses indicated a deleterious effect of BMI on coronary artery disease risk.
Conclusions
Winner’s curse can bias Mendelian randomization estimates, although its practical impact may not be substantial. If avoiding sample overlap is infeasible, analysts should consider performing a sensitivity analysis based on variants strongly associated with the exposure.
Recent genome-wide association studies in stroke have enabled the generation of genomic risk scores (GRS) but their predictive power has been modest compared to established stroke risk factors. Here, using a meta-scoring approach, we develop a metaGRS for ischaemic stroke (IS) and analyse this score in the UK Biobank (n = 395,393; 3075 IS events by age 75). The metaGRS hazard ratio for IS (1.26, 95% CI 1.22-1.31 per metaGRS standard deviation) doubles that of a previous GRS, identifying a subset of individuals at monogenic levels of risk: the top 0.25% of metaGRS have three-fold risk of IS. The metaGRS is similarly or more predictive compared to several risk factors, such as family history, blood pressure, body mass index, and smoking. We estimate the reductions needed in modifiable risk factors for individuals with different levels of genomic risk and suggest that, for individuals with high metaGRS, achieving risk factor levels recommended by current guidelines may be insufficient to mitigate risk.
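A hazard ratio "per metaGRS standard deviation" presupposes a score scaled to SD units. The sketch below shows only that generic step, not the meta-scoring procedure used to build the metaGRS: a score computed as a weighted sum of effect-allele dosages with hypothetical weights, standardized to mean 0 and SD 1, plus the hazard ratio implied by the reported 1.26 per SD under an assumed log-linear (proportional hazards) relationship.

```python
import numpy as np

def standardized_grs(dosages, weights):
    """Weighted sum of allele dosages, scaled to mean 0 and SD 1.

    dosages: (n_individuals, n_variants) array of 0/1/2 effect-allele counts.
    weights: per-variant weights, e.g. log hazard ratios (hypothetical here).
    """
    raw = np.asarray(dosages, dtype=float) @ np.asarray(weights, dtype=float)
    return (raw - raw.mean()) / raw.std()

def hazard_ratio_vs_mean(z, hr_per_sd=1.26):
    """HR relative to someone at the mean score, assuming log-linearity in z."""
    return hr_per_sd ** z

z = standardized_grs([[0, 1], [1, 1], [2, 2]], [0.1, 0.2])
print(z)
print(hazard_ratio_vs_mean(2.0))  # 1.26 ** 2, about 1.59 for a score 2 SD above the mean
```

The log-linear assumption is what makes the per-SD hazard ratio compound multiplicatively; risk in the extreme tail of the score (such as the top 0.25%) is better estimated directly from the data, as the study does.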
ATP citrate lyase is an enzyme in the cholesterol-biosynthesis pathway upstream of 3-hydroxy-3-methylglutaryl-coenzyme A reductase (HMGCR), the target of statins. Whether the genetic inhibition of ATP citrate lyase is associated with deleterious outcomes and whether it has the same effect, per unit decrease in the low-density lipoprotein (LDL) cholesterol level, as the genetic inhibition of HMGCR is unclear.
We constructed genetic scores composed of independently inherited variants in the genes encoding ATP citrate lyase (ACLY) and HMGCR to create instruments that mimic the effect of ATP citrate lyase inhibitors and HMGCR inhibitors (statins), respectively. We then compared the associations of these genetic scores with plasma lipid levels, lipoprotein levels, and the risk of cardiovascular events and cancer.
A total of 654,783 participants, including 105,429 participants who had major cardiovascular events, were included in the study. The ACLY and HMGCR scores were associated with similar patterns of changes in plasma lipid and lipoprotein levels and with similar effects on the risk of cardiovascular events per decrease of 10 mg per deciliter in the LDL cholesterol level: odds ratio for cardiovascular events, 0.823 (95% confidence interval [CI], 0.78 to 0.87; P = 4.0×10^…) for the ACLY score and 0.836 (95% CI, 0.81 to 0.87; P = 3.9×10^…) for the HMGCR score. Neither lifelong genetic inhibition of ATP citrate lyase nor lifelong genetic inhibition of HMGCR was associated with an increased risk of cancer.
Genetic variants that mimic the effect of ATP citrate lyase inhibitors and statins appeared to lower plasma LDL cholesterol levels by the same mechanism of action and were associated with similar effects on the risk of cardiovascular disease per unit decrease in the LDL cholesterol level. (Funded by Esperion Therapeutics and others.)
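Comparing two scores "per decrease of 10 mg per deciliter" in LDL cholesterol requires rescaling each score's outcome association by its LDL association. A common way to do this is the ratio (Wald) estimator, sketched below; the abstract does not describe the exact computation used, and the input values in the example are hypothetical.

```python
import math

def or_per_ldl_decrease(beta_outcome, beta_ldl, delta_mg_dl=10.0):
    """Odds ratio per `delta_mg_dl` decrease in LDL cholesterol.

    beta_outcome: log odds ratio of the outcome per unit of the genetic score.
    beta_ldl: change in LDL cholesterol (mg/dL) per unit of the score
              (negative when the score lowers LDL).
    """
    wald_ratio = beta_outcome / beta_ldl  # log OR per 1 mg/dL increase in LDL
    return math.exp(-delta_mg_dl * wald_ratio)

# Hypothetical score: lowers LDL by 5 mg/dL and has log OR -0.0195 per unit.
print(or_per_ldl_decrease(-0.0195, -5.0))  # exp(-0.039)
```

Putting both scores on this common scale is what allows the ACLY and HMGCR odds ratios to be compared directly, since the two scores lower LDL cholesterol by different absolute amounts.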
Short telomeres have been linked to various age-related diseases. We aimed to assess the association of telomere length with incident type 2 diabetes mellitus (T2DM) in prospective cohort studies.
Leucocyte relative telomere length (RTL) was measured using quantitative polymerase chain reaction in 684 participants of the prospective population-based Bruneck Study (1995 baseline), with repeat RTL measurements performed in 2005 (n = 558) and 2010 (n = 479). Hazard ratios for T2DM were calculated across quartiles of baseline RTL using Cox regression models adjusted for age, sex, body-mass index, smoking, socio-economic status, physical activity, alcohol consumption, high-density lipoprotein cholesterol, log high-sensitivity C-reactive protein, and waist-hip ratio. Separate analyses corrected hazard ratios for within-person variability using multivariate regression calibration of repeated measurements. To contextualise findings, we systematically searched PubMed, Web of Science and EMBASE for relevant articles and pooled results using random-effects meta-analysis.
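The idea behind correcting for within-person variability can be shown with the simpler univariate method: estimate a regression dilution ratio (RDR) as the slope of a repeat measurement on the baseline measurement, then divide the log hazard ratio per unit of the exposure by it. This is a simplified stand-in for the multivariate regression calibration described above, applied to a per-unit hazard ratio rather than quartile comparisons; the measurements in the example are hypothetical.

```python
import numpy as np

def regression_dilution_ratio(baseline, repeat):
    """Slope from regressing the repeat measurement on the baseline measurement."""
    x0 = np.asarray(baseline, dtype=float)
    x1 = np.asarray(repeat, dtype=float)
    # Both covariance and variance use the n - 1 denominator.
    return float(np.cov(x0, x1)[0, 1] / np.var(x0, ddof=1))

def corrected_hazard_ratio(observed_hr, rdr):
    """Correct a per-unit hazard ratio for regression dilution: exp(log HR / RDR)."""
    return float(np.exp(np.log(observed_hr) / rdr))

rdr = regression_dilution_ratio([1, 2, 3, 4], [1.5, 2, 2.5, 3])
print(rdr, corrected_hazard_ratio(1.5, rdr))  # 0.5, 2.25
```

An RDR below 1 means single baseline measurements exaggerate true between-person differences, so uncorrected associations are attenuated towards the null, consistent with the corrected hazard ratios in the results being larger than the uncorrected ones.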
Over 15 years of follow-up, 44 out of 606 participants free of diabetes at baseline developed incident T2DM. The adjusted hazard ratio for T2DM comparing the bottom vs. the top quartile of baseline RTL (i.e. shortest vs. longest) was 2.00 (95% confidence interval: 0.90 to 4.49; P = 0.091), and 2.31 comparing the bottom quartile vs. the remainder (1.21 to 4.41; P = 0.011). The corresponding hazard ratios corrected for within-person RTL variability were 3.22 (1.27 to 8.14; P = 0.014) and 2.86 (1.45 to 5.65; P = 0.003). In a random-effects meta-analysis of three prospective cohort studies involving 6,991 participants and 2,011 incident T2DM events, the pooled relative risk was 1.31 (1.07 to 1.60; P = 0.010; I² = 69%).
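The abstract does not say which random-effects estimator was used; the DerSimonian-Laird method is a widely used choice and is sketched below for log relative risks with standard errors. It returns the pooled relative risk, a 95% confidence interval and the I² heterogeneity statistic. The inputs in the example are hypothetical, not the three cohorts analysed.

```python
import math

def dersimonian_laird(log_rr, se):
    """DerSimonian-Laird random-effects meta-analysis on the log scale.

    Returns (pooled RR, lower 95% CI, upper 95% CI, I^2 in percent).
    """
    if len(log_rr) < 2:
        raise ValueError("need at least two studies")
    w = [1 / s ** 2 for s in se]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, log_rr)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_rr))  # Cochran's Q
    df = len(log_rr) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    w_re = [1 / (s ** 2 + tau2) for s in se]
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_rr)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * se_pooled),
            math.exp(pooled + 1.96 * se_pooled),
            i2)

print(dersimonian_laird([0.0, 0.5], [0.1, 0.1]))  # RR exp(0.25), I^2 92%
```

When the studies disagree more than their standard errors allow, τ² grows, the weights become more equal and the confidence interval widens; an I² of 69% indicates substantial heterogeneity of this kind.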
Low RTL is independently associated with the risk of incident T2DM. To avoid regression dilution biases in observed associations of RTL with disease risk, future studies should implement methods correcting for within-person variability in RTL. The causal role of short telomeres in T2DM development remains to be determined.
Identifying genetic variants associated with circulating protein concentrations (protein quantitative trait loci; pQTLs) and integrating them with variants from genome-wide association studies (GWAS) may illuminate the proteome's causal role in disease and bridge a knowledge gap regarding SNP-disease associations. We provide the results of GWAS of 71 high-value cardiovascular disease proteins in 6861 Framingham Heart Study participants and independent external replication. We report the mapping of over 16,000 pQTL variants and their functional relevance. We provide an integrated plasma protein-QTL database. Thirteen proteins harbor pQTL variants that match coronary disease-risk variants from GWAS or show evidence of a causal effect on coronary disease in Mendelian randomization analyses. Eight of these proteins predict new-onset cardiovascular disease events in Framingham participants. We demonstrate that identifying pQTLs, integrating them with GWAS results, employing Mendelian randomization, and prospectively testing protein-trait associations holds potential for elucidating causal genes, proteins, and pathways for cardiovascular disease and may identify targets for its prevention and treatment.
Abstract
Large-scale genome-wide association studies conducted over the last decade have uncovered numerous genetic variants associated with cardiometabolic traits and risk factors. These discoveries have enabled the Mendelian randomization (MR) design, which uses genetic variation as a natural experiment to improve causal inferences from observational data. By analogy with the random assignment of treatment in randomized controlled trials, the random segregation of genetic alleles when DNA is transmitted from parents to offspring at gamete formation is expected to reduce confounding in genetic associations. Mendelian randomization analyses make a set of assumptions that must hold for valid results. Provided that the assumptions are well justified for the genetic variants that are employed as instrumental variables, MR studies can indicate whether a putative risk factor is likely to have a causal effect on the disease. Mendelian randomization has been increasingly applied over recent years to predict the efficacy and safety of existing and novel drugs targeting cardiovascular risk factors and to explore the repurposing potential of available drugs. This review article describes the principles of the MR design and some applications in cardiovascular epidemiology.
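Many MR analyses are run on published GWAS summary statistics rather than individual-level data. One widely used estimator, not specified in this review, is the fixed-effect inverse-variance weighted (IVW) estimate, which combines the per-variant ratios of variant-outcome to variant-exposure associations. The sketch uses first-order weights and assumes uncorrelated variants; the numbers in the example are hypothetical.

```python
import math

def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect IVW Mendelian randomization estimate and its standard error.

    beta_x: variant-exposure associations.
    beta_y, se_y: variant-outcome associations and their standard errors.
    Assumes independent (uncorrelated) variants and first-order weights.
    """
    num = sum(bx * by / sy ** 2 for bx, by, sy in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / sy ** 2 for bx, sy in zip(beta_x, se_y))
    return num / den, math.sqrt(1 / den)

estimate, se = ivw_estimate([0.1, 0.2, 0.3], [0.05, 0.10, 0.15], [0.01, 0.02, 0.01])
print(estimate, se)  # every variant implies a causal effect of 0.5
```

The IVW estimate is equivalent to a weighted regression of the outcome associations on the exposure associations through the origin, so it is only valid if all variants satisfy the instrumental variable assumptions; robust methods relax this at the cost of precision.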
Graphical Abstract
Mendelian randomization findings on major cardiometabolic and lifestyle factors and common cardiovascular diseases.
Mendelian randomization uses genetic variants, assumed to be instrumental variables for a particular exposure, to estimate the causal effect of that exposure on an outcome. If the instrumental variable criteria are satisfied, the resulting estimator is consistent even in the presence of unmeasured confounding and reverse causation.
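The consistency claim can be checked with a small simulation. In this sketch the data-generating model is entirely hypothetical: a variant G affects exposure X, an unmeasured confounder U affects both X and outcome Y, and the true causal effect of X on Y is 0.3. Ordinary regression of Y on X is biased by U, whereas the ratio (Wald) estimator, the G-Y association divided by the G-X association, recovers the causal effect.

```python
import numpy as np

def simulate_iv_vs_ols(n=200_000, true_effect=0.3, seed=0):
    """Return (OLS estimate, ratio/Wald IV estimate) under confounding."""
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, 0.3, n)                   # variant: allele count 0/1/2
    u = rng.normal(size=n)                        # unmeasured confounder
    x = 0.5 * g + u + rng.normal(size=n)          # exposure
    y = true_effect * x + u + rng.normal(size=n)  # outcome, also affected by u
    # Observational slope of Y on X (confounded by u).
    ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    # Ratio estimator: G-Y association divided by G-X association.
    wald = np.cov(g, y)[0, 1] / np.cov(g, x)[0, 1]
    return float(ols), float(wald)

ols, wald = simulate_iv_vs_ols()
print(f"OLS: {ols:.3f}, IV (ratio): {wald:.3f}")
```

The IV estimate is unbiased here only because G is independent of U and affects Y solely through X, which are exactly the instrumental variable criteria the text refers to.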
We extend the Mendelian randomization paradigm to investigate more complex networks of relationships between variables, in particular where some of the effect of an exposure on the outcome may operate through an intermediate variable (a mediator). If instrumental variables for the exposure and mediator are available, direct and indirect effects of the exposure on the outcome can be estimated, for example using either a regression-based method or structural equation models. The direction of effect between the exposure and a possible mediator can also be assessed. Methods are illustrated in an applied example considering causal relationships between body mass index, C-reactive protein and uric acid.
These estimators are consistent in the presence of unmeasured confounding if, in addition to the instrumental variable assumptions, the effects of both the exposure on the mediator and the mediator on the outcome are homogeneous across individuals and linear without interactions. Nevertheless, a simulation study demonstrates that even considerable heterogeneity in these effects does not lead to bias in the estimates.
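The regression-based approach can be illustrated with two-stage least squares. In this sketch, which is not necessarily the authors' implementation, a hypothetical variant G_X instruments the exposure X and a hypothetical variant G_M instruments the mediator M; the simulated effects are homogeneous and linear, matching the assumptions stated above. The total effect uses G_X alone, the direct effect comes from instrumenting X and M jointly, and the indirect effect is their difference.

```python
import numpy as np

def two_stage_least_squares(y, endog, instruments):
    """2SLS with an intercept; returns coefficients of the endogenous regressors."""
    n = len(y)
    z = np.column_stack([np.ones(n), instruments])
    x = np.column_stack([np.ones(n), endog])
    # First stage: fitted values of each regressor given the instruments.
    x_hat = z @ np.linalg.lstsq(z, x, rcond=None)[0]
    # Second stage: regress the outcome on the fitted regressors.
    coef = np.linalg.lstsq(x_hat, y, rcond=None)[0]
    return coef[1:]

def simulate_mediation(n=200_000, seed=0):
    """Return (total, direct, indirect) effect estimates of X on Y."""
    rng = np.random.default_rng(seed)
    g_x = rng.binomial(2, 0.3, n)   # instrument for the exposure
    g_m = rng.binomial(2, 0.3, n)   # instrument for the mediator
    u = rng.normal(size=n)          # unmeasured confounder of X, M and Y
    x = 0.5 * g_x + u + rng.normal(size=n)
    m = 0.4 * x + 0.5 * g_m + u + rng.normal(size=n)
    # True direct effect 0.2; true indirect effect 0.4 * 0.3 = 0.12.
    y = 0.2 * x + 0.3 * m + u + rng.normal(size=n)
    total = two_stage_least_squares(y, x, g_x)[0]
    direct = two_stage_least_squares(
        y, np.column_stack([x, m]), np.column_stack([g_x, g_m]))[0]
    return float(total), float(direct), float(total - direct)

print(simulate_mediation())  # approximately (0.32, 0.20, 0.12)
```

Estimating the indirect effect as total minus direct relies on the linear, no-interaction structure used in the simulation; with interactions between exposure and mediator the decomposition no longer holds exactly.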
These methods can be used to estimate direct and indirect causal effects in a mediation setting, and have potential for the investigation of more complex networks between multiple interrelated exposures and disease outcomes.
Abstract
Quantitative trait locus (QTL) mapping of molecular phenotypes such as metabolites, lipids and proteins through genome-wide association studies represents a powerful means of highlighting molecular mechanisms relevant to human diseases. However, a major challenge of this approach is to identify the causal gene(s) at the observed QTLs. Here, we present a framework for the 'Prioritization of candidate causal Genes at Molecular QTLs' (ProGeM), which incorporates biological domain-specific annotation data alongside genome annotation data from multiple repositories. We assessed the performance of ProGeM using a reference set of 227 previously reported and extensively curated metabolite QTLs. For 98% of these loci, the expert-curated gene was one of the candidate causal genes prioritized by ProGeM. Benchmarking analyses revealed that 69% of the causal candidates were nearest to the sentinel variant at the investigated molecular QTLs, indicating that genomic proximity is the most reliable indicator of 'true positive' causal genes. In contrast, cis-gene expression QTL data led to three false positive candidate causal gene assignments for every one true positive assignment. We provide evidence that these conclusions also apply to other molecular phenotypes, suggesting that ProGeM is a powerful and versatile tool for annotating molecular QTLs. ProGeM is freely available via GitHub.
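Because proximity to the sentinel variant proved the most reliable indicator, a nearest-gene lookup is the core of the simplest prioritization strategy. The sketch below is not ProGeM's code: it assumes hypothetical gene coordinates, measures distance to the gene body (zero when the variant lies inside a gene) rather than to the transcription start site, and ranks genes on the same chromosome by that distance.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    symbol: str
    chrom: str
    start: int  # gene start, 1-based inclusive
    end: int    # gene end, 1-based inclusive

def distance_to_gene(pos, gene):
    """Base-pair distance from a position to a gene body (0 if inside it)."""
    if gene.start <= pos <= gene.end:
        return 0
    return min(abs(pos - gene.start), abs(pos - gene.end))

def nearest_genes(chrom, pos, genes, k=1):
    """Symbols of the k genes closest to a sentinel variant on the same chromosome."""
    same_chrom = [g for g in genes if g.chrom == chrom]
    ranked = sorted(same_chrom, key=lambda g: distance_to_gene(pos, g))
    return [g.symbol for g in ranked[:k]]

# Hypothetical annotation for illustration only.
genes = [
    Gene("GENE_A", "1", 1000, 2000),
    Gene("GENE_B", "1", 5000, 6000),
    Gene("GENE_C", "2", 1500, 1600),
]
print(nearest_genes("1", 4000, genes, k=2))  # ['GENE_B', 'GENE_A']
```

Whether distance is measured to the gene body or to the transcription start site can change which gene is nearest, so this choice should match the annotation convention used in any benchmark.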