A genome-wide association study (GWAS) correlates variation in the genotype with variation in the phenotype across a cohort, but the causal gene mediating that impact is often unclear. When the phenotype is protein abundance, a reasonable hypothesis is that the gene encoding that protein is the causal gene. However, as variants impacting protein levels can occur thousands or even millions of base pairs from the gene encoding the protein, it is unclear at what distance this simple hypothesis breaks down.
By making the simple assumption that cis-pQTLs should be distance dependent while trans-pQTLs are distance independent, we arrive at a simple and empirical distance cutoff separating cis- and trans-pQTLs. Analyzing a recent large-scale pQTL study (Pietzner in Science 374:eabj1541, 2021) we arrive at an estimated distance cutoff of 944 kilobasepairs (95% confidence interval: 767-1,161) separating the cis and trans regimes.
We demonstrate that this simple model can be applied to other molecular GWAS traits. Since much of biology is built on molecular traits like protein, transcript and metabolite abundance, we posit that the mathematical models for cis and trans distance distributions derived here will also apply to more complex phenotypes and traits.
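The distance-cutoff idea in this abstract can be illustrated with a toy two-component mixture. This is only a sketch under assumed forms, not the model derived in the paper: the cis density is assumed to decay exponentially with distance, the trans density is assumed uniform over a fixed window, and the names `cis_weight`, `decay_kb` and `window_kb`, along with all numeric values, are hypothetical. The cutoff is taken as the distance at which the two weighted densities are equal.

```python
import math


def cis_density(distance_kb: float, cis_weight: float, decay_kb: float) -> float:
    """Weighted cis density: assumed exponential decay with distance (toy model)."""
    return cis_weight * math.exp(-distance_kb / decay_kb) / decay_kb


def trans_density(cis_weight: float, window_kb: float) -> float:
    """Weighted trans density: assumed uniform over [0, window_kb] (toy model)."""
    return (1.0 - cis_weight) / window_kb


def crossover_distance_kb(cis_weight: float, decay_kb: float, window_kb: float) -> float:
    """Distance (kb) where the weighted cis and trans densities are equal.

    Solving cis_density(d) == trans_density gives
    d = decay_kb * ln(cis_weight * window_kb / ((1 - cis_weight) * decay_kb)).
    """
    if not 0.0 < cis_weight < 1.0:
        raise ValueError("cis_weight must be strictly between 0 and 1")
    if decay_kb <= 0 or window_kb <= 0:
        raise ValueError("decay_kb and window_kb must be positive")
    ratio = cis_weight * window_kb / ((1.0 - cis_weight) * decay_kb)
    if ratio <= 1.0:
        # The trans density already matches or exceeds the cis density at distance 0.
        return 0.0
    return decay_kb * math.log(ratio)


if __name__ == "__main__":
    # Hypothetical parameters chosen only to exercise the function.
    cutoff = crossover_distance_kb(cis_weight=0.5, decay_kb=100.0, window_kb=100.0 * math.e)
    print(f"toy cis/trans crossover: {cutoff:.1f} kb")
```

Below the crossover a signal is more likely to come from the distance-dependent component, and above it from the distance-independent one; in practice the component shapes and weights would have to be fitted to observed pQTL distances.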
By sampling 500 time points of an electrocardiogram (ECG) trace in a genome-wide association study (GWAS), Verweij et al. identified novel correlates for cardiac dysfunction. Clustering of SNPs based on their impact at all time points revealed natural sets of SNPs with similar effects and mechanisms. Similar approaches may be applicable to other quantitative, time-ordered traits.
Genome-wide association studies (GWASs) have identified many variants associated with complex traits, but identifying the causal gene(s) is a major challenge. In the present study, we present an open resource that provides systematic fine mapping and gene prioritization across 133,441 published human GWAS loci. We integrate genetics (GWAS Catalog and UK Biobank) with transcriptomic, proteomic and epigenomic data, including systematic disease-disease and disease-molecular trait colocalization results across 92 cell types and tissues. We identify 729 loci fine mapped to a single-coding causal variant and colocalized with a single gene. We trained a machine-learning model using the fine-mapped genetics and functional genomics data and 445 gold-standard curated GWAS loci to distinguish causal genes from neighboring genes, outperforming a naive distance-based model. Our prioritized genes were enriched for known approved drug targets (odds ratio = 8.1, 95% confidence interval = 5.7, 11.5). These results are publicly available through a web portal (http://genetics.opentargets.org), enabling users to easily prioritize genes at disease-associated loci and assess their potential as drug targets.
Quantitative trait locus (QTL) mapping of molecular phenotypes such as metabolites, lipids and proteins through genome-wide association studies represents a powerful means of highlighting molecular mechanisms relevant to human diseases. However, a major challenge of this approach is to identify the causal gene(s) at the observed QTLs. Here, we present a framework for the 'Prioritization of candidate causal Genes at Molecular QTLs' (ProGeM), which incorporates biological domain-specific annotation data alongside genome annotation data from multiple repositories. We assessed the performance of ProGeM using a reference set of 227 previously reported and extensively curated metabolite QTLs. For 98% of these loci, the expert-curated gene was one of the candidate causal genes prioritized by ProGeM. Benchmarking analyses revealed that 69% of the causal candidates were nearest to the sentinel variant at the investigated molecular QTLs, indicating that genomic proximity is the most reliable indicator of 'true positive' causal genes. In contrast, cis-gene expression QTL data led to three false positive candidate causal gene assignments for every one true positive assignment. We provide evidence that these conclusions also apply to other molecular phenotypes, suggesting that ProGeM is a powerful and versatile tool for annotating molecular QTLs. ProGeM is freely available via GitHub.
Few studies have explored the impact of rare variants (minor allele frequency < 1%) on highly heritable plasma metabolites identified in metabolomic screens. The Finnish population provides an ideal opportunity for such explorations, given the multiple bottlenecks and expansions that have shaped its history, and the enrichment for many otherwise rare alleles that has resulted. Here, we report genetic associations for 1391 plasma metabolites in 6136 men from the late-settlement region of Finland. We identify 303 novel association signals, more than one third at variants rare or enriched in Finns. Many of these signals identify genes not previously implicated in metabolite genome-wide association studies and suggest mechanisms for diseases and disease-related traits.
Genome-wide association scans with high-throughput metabolic profiling provide unprecedented insights into how genetic variation influences metabolism and complex disease. Here we report the most comprehensive exploration of genetic loci influencing human metabolism thus far, comprising 7,824 adult individuals from 2 European population studies. We report genome-wide significant associations at 145 metabolic loci and their biochemical connectivity with more than 400 metabolites in human blood. We extensively characterize the resulting in vivo blueprint of metabolism in human blood by integrating it with information on gene expression, heritability and overlap with known loci for complex disorders, inborn errors of metabolism and pharmacological targets. We further developed a database and web-based resources for data mining and results visualization. Our findings provide new insights into the role of inherited variation in blood metabolic diversity and identify potential new opportunities for drug development and for understanding disease.
The Pharma Proteomics Project is a precompetitive biopharmaceutical consortium characterizing the plasma proteomic profiles of 54,219 UK Biobank participants. Here we provide a detailed summary of this initiative, including technical and biological validations, insights into proteomic disease signatures, and prediction modelling for various demographic and health indicators. We present comprehensive protein quantitative trait locus (pQTL) mapping of 2,923 proteins that identifies 14,287 primary genetic associations, of which 81% are previously undescribed, alongside ancestry-specific pQTL mapping in non-European individuals. The study provides an updated characterization of the genetic architecture of the plasma proteome, contextualized with projected pQTL discovery rates as sample sizes and proteomic assay coverages increase over time. We offer extensive insights into trans pQTLs across multiple biological domains, highlight genetic influences on ligand-receptor interactions and pathway perturbations across a diverse collection of cytokines and complement networks, and illustrate long-range epistatic effects of ABO blood group and FUT2 secretor status on proteins with gastrointestinal tissue-enriched expression. We demonstrate the utility of these data for drug discovery by extending the genetic proxied effects of protein targets, such as PCSK9, on additional endpoints, and disentangle specific genes and proteins perturbed at loci associated with COVID-19 susceptibility. This public-private partnership provides the scientific community with an open-access proteomics resource of considerable breadth and depth to help to elucidate the biological mechanisms underlying proteo-genomic discoveries and accelerate the development of biomarkers, predictive models and therapeutics.
► Most proteins are not suitable targets for orally bioavailable drugs.
► Known drugs generally exploit binding pockets for endogenous ligands.
► Many druggability methods rely on the shape and hydrophobicity of the binding pocket.
► Other methods perform docking of representative small molecules to assess druggability.
► Publicly available training sets have led to the creation of new methods.
A target is druggable if it can be modulated in vivo by a drug-like molecule. The general properties of oral drugs are summarized by the ‘rule of 5’ which specifies parameters related to size and lipophilicity. Structure-based target druggability assessment consists of predicting ligand-binding sites on the protein that are complementary to these drug-like properties. Automated identification of ligand-binding sites can use geometrical considerations alone or include specific physicochemical properties of the protein surface. Features of a pocket's size and shape, together with measures of its hydrophobicity, are most informative in identifying suitable drug-binding pockets. The recent availability of several validation sets of druggable versus undruggable targets has helped fuel the development of more elaborate methods.
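As a rough illustration of the ‘rule of 5’ mentioned above, the sketch below counts violations from precomputed molecular properties. The abstract does not list the thresholds; the ones used here are the commonly cited Lipinski values (molecular weight over 500 Da, logP over 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors). The class and function names, the `max_violations=1` default, and the example property values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MolecularProperties:
    """Precomputed properties of a candidate molecule (hypothetical container)."""
    molecular_weight: float  # daltons
    log_p: float             # octanol-water partition coefficient (lipophilicity)
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_5_violations(props: MolecularProperties) -> int:
    """Count how many of the four commonly cited rule-of-5 thresholds are exceeded."""
    checks = (
        props.molecular_weight > 500,
        props.log_p > 5,
        props.h_bond_donors > 5,
        props.h_bond_acceptors > 10,
    )
    return sum(checks)


def looks_drug_like(props: MolecularProperties, max_violations: int = 1) -> bool:
    """Treat a molecule as drug-like if it exceeds at most `max_violations` thresholds.

    Allowing a single violation is a common convention, assumed here.
    """
    return rule_of_5_violations(props) <= max_violations


if __name__ == "__main__":
    # Hypothetical property values, not measurements of any real compound.
    small = MolecularProperties(350.0, 2.5, 2, 5)
    large = MolecularProperties(900.0, 6.5, 7, 12)
    print(rule_of_5_violations(small), looks_drug_like(small))
    print(rule_of_5_violations(large), looks_drug_like(large))
```

A check like this describes the ligand side only; the structure-based druggability assessment discussed in the abstract asks whether a protein pocket is complementary to molecules with these properties.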
Genome-wide association studies have discovered hundreds of genomic loci associated with psychiatric traits, but the causal genes underlying these associations are often unclear, a research gap that has hindered clinical translation. Here, we present a Psychiatric Omnilocus Prioritization Score (PsyOPS) derived from just three binary features encapsulating high-level assumptions about psychiatric disease etiology - namely, that causal psychiatric disease genes are likely to be mutationally constrained, be specifically expressed in the brain, and overlap with known neurodevelopmental disease genes. To our knowledge, PsyOPS is the first method specifically tailored to prioritizing causal genes at psychiatric GWAS loci. We show that, despite its extreme simplicity, PsyOPS achieves state-of-the-art performance at this task, comparable to a prior domain-agnostic approach relying on tens of thousands of features. Genes prioritized by PsyOPS are substantially more likely than other genes at the same loci to have convergent evidence of direct regulation by the GWAS variant according to both DNA looping assays and expression or splicing quantitative trait locus (QTL) maps. We provide examples of genes hundreds of kilobases away from the lead variant, like GABBR1 for schizophrenia, that are prioritized by all three of PsyOPS, DNA looping and QTLs. Our results underscore the power of incorporating high-level knowledge of trait etiology into causal gene prediction at GWAS loci, and comprise a resource for researchers interested in experimentally characterizing psychiatric gene candidates.
Pleiotropy and genetic correlation are widespread features in genome-wide association studies (GWAS), but they are often difficult to interpret at the molecular level. Here, we perform GWAS of 16 metabolites clustered at the intersection of amino acid catabolism, glycolysis, and ketone body metabolism in a subset of UK Biobank. We utilize the well-documented biochemistry jointly impacting these metabolites to analyze pleiotropic effects in the context of their pathways. Among the 213 lead GWAS hits, we find a strong enrichment for genes encoding pathway-relevant enzymes and transporters. We demonstrate that the effect directions of variants acting on biology between metabolite pairs often contrast with those of upstream or downstream variants as well as the polygenic background. Thus, we find that these outlier variants often reflect biology local to the traits. Finally, we explore the implications for interpreting disease GWAS, underscoring the potential of unifying biochemistry with dense metabolomics data to understand the molecular basis of pleiotropy in complex traits and diseases.