PhenoScanner is a curated database of publicly available results from large-scale genetic association studies. This tool aims to facilitate 'phenome scans', the cross-referencing of genetic variants ...with many phenotypes, to help aid understanding of disease pathways and biology. The database currently contains over 350 million association results and over 10 million unique genetic variants, mostly single nucleotide polymorphisms. It is accompanied by a web-based tool that queries the database for associations with user-specified variants, providing results according to the same effect and non-effect alleles for each input variant. The tool provides the option of searching for trait associations with proxies of the input variants, calculated using the European samples from 1000 Genomes and Hapmap.
PhenoScanner is available at www.phenoscanner.medschl.cam.ac.uk CONTACT: jrs95@medschl.cam.ac.ukSupplementary information: Supplementary data are available at Bioinformatics online.
The human proteome is a major source of therapeutic targets. Recent genetic association analyses of the plasma proteome enable systematic evaluation of the causal consequences of variation in plasma ...protein levels. Here we estimated the effects of 1,002 proteins on 225 phenotypes using two-sample Mendelian randomization (MR) and colocalization. Of 413 associations supported by evidence from MR, 130 (31.5%) were not supported by results of colocalization analyses, suggesting that genetic confounding due to linkage disequilibrium is widespread in naïve phenome-wide association studies of proteins. Combining MR and colocalization evidence in cis-only analyses, we identified 111 putatively causal effects between 65 proteins and 52 disease-related phenotypes ( https://www.epigraphdb.org/pqtl/ ). Evaluation of data from historic drug development programs showed that target-indication pairs with MR and colocalization support were more likely to be approved, evidencing the value of this approach in identifying and prioritizing potential therapeutic targets.
Genome-wide association studies (GWAS) have identified thousands of genomic regions affecting complex diseases. The next challenge is to elucidate the causal genes and mechanisms involved. One ...approach is to use statistical colocalization to assess shared genetic aetiology across multiple related traits (e.g. molecular traits, metabolic pathways and complex diseases) to identify causal pathways, prioritize causal variants and evaluate pleiotropy. We propose HyPrColoc (Hypothesis Prioritisation for multi-trait Colocalization), an efficient deterministic Bayesian algorithm using GWAS summary statistics that can detect colocalization across vast numbers of traits simultaneously (e.g. 100 traits can be jointly analysed in around 1 s). We perform a genome-wide multi-trait colocalization analysis of coronary heart disease (CHD) and fourteen related traits, identifying 43 regions in which CHD colocalized with ≥1 trait, including 5 previously unknown CHD loci. Across the 43 loci, we further integrate gene and protein expression quantitative trait loci to identify candidate causal genes.
Abstract
Quantitative trait locus (QTL) mapping of molecular phenotypes such as metabolites, lipids and proteins through genome-wide association studies represents a powerful means of highlighting ...molecular mechanisms relevant to human diseases. However, a major challenge of this approach is to identify the causal gene(s) at the observed QTLs. Here, we present a framework for the 'Prioritization of candidate causal Genes at Molecular QTLs' (ProGeM), which incorporates biological domain-specific annotation data alongside genome annotation data from multiple repositories. We assessed the performance of ProGeM using a reference set of 227 previously reported and extensively curated metabolite QTLs. For 98% of these loci, the expert-curated gene was one of the candidate causal genes prioritized by ProGeM. Benchmarking analyses revealed that 69% of the causal candidates were nearest to the sentinel variant at the investigated molecular QTLs, indicating that genomic proximity is the most reliable indicator of 'true positive' causal genes. In contrast, cis-gene expression QTL data led to three false positive candidate causal gene assignments for every one true positive assignment. We provide evidence that these conclusions also apply to other molecular phenotypes, suggesting that ProGeM is a powerful and versatile tool for annotating molecular QTLs. ProGeM is freely available via GitHub.
Identifying genetic variants associated with circulating protein concentrations (protein quantitative trait loci; pQTLs) and integrating them with variants from genome-wide association studies (GWAS) ...may illuminate the proteome's causal role in disease and bridge a knowledge gap regarding SNP-disease associations. We provide the results of GWAS of 71 high-value cardiovascular disease proteins in 6861 Framingham Heart Study participants and independent external replication. We report the mapping of over 16,000 pQTL variants and their functional relevance. We provide an integrated plasma protein-QTL database. Thirteen proteins harbor pQTL variants that match coronary disease-risk variants from GWAS or test causal for coronary disease by Mendelian randomization. Eight of these proteins predict new-onset cardiovascular disease events in Framingham participants. We demonstrate that identifying pQTLs, integrating them with GWAS results, employing Mendelian randomization, and prospectively testing protein-trait associations holds potential for elucidating causal genes, proteins, and pathways for cardiovascular disease and may identify targets for its prevention and treatment.
Advances in proteomic assay technologies have significantly increased coverage and throughput, enabling recent increases in the number of large-scale population-based proteomic studies of human ...plasma and serum. Improvements in multiplexed protein assays have facilitated the quantification of thousands of proteins over a large dynamic range, a key requirement for detecting the lowest-ranging, and potentially the most disease-relevant, blood-circulating proteins. In this perspective, we examine how populational proteomic datasets in conjunction with other concurrent omic measures can be leveraged to better understand the genomic and non-genomic correlates of the soluble proteome, constructing biomarker panels for disease prediction, among others. Mass spectrometry workflows are discussed as they are becoming increasingly competitive with affinity-based array platforms in terms of speed, cost, and proteome coverage due to advances in both instrumentation and workflows. Despite much success, there remain considerable challenges such as orthogonal validation and absolute quantification. We also highlight emergent challenges associated with study design, analytical considerations, and data integration as population-scale studies are run in batches and may involve longitudinal samples collated over many years. Lastly, we take a look at the future of what the nascent next-generation proteomic technologies might provide to the analysis of large sets of blood samples, as well as the difficulties in designing large-scale studies that will likely require participation from multiple and complex funding sources and where data sharing, study designs, and financing must be solved.
Display omitted
•Common uses of large-scale population proteomics in health and disease.•Broad synopsis of main types of scalable proteomic methods.•Overview of large proteomic and proteogenomic studies in human blood.•Summary of key challenges and future outlook of populational proteomics.
Populational cohort-based proteomics assays across thousands of proteins are emerging at the confluence of maturations in technologies and large-scale biobanking efforts. In this perspective, we explore the current state of large-scale proteomic efforts in human blood, their common applications across disease and health phenotypes, as well as applications combining genomics. We also explore the challenges facing proteomics at biobank scale from wide-ranging considerations and provide a future outlook of the next steps the field may head towards.
Genomic atlas of the human plasma proteome Sun, Benjamin B; Maranville, Joseph C; Peters, James E ...
Nature (London),
06/2018, Letnik:
558, Številka:
7708
Journal Article
Recenzirano
Odprti dostop
Although plasma proteins have important roles in biological processes and are the direct targets of many drugs, the genetic factors that control inter-individual variation in plasma protein levels ...are not well understood. Here we characterize the genetic architecture of the human plasma proteome in healthy blood donors from the INTERVAL study. We identify 1,927 genetic associations with 1,478 proteins, a fourfold increase on existing knowledge, including trans associations for 1,104 proteins. To understand the consequences of perturbations in plasma protein levels, we apply an integrated approach that links genetic variation with biological pathway, disease, and drug databases. We show that protein quantitative trait loci overlap with gene expression quantitative trait loci, as well as with disease-associated loci, and find evidence that protein biomarkers have causal roles in disease using Mendelian randomization analysis. By linking genetic factors to diseases via specific proteins, our analyses highlight potential therapeutic targets, opportunities for matching existing drugs with new disease indications, and potential safety concerns for drugs under development.
ABSTRACT
Mendelian randomization uses genetic variants to make causal inferences about the effect of a risk factor on an outcome. With fine‐mapped genetic data, there may be hundreds of genetic ...variants in a single gene region any of which could be used to assess this causal relationship. However, using too many genetic variants in the analysis can lead to spurious estimates and inflated Type 1 error rates. But if only a few genetic variants are used, then the majority of the data is ignored and estimates are highly sensitive to the particular choice of variants. We propose an approach based on summarized data only (genetic association and correlation estimates) that uses principal components analysis to form instruments. This approach has desirable theoretical properties: it takes the totality of data into account and does not suffer from numerical instabilities. It also has good properties in simulation studies: it is not particularly sensitive to varying the genetic variants included in the analysis or the genetic correlation matrix, and it does not have greatly inflated Type 1 error rates. Overall, the method gives estimates that are less precise than those from variable selection approaches (such as using a conditional analysis or pruning approach to select variants), but are more robust to seemingly arbitrary choices in the variable selection step. Methods are illustrated by an example using genetic associations with testosterone for 320 genetic variants to assess the effect of sex hormone related pathways on coronary artery disease risk, in which variable selection approaches give inconsistent inferences.
Genome-wide association studies (GWAS) have identified thousands of genetic variants linked to the risk of human disease. However, GWAS have so far remained largely underpowered in relation to ...identifying associations in the rare and low-frequency allelic spectrum and have lacked the resolution to trace causal mechanisms to underlying genes
. Here we combined whole-exome sequencing in 392,814 UK Biobank participants with imputed genotypes from 260,405 FinnGen participants (653,219 total individuals) to conduct association meta-analyses for 744 disease endpoints across the protein-coding allelic frequency spectrum, bridging the gap between common and rare variant studies. We identified 975 associations, with more than one-third being previously unreported. We demonstrate population-level relevance for mutations previously ascribed to causing single-gene disorders, map GWAS associations to likely causal genes, explain disease mechanisms, and systematically relate disease associations to levels of 117 biomarkers and clinical-stage drug targets. Combining sequencing and genotyping in two population biobanks enabled us to benefit from increased power to detect and explain disease associations, validate findings through replication and propose medical actionability for rare genetic variants. Our study provides a compendium of protein-coding variant associations for future insights into disease biology and drug discovery.
The Pharma Proteomics Project is a precompetitive biopharmaceutical consortium characterizing the plasma proteomic profiles of 54,219 UK Biobank participants. Here we provide a detailed summary of ...this initiative, including technical and biological validations, insights into proteomic disease signatures, and prediction modelling for various demographic and health indicators. We present comprehensive protein quantitative trait locus (pQTL) mapping of 2,923 proteins that identifies 14,287 primary genetic associations, of which 81% are previously undescribed, alongside ancestry-specific pQTL mapping in non-European individuals. The study provides an updated characterization of the genetic architecture of the plasma proteome, contextualized with projected pQTL discovery rates as sample sizes and proteomic assay coverages increase over time. We offer extensive insights into trans pQTLs across multiple biological domains, highlight genetic influences on ligand-receptor interactions and pathway perturbations across a diverse collection of cytokines and complement networks, and illustrate long-range epistatic effects of ABO blood group and FUT2 secretor status on proteins with gastrointestinal tissue-enriched expression. We demonstrate the utility of these data for drug discovery by extending the genetic proxied effects of protein targets, such as PCSK9, on additional endpoints, and disentangle specific genes and proteins perturbed at loci associated with COVID-19 susceptibility. This public-private partnership provides the scientific community with an open-access proteomics resource of considerable breadth and depth to help to elucidate the biological mechanisms underlying proteo-genomic discoveries and accelerate the development of biomarkers, predictive models and therapeutics
.