Despite the increasing global burden of neurological disorders, there is a lack of effective diagnostic and therapeutic biomarkers. Proteins are often dysregulated in disease and have a strong genetic component. Here, we carry out a protein quantitative trait locus analysis of 184 neurologically-relevant proteins, using whole genome sequencing data from two isolated population-based cohorts (N = 2893). In doing so, we elucidate the genetic landscape of the circulating proteome and its connection to neurological disorders. We detect 214 independently-associated variants for 107 proteins, the majority of which (76%) are cis-acting, including 114 variants that have not been previously identified. Using two-sample Mendelian randomisation, we identify causal associations between serum CD33 and Alzheimer's disease, GPNMB and Parkinson's disease, and MSR1 and schizophrenia, describing their clinical potential and highlighting drug repurposing opportunities.
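As an illustration of the two-sample Mendelian randomisation step mentioned above, the following is a minimal sketch of a fixed-effect inverse-variance weighted (IVW) estimate. It assumes independent instruments and per-variant summary statistics for the exposure (protein level) and the outcome (disease); the function and argument names are hypothetical, and the abstract does not state which MR estimator was used.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-variant summary statistics.

    Assumes the variants are independent and valid instruments. The names
    and the choice of estimator are illustrative assumptions, not the
    authors' pipeline.
    """
    # Per-variant Wald ratios: outcome effect divided by exposure effect.
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    # First-order inverse-variance weights of each ratio: (bx / se_y)^2.
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return estimate, standard_error


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not taken from any study.
    est, se = ivw_estimate([0.5, 0.4], [0.1, 0.08], [0.02, 0.02])
    print(f"IVW estimate = {est:.3f}, SE = {se:.4f}")
```

With a single instrument, which is common for cis-pQTLs, the estimate reduces to the Wald ratio of that variant.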
The human proteome is a crucial intermediate between complex diseases and their genetic and environmental components, and an important source of drug development targets and biomarkers. Here, we comprehensively assess the genetic architecture of 257 circulating protein biomarkers of cardiometabolic relevance through high-depth (22.5×) whole-genome sequencing (WGS) in 1328 individuals. We discover 131 independent sequence variant associations (P < 7.45 × 10
) across the allele frequency spectrum, all of which replicate in an independent cohort (n = 1605, 18.4x WGS). We identify for the first time replicating evidence for rare-variant cis-acting protein quantitative trait loci for five genes, involving both coding and noncoding variation. We construct and validate polygenic scores that explain up to 45% of protein level variation. We find causal links between protein levels and disease risk, identifying high-value biomarkers and drug development targets.
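The polygenic scores described above can be sketched as a weighted sum of allele dosages, with the variance explained taken as the squared correlation between score and measured protein level. This is a minimal sketch under assumptions: the function names are hypothetical, the weights stand in for per-variant effect estimates, and the abstract does not describe how the scores were actually constructed or validated.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2) for each individual.

    `dosages` holds one row per individual and one column per variant;
    `weights` holds one effect estimate per variant. Hypothetical helper.
    """
    return [sum(w * d for w, d in zip(weights, row)) for row in dosages]


def variance_explained(scores, levels):
    """Squared Pearson correlation between scores and measured levels."""
    n = len(scores)
    mean_s = sum(scores) / n
    mean_l = sum(levels) / n
    cov = sum((s - mean_s) * (l - mean_l) for s, l in zip(scores, levels))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_l = sum((l - mean_l) ** 2 for l in levels)
    return cov * cov / (var_s * var_l)


if __name__ == "__main__":
    # Toy genotypes and protein levels for illustration only.
    dosages = [[0, 1], [1, 1], [2, 0], [2, 2]]
    scores = polygenic_score(dosages, [0.5, -0.2])
    levels = [0.1, 0.9, 2.2, 1.0]
    print(f"R^2 = {variance_explained(scores, levels):.3f}")
```

In practice the weights would be estimated in one cohort and the variance explained evaluated in an independent one, so that the reported figure is not inflated by overfitting.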
Next-generation association studies can be empowered by sequence-based imputation and by studying founder populations. Here we report ∼9.5 million variants from whole-genome sequencing (WGS) of a Cretan-isolated population, and show enrichment of rare and low-frequency variants with predicted functional consequences. We use a WGS-based imputation approach utilizing 10,422 reference haplotypes to perform genome-wide association analyses and observe 17 genome-wide significant, independent signals, including replicating evidence for association at eight novel low-frequency variant signals. Two novel cardiometabolic associations are at lead variants unique to the founder population sequences: chr16:70790626 (high-density lipoprotein levels beta -1.71 (SE 0.25), P=1.57 × 10
, effect allele frequency (EAF) 0.006); and rs145556679 (triglycerides levels beta -1.13 (SE 0.17), P=2.53 × 10
, EAF 0.013). Our findings add empirical support to the contribution of low-frequency variants in complex traits, demonstrate the advantage of including population-specific sequences in imputation panels and exemplify the power gains afforded by population isolates.
Most genome-wide association studies are based on samples of European descent. We assess whether the genetic determinants of blood lipids, a major cardiovascular risk factor, are shared across populations. Genetic correlations for lipids between European-ancestry and Asian cohorts are not significantly different from 1. A genetic risk score based on LDL-cholesterol-associated loci has consistent effects on serum levels in samples from the UK, Uganda and Greece (r = 0.23-0.28, p < 1.9 × 10
). Overall, there is evidence of reproducibility for ~75% of the major lipid loci from European discovery studies, except triglyceride loci in the Ugandan samples (10% of loci). Individual transferable loci are identified using trans-ethnic colocalization. Ten of fourteen loci not transferable to the Ugandan population have pleiotropic associations with BMI in Europeans; none of the transferable loci do. The non-transferable loci might affect lipids by modifying food intake in environments rich in certain nutrients, which suggests a potential role for gene-environment interactions.
Global cardiometabolic disease prevalence has grown rapidly over the years, making it the leading cause of death worldwide. Proteins are crucial components in biological pathways dysregulated in disease states. Identifying genetic components that influence circulating protein levels may lead to the discovery of biomarkers for early stages of disease or offer opportunities as therapeutic targets.
Here, we carry out a genome-wide association study (GWAS) utilising whole genome sequencing data in 3,005 individuals from the HELIC founder populations cohort, across 92 proteins of cardiometabolic relevance.
We report 322 protein quantitative trait loci (pQTL) signals across 92 proteins, of which 76 are located in or near the coding gene (cis-pQTL). We link those association signals with changes in protein expression and cardiometabolic disease risk using colocalisation and Mendelian randomisation (MR) analyses.
The majority of previously unknown signals we describe point to proteins or protein interactions involved in inflammation and immune response, providing genetic evidence for the contributing role of inflammation in cardiometabolic disease processes.
• Whole-genome sequencing gives access to insights into new protein biology.
• Trans-pQTL resolved using eQTL confirm experimental interactions (e.g. CCL5/ACKR1).
• Genetic associations linking cardiometabolic proteins reflect inflammation pathways.
Haematological traits are linked to cardiovascular, metabolic, infectious and immune disorders, as well as cancer. Here, we examine the role of genetic variation in shaping haematological traits in two isolated Mediterranean populations. Using whole-genome sequencing data at 22× depth for 1457 individuals from Crete (MANOLIS) and 1617 from the Pomak villages in Greece, we carry out a genome-wide association scan for haematological traits using linear mixed models. We discover novel associations (p < 5 × 10
) of five rare non-coding variants with alleles conferring effects of 1.44-2.63 units of standard deviation on red and white blood cell count, platelet and red cell distribution width. Moreover, 10.0% of individuals in the Pomak population and 6.8% in MANOLIS carry a pathogenic mutation in the Haemoglobin Subunit Beta (HBB) gene. The mutational spectrum is highly diverse (10 different mutations). The most frequent mutation in MANOLIS is the common Mediterranean variant IVS-I-110 (G>A) (rs35004220). In the Pomak population, c.364C>A ("HbO-Arab", rs33946267) is most frequent (4.4% allele frequency). We demonstrate effects on haematological and other traits, including bilirubin, cholesterol, and, in MANOLIS, height and gestation age. We find less severe effects on red blood cell traits for HbS, HbO, and IVS-I-6 (T>C) compared to other β+ mutations. Overall, we uncover allelic diversity of HBB in Greek isolated populations and find an important role for additional rare variants outside of HBB.
The role of rare variants in complex traits remains uncharted. Here, we conduct deep whole genome sequencing of 1457 individuals from an isolated population, and test for rare variant burdens across six cardiometabolic traits. We identify a role for rare regulatory variation, which has hitherto been missed. We find evidence of rare variant burdens that are independent of established common variant signals (ADIPOQ and adiponectin, P = 4.2 × 10
; APOC3 and triglyceride levels, P = 1.5 × 10
), and identify replicating evidence for a burden associated with triglyceride levels in FAM189B (P = 2.2 × 10
), indicating a role for this gene in lipid metabolism.
Isolated populations can empower the identification of rare variation associated with complex traits through next generation association studies, but the generalizability of such findings remains unknown. Here we genotype 1,267 individuals from a Greek population isolate on the Illumina HumanExome Beadchip, in search of functional coding variants associated with lipid traits. We find genome-wide significant evidence for association between R19X, a functional variant in APOC3, with increased high-density lipoprotein and decreased triglyceride levels. Approximately 3.8% of individuals are heterozygous for this cardioprotective variant, which was previously thought to be private to the Amish founder population. R19X is rare (<0.05% frequency) in outbred European populations. The increased frequency of R19X enables discovery of this lipid trait signal at genome-wide significance in a small sample size. This work exemplifies the value of isolated populations in successfully detecting transferable rare variant associations of high medical relevance.
Deep sequencing offers unparalleled access to rare variants in human populations. Understanding their role in disease is a priority, yet prohibitive sequencing costs mean that many cohorts lack the sample size to discover these effects on their own. Meta-analysis of individual variant scores allows the combination of rare variants across cohorts and study of their aggregated effect at the gene level, boosting discovery power. However, the methods involved have largely not been field-tested. In this study, we aim to perform the first meta-analysis of gene-based rare variant aggregation optimal tests, applied to the human cardiometabolic proteome.
Here, we carry out this analysis across MANOLIS, Pomak and ORCADES, three isolated European cohorts with whole-genome sequencing (total N = 4,422). We examine the genetic architecture of 250 proteomic traits of cardiometabolic relevance. We use a containerised pipeline to harmonise variant lists across cohorts and define four sets of qualifying variants. For every gene, we interrogate protein-damaging variants, exonic variants, exonic and regulatory variants, and regulatory only variants, using the CADD and Eigen scores to weigh variants according to their predicted functional consequence. We perform single-cohort rare variant analysis and meta-analyse variant scores using the SMMAT package.
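The idea of meta-analysing variant scores at the gene level can be illustrated with a minimal sketch. It assumes each cohort supplies, for one gene, a vector of single-variant score statistics and their covariance matrix on a harmonised variant list, with zeros for variants absent from that cohort. The function names are hypothetical, and the statistic shown is a generic weighted burden test, not the SMMAT implementation.

```python
import math


def combine_cohorts(cohorts):
    """Sum per-variant scores and covariance matrices across cohorts.

    `cohorts` is a list of (scores, covariance) pairs on the same
    harmonised variant list. A variant private to one cohort simply
    contributes zeros in the others.
    """
    m = len(cohorts[0][0])
    total_scores = [0.0] * m
    total_cov = [[0.0] * m for _ in range(m)]
    for scores, cov in cohorts:
        for i in range(m):
            total_scores[i] += scores[i]
            for j in range(m):
                total_cov[i][j] += cov[i][j]
    return total_scores, total_cov


def burden_test(scores, cov, weights):
    """Weighted burden statistic, chi-square with 1 df under the null.

    `weights` would come from functional scores such as CADD or Eigen;
    here they are plain numbers supplied by the caller.
    """
    m = len(scores)
    numerator = sum(w * u for w, u in zip(weights, scores))
    variance = sum(
        weights[i] * cov[i][j] * weights[j] for i in range(m) for j in range(m)
    )
    statistic = numerator * numerator / variance
    # Upper tail of a 1-df chi-square: P(|Z| > sqrt(T)) = erfc(sqrt(T / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    # Toy inputs: the second variant is absent from the first cohort.
    cohort_a = ([1.0, 0.0], [[1.0, 0.0], [0.0, 0.0]])
    cohort_b = ([1.0, 2.0], [[1.0, 0.0], [0.0, 2.0]])
    u, v = combine_cohorts([cohort_a, cohort_b])
    stat, p = burden_test(u, v, [1.0, 1.0])
    print(f"burden statistic = {stat:.2f}, p = {p:.4f}")
```

Summing scores instead of combining per-cohort p-values lets a variant seen in only one isolate still add to the gene-level signal.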
We describe 5 rare variant pQTLs (RV-pQTL) which pass our stringent significance threshold (7.45 × 10⁻¹¹) and quality control procedure. These were split between four cis signals for MARCO, TEK, MMP2 and MPO, and one trans association for GDF2 in the SERPINA11 gene. We show that the cis-MPO association, which was not detectable using the single-point data alone, is driven by 5 missense and frameshift variants. These include rs140636390 and rs119468010, which are specific to MANOLIS and ORCADES, respectively. We show how this kind of signal could improve the predictive accuracy of genetic factors in common complex diseases such as stroke and cardiovascular disease.
Our proof-of-concept study demonstrates the power of gene-based meta-analyses for discovering disease-relevant associations complementing common-variant signals by incorporating population-specific rare variation.
• Meta-analysis of gene-based variant scores leverages population-specific allelic heterogeneity.
• We study rare variants influencing 250 circulating serum proteins in two Greek population isolates.
• 4 cis and 1 trans coding variant signals are study-wide significant and pass quality control.
• Five rare coding variants influence protein expression of myeloperoxidase (MPO).
• Different isolates are enriched for different rare coding MPO variants.
The original version of this Article contained an error in Fig. 2. In panel a, the two legend items "rare" and "common" were inadvertently swapped. This has been corrected in both the PDF and HTML versions of the Article.