The proteome holds great potential as an intermediate layer between the genome and phenome. Previous protein quantitative trait locus studies have focused mainly on describing the effects of common genetic variation on the proteome. Here, we assessed the impact of common and rare genetic variation, as well as copy number variants (CNVs), on 326 plasma proteins measured in up to 500 individuals. We identified 184 cis and 94 trans signals for 157 protein traits, which were further fine-mapped to credible sets for 101 cis and 87 trans signals for 151 proteins. Rare genetic variation contributed to the levels of 7 proteins, with 5 cis and 14 trans associations. CNVs were associated with the levels of 11 proteins (7 cis and 5 trans associations). Examples include a 3q12.1 deletion that acts as a hub for multiple trans associations, and a CNV overlapping NAIP, a sensor component of the NAIP-NLRC4 inflammasome, that affects levels of the pro-inflammatory cytokine interleukin 18. In summary, this work presents a comprehensive resource of genetic variation affecting plasma protein levels and provides interpretations of the identified effects.
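Whether a signal counts as cis or trans depends on where the variant sits relative to the gene encoding the protein. The abstract does not state the window used, so the sketch below assumes a common 1 Mb convention (`CIS_WINDOW_BP` is a hypothetical parameter) and treats an SNV as a one-base interval, so that the same function also handles CNV intervals.

```python
# Sketch: classify a pQTL signal as cis or trans.
# Assumption: "cis" means the variant (SNV or CNV interval) lies within
# CIS_WINDOW_BP of the gene encoding the protein; the window is hypothetical.
CIS_WINDOW_BP = 1_000_000


def interval_gap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Base-pair gap between two closed intervals (0 if they overlap)."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def classify_signal(var_chrom: str, var_start: int, var_end: int,
                    gene_chrom: str, gene_start: int, gene_end: int,
                    window: int = CIS_WINDOW_BP) -> str:
    """Return 'cis' or 'trans' for a variant-protein association."""
    if var_chrom != gene_chrom:
        return "trans"  # a different chromosome is always trans
    gap = interval_gap(var_start, var_end, gene_start, gene_end)
    return "cis" if gap <= window else "trans"


# Usage with made-up coordinates:
print(classify_signal("19", 45_000_000, 45_000_000, "19", 45_500_000, 45_510_000))  # cis
print(classify_signal("3", 100, 100, "19", 45_500_000, 45_510_000))  # trans
```

A CNV that overlaps the gene body gets a gap of 0 and is therefore cis regardless of the window chosen.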
Disruptive, damaging ultra-rare variants in highly constrained genes are enriched in individuals with neurodevelopmental disorders. In the general population, this class of variants was associated with a decrease in years of education (YOE). This effect was stronger among highly brain-expressed genes and explained more YOE variance than pathogenic copy number variation but less than common variants. Disruptive, damaging ultra-rare variants in highly constrained genes influence the determinants of YOE in the general population.
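The variant class in this abstract combines three filters. The following is a minimal sketch of a per-individual burden count, assuming definitions that the abstract does not give: "highly constrained" as a gene-level pLI of at least 0.9, "ultra-rare" as absent from an external reference panel, and "disruptive, damaging" as a precomputed boolean annotation. All gene names and scores are hypothetical. A count like this could then be entered as a predictor in a regression on YOE.

```python
from dataclasses import dataclass

# Hypothetical threshold; not taken from the study.
PLI_THRESHOLD = 0.9


@dataclass(frozen=True)
class Variant:
    gene: str
    damaging: bool     # precomputed "disruptive, damaging" annotation (assumed)
    reference_ac: int  # allele count in an external reference panel


def urv_burden(carried: list[Variant], pli_by_gene: dict[str, float],
               pli_threshold: float = PLI_THRESHOLD) -> int:
    """Count damaging ultra-rare variants in highly constrained genes."""
    return sum(
        1
        for v in carried
        if v.damaging
        and v.reference_ac == 0  # "ultra-rare" taken as absent from the panel
        and pli_by_gene.get(v.gene, 0.0) >= pli_threshold  # unknown genes treated as unconstrained
    )


pli = {"GENE_A": 0.99, "GENE_B": 0.10}  # made-up constraint scores
person = [
    Variant("GENE_A", True, 0),   # counted
    Variant("GENE_A", False, 0),  # not damaging
    Variant("GENE_B", True, 0),   # gene not constrained
    Variant("GENE_A", True, 5),   # seen in the reference panel
]
print(urv_burden(person, pli))  # 1
```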
The study investigated differences in the Five-Factor Model (FFM) domains and facets across adulthood. The main questions were whether personality scales reflected coherent units of trait development and, thereby, coherent personality traits more generally. These questions were addressed by testing whether the components of the trait scales (items for facet scales and facets for domain scales) showed consistent age group differences, using the measurement invariance (MI) framework. In a sample of 2,711 Estonians who had completed the NEO Personality Inventory 3 (NEO PI-3), more than half of the facet scales and one domain scale did not meet the criterion for weak MI (equality of factor loadings) across 12 age groups spanning ages 18 to 91 years. Furthermore, none of the facet or domain scales met the criterion for strong MI (equality of intercepts), suggesting that items within the same facet, and facets within the same domain, differed in their age group differences. When items were residualized for their respective facets, 46% of them had significant (p < 0.0002) residual age correlations. When facets were residualized for their domain scores, a majority had significant (p < 0.002) residual age correlations. For each domain, a series of latent factors was specified using random quarters of its items; within domains, the scores of these latent factors varied notably in their correlations with age. We argue that manifestations of aetiologically coherent traits should show similar age group differences. Given this, the FFM domains and facets as embodied in the NEO PI-3 do not reflect aetiologically coherent traits.
Large-scale, population-based biobanks integrating health records and genomic profiles may provide a platform to identify individuals with disease-predisposing genetic variants. Here, we recall probands carrying familial hypercholesterolemia (FH)-associated variants, perform cascade screening of family members, and describe health outcomes affected by such a strategy.
The Estonian Biobank of the Estonian Genome Center, University of Tartu, comprises 52,274 individuals. Among 4776 participants with exome or genome sequences, we identified 27 individuals who carried FH-associated variants in the LDLR, APOB, or PCSK9 genes. Cascade screening of 64 family members identified an additional 20 carriers of FH-associated variants.
Through genetic counseling and clinical management of carriers, we reclassified 51% of the study participants from a previously established diagnosis of nonspecific hypercholesterolemia to FH, and identified 32% who were completely unaware that they carried a high-risk disease-associated genetic variant. Imaging-based risk stratification led to statin treatment recommendations for 86% of the variant carriers.
Genotype-guided recall of probands and subsequent cascade screening for familial hypercholesterolemia is feasible within a population-based biobank and may facilitate more appropriate clinical management.
Hernias are characterized by protrusion of an organ or tissue through its surrounding cavity and often require surgical repair. In this study we identify 65,492 cases of five hernia types in the UK Biobank and perform genome-wide association study scans for these five types and two combined groups. All scans yield associated variants. Inguinal hernia has the most associations, and we conduct a follow-up study with 23,803 additional cases from four study groups, yielding 84 independently associated variants. Identified variants from all scans are collapsed into 81 independent loci. Further testing shows that 26 loci are associated with more than one hernia type, suggesting substantial overlap between the underlying genetic mechanisms. Pathway analyses identify several genes with a strong link to collagen and/or elastin (ADAMTS6, ADAMTS16, ADAMTSL3, LOX, ELN) near loci associated with inguinal hernia, which substantiates an essential role of connective tissue morphology.
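The abstract does not say how variants from the separate scans were collapsed into loci; studies commonly use distance- or LD-based rules. The sketch below assumes a simple distance rule with a hypothetical 500 kb window: hits on the same chromosome are chained into one locus while each is within the window of the previous hit. The positions and scan labels are made up.

```python
from itertools import groupby

# Hypothetical rule: a hit closer than MERGE_WINDOW_BP to the previous hit on
# the same chromosome joins that locus. Not the study's documented procedure.
MERGE_WINDOW_BP = 500_000


def collapse_to_loci(hits, window=MERGE_WINDOW_BP):
    """Group (chrom, pos, scan) hits into loci by chaining nearby positions."""
    loci = []
    for chrom, group in groupby(sorted(hits), key=lambda h: h[0]):
        current = None
        for _, pos, scan in group:
            if current is not None and pos - current["end"] <= window:
                current["end"] = pos
                current["scans"].append(scan)
            else:
                current = {"chrom": chrom, "start": pos, "end": pos, "scans": [scan]}
                loci.append(current)
    return loci


hits = [
    ("2", 1_000_000, "inguinal"),
    ("2", 1_300_000, "femoral"),
    ("2", 5_000_000, "inguinal"),
    ("15", 48_000_000, "umbilical"),
]
loci = collapse_to_loci(hits)
shared = [locus for locus in loci if len(set(locus["scans"])) > 1]
print(len(loci), len(shared))  # 3 1
```

Co-location of hits from two scans in one locus is only a starting point; the abstract reports that formal testing identified the 26 loci associated with more than one hernia type.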
Pharmacogenomics aims to tailor pharmacological treatment to each individual by considering associations between genetic polymorphisms and adverse drug effects (ADEs). With technological advances, pharmacogenomic research has evolved from candidate gene analyses to genome-wide association studies. Here, we integrate deep whole-genome sequencing (WGS) information with drug prescription and ADE data from Estonian electronic health record (EHR) databases to evaluate genome- and pharmacome-wide associations on an unprecedented scale. We leveraged WGS data from 2240 Estonian Biobank participants and imputed all single-nucleotide variants (SNVs) with allele counts over 2 into 13,986 genotyped participants. Overall, we identified 41 (10 novel) loss-of-function and 567 (134 novel) missense variants in 64 very important pharmacogenes. Most of the detected variants were very rare, with frequencies below 0.05%, and 6 of the novel loss-of-function and 99 of the missense variants were detected only as single alleles (allele count = 1). We also validated documented pharmacogenetic associations and detected new independent variants in known gene-drug pairs. Specifically, we found that CTNNA3 was associated with myositis and myopathies among individuals taking nonsteroidal anti-inflammatory oxicams, and we replicated this finding in an extended cohort of 706 individuals. These findings illustrate that population-based WGS-coupled EHRs are a useful tool for biomarker discovery.
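The allele-count and frequency thresholds above are linked by simple arithmetic. The following sketch assumes diploid autosomal sites and computes frequencies within the 2240 WGS genomes only; the variant IDs are hypothetical, and the study may have computed frequencies in a different sample.

```python
# Sketch of the allele-count arithmetic, assuming diploid autosomal sites.
WGS_SAMPLES = 2240       # WGS participants quoted in the text
RARE_AF_CUTOFF = 0.0005  # 0.05%, as quoted in the text


def allele_frequency(allele_count: int, n_individuals: int, ploidy: int = 2) -> float:
    """Alternate-allele frequency: observed alleles over total chromosomes."""
    return allele_count / (ploidy * n_individuals)


def imputation_candidates(allele_counts: dict[str, int], min_ac: int = 2) -> list[str]:
    """IDs of variants whose allele count is strictly greater than `min_ac`."""
    return [vid for vid, ac in allele_counts.items() if ac > min_ac]


for ac in (1, 2, 3):
    af = allele_frequency(ac, WGS_SAMPLES)
    print(ac, f"{af:.4%}", af < RARE_AF_CUTOFF)

print(imputation_candidates({"v1": 1, "v2": 3, "v3": 2, "v4": 10}))  # ['v2', 'v4']
```

With 2240 diploid genomes there are 4480 chromosomes, so a singleton has a frequency of 1/4480 (about 0.022%). Under these assumptions, variants with allele counts of 1 or 2 fall below 0.05%, which are also the ones excluded by the "allele counts over 2" imputation rule.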
A recent genome-wide-association study of educational attainment identified three single-nucleotide polymorphisms (SNPs) whose associations, despite their small effect sizes (each R² ≈ 0.02%), reached genome-wide significance (p < 5 × 10⁻⁸) in a large discovery sample and were replicated in an independent sample (p < .05). The study also reported associations between educational attainment and indices of SNPs called "polygenic scores." In three studies, we evaluated the robustness of these findings. Study 1 showed that the associations with all three SNPs were replicated in another large (N = 34,428) independent sample. We also found that the scores remained predictive (R² ≈ 2%) in regressions with stringent controls for stratification (Study 2) and in new within-family analyses (Study 3). Our results show that large and therefore well-powered genome-wide-association studies can identify replicable genetic associations with behavioral traits. The small effect sizes of individual SNPs are likely to be a major contributing factor explaining the striking contrast between our results and the disappointing replication record of most candidate-gene studies.
Allele-specific gene expression associated with genetic variation in regulatory regions can play an important role in the development of complex traits. We hypothesized that polymorphisms in microRNA (miRNA) response elements (MRE-SNPs) that either disrupt a miRNA binding site or create a new miRNA binding site can affect the allele-specific expression of target genes. By integrating public expression quantitative trait locus (eQTL) data, miRNA binding site predictions, small RNA sequencing, and Argonaute crosslinking immunoprecipitation (AGO-CLIP) datasets, we identified genetic variants that can affect gene expression by modulating miRNA binding efficiency. We also identified MRE-SNPs located in regions associated with complex traits, indicating possible causative mechanisms associated with these loci. The results of this study expand the current understanding of gene expression regulation and help to interpret the mechanisms underlying eQTL effects.
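The "disrupt or create a binding site" idea can be made concrete with the canonical 7mer-m8 seed match, a target-site motif complementary to miRNA nucleotides 2 to 8. This is a simplified sketch: it assumes mRNA-strand sequences in the DNA alphabet and considers only exact 7mer-m8 matches, whereas real target prediction and the AGO-CLIP evidence used in the study go well beyond this. The miRNA and UTR sequences are hypothetical.

```python
# Sketch: does a SNP create or destroy a canonical 7mer-m8 seed match?
# Assumptions: sequences are given 5'->3' on the mRNA (sense) strand in the
# DNA alphabet, and only exact 7mer-m8 matches are considered.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def seed_site(mirna: str) -> str:
    """Target-site motif complementary to miRNA nucleotides 2-8."""
    seed = mirna.upper().replace("U", "T")[1:8]  # positions 2-8, 1-based
    return seed.translate(COMPLEMENT)[::-1]      # reverse complement


def count_sites(utr: str, site: str) -> int:
    """Number of (possibly overlapping) exact occurrences of `site` in `utr`."""
    n = len(site)
    return sum(1 for i in range(len(utr) - n + 1) if utr[i:i + n] == site)


def mre_effect(ref_utr: str, alt_utr: str, mirna: str) -> str:
    """Compare seed-match counts between the reference and alternate alleles."""
    site = seed_site(mirna)
    ref_n, alt_n = count_sites(ref_utr, site), count_sites(alt_utr, site)
    if alt_n < ref_n:
        return "disrupts site"
    if alt_n > ref_n:
        return "creates site"
    return "no change"


mirna = "UACGGAUCCAGUUACGAUGCAA"  # hypothetical miRNA sequence
ref_utr = "AAATTTGATCCGTAAATTT"    # contains the site GATCCGT
alt_utr = "AAATTTGATCAGTAAATTT"    # C>A change inside the site
print(seed_site(mirna), mre_effect(ref_utr, alt_utr, mirna))  # GATCCGT disrupts site
```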
The functional consequences of trait-associated SNPs are often investigated using expression quantitative trait locus (eQTL) mapping. While trait-associated variants may operate in a cell-type-specific manner, eQTL datasets for the relevant cell types may not always be available. We performed a genome-environment interaction (GxE) meta-analysis on data from 5,683 samples to infer the cell-type specificity of whole blood cis-eQTLs. We demonstrate that this method can predict neutrophil- and lymphocyte-specific cis-eQTLs, and we replicate these predictions in independent cell-type-specific datasets. Finally, we show that SNPs associated with Crohn's disease preferentially affect gene expression within neutrophils, including at the archetypal NOD2 locus.
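The GxE idea can be written as a regression with an interaction term, expression ~ genotype + proportion + genotype × proportion. A non-zero interaction coefficient suggests that the eQTL effect changes with the abundance of that cell type. The sketch below fits this model by ordinary least squares for one SNP-gene pair in a single cohort. It assumes a per-sample neutrophil proportion (or a proxy for it) is available, does not compute p-values, and does not perform the cross-cohort meta-analysis described above. The data are simulated from known coefficients.

```python
def solve_normal_equations(rows: list[list[float]], y: list[float]) -> list[float]:
    """OLS coefficients by Gauss-Jordan elimination on X^T X b = X^T y (small k only)."""
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry into the pivot row.
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def fit_gxe(genotype, proportion, expression):
    """Fit expression ~ 1 + g + p + g*p and return the named coefficients."""
    rows = [[1.0, g, p, g * p] for g, p in zip(genotype, proportion)]
    b0, b_g, b_p, b_gxp = solve_normal_equations(rows, expression)
    return {"intercept": b0, "genotype": b_g, "proportion": b_p, "interaction": b_gxp}


# Simulated, noise-free data with a genotype-by-proportion interaction of 1.5.
genotype = [0, 1, 2] * 3
proportion = [0.3] * 3 + [0.5] * 3 + [0.7] * 3
expression = [1.0 + 0.2 * g - 0.5 * p + 1.5 * g * p for g, p in zip(genotype, proportion)]
fit = fit_gxe(genotype, proportion, expression)
print({name: round(value, 3) for name, value in fit.items()})  # recovers the simulated coefficients
```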
Diet is considered one of the most important modifiable factors influencing human health, but efforts to identify foods or dietary patterns associated with health outcomes often suffer from biases, confounding, and reverse causation. Applying Mendelian randomization (MR) in this context may provide evidence to strengthen causality in nutrition research. To this end, we first identified 283 genetic markers associated with dietary intake in 445,779 UK Biobank participants. We then converted these associations into direct genetic effects on food exposures by adjusting them for effects mediated via other traits. SNPs that showed no evidence of mediation were then used in MR analyses to assess the associations between genetically predicted food choices and other risk factors and health outcomes. We show that using all associated SNPs, without omitting those that show evidence of mediation, leads to biases in downstream analyses (genetic correlations, causal inference) similar to those present in observational studies. However, MR analyses using only SNPs with a direct effect on food exposures provided unequivocal evidence of causal associations between specific eating patterns and obesity, blood lipid status, and several other risk factors and health outcomes.
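One widely used MR estimator is the fixed-effect inverse-variance-weighted (IVW) estimate, which combines per-SNP ratio estimates. This is a sketch of that standard estimator, not necessarily the method used in the study. It drops SNPs flagged as mediated before estimation, mirroring the filtering step described above. The dictionary keys and all effect sizes are hypothetical.

```python
import math


def ivw_estimate(snps: list[dict]) -> tuple[float, float]:
    """Fixed-effect inverse-variance-weighted MR estimate and standard error.

    Each SNP dict holds hypothetical keys: 'bx' (SNP-exposure effect),
    'by' (SNP-outcome effect), 'se_y' (SE of 'by') and 'mediated' (bool).
    SNPs flagged as mediated are dropped before estimation.
    """
    kept = [s for s in snps if not s["mediated"]]
    if not kept:
        raise ValueError("no direct-effect SNPs left")
    num = sum(s["bx"] * s["by"] / s["se_y"] ** 2 for s in kept)
    den = sum(s["bx"] ** 2 / s["se_y"] ** 2 for s in kept)
    return num / den, 1.0 / math.sqrt(den)


snps = [
    {"bx": 0.10, "by": 0.050, "se_y": 0.01, "mediated": False},
    {"bx": 0.20, "by": 0.100, "se_y": 0.02, "mediated": False},
    {"bx": 0.05, "by": 0.025, "se_y": 0.01, "mediated": False},
    {"bx": 0.08, "by": 0.300, "se_y": 0.01, "mediated": True},  # acts via another trait
]
estimate, se = ivw_estimate(snps)
print(round(estimate, 3), round(se, 4))  # 0.5 0.0667
```

Keeping the flagged SNP would pull the estimate well away from 0.5, a toy version of the bias the abstract attributes to using all associated SNPs.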