We analyzed the mRNA levels for 36,778 transcript expression traits (probes) from 2,765 individuals to comprehensively investigate the genetic architecture and degree of missing heritability for gene expression in peripheral blood. We identified 11,204 cis and 3,791 trans independent expression quantitative trait loci (eQTL) by using linear mixed models to perform genome-wide association analyses. Furthermore, using information on both closely and distantly related individuals, heritability was estimated for all expression traits. Of the set of expressed probes (15,966), 10,580 (66%) had an estimated narrow-sense heritability (h²) greater than zero with a mean (median) value of 0.192 (0.142). Across these probes, on average the proportion of genetic variance explained by all eQTL (h²_COJO) was 31% (0.060/0.192), meaning that 69% is missing, with the sentinel SNP of the largest eQTL explaining 87% (0.052/0.060) of the variance attributed to all identified cis- and trans-eQTL. For the same set of probes, the genetic variance attributed to genome-wide common (MAF > 0.01) HapMap 3 SNPs (h²_g) accounted for on average 48% (0.093/0.192) of h². Taken together, the evidence suggests that approximately half the genetic variance for gene expression is not tagged by common SNPs, and of the variance that is tagged by common SNPs, a large proportion can be attributed to identifiable eQTL of large effect, typically in cis. Finally, we present evidence that, compared with a meta-analysis, using individual-level data results in an increase of approximately 50% in power to detect eQTL.
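The percentages quoted in the abstract above can be reproduced from the mean variance components it reports. The following is a minimal sketch rather than the authors' analysis code; the variable names and the helper `share` are hypothetical, and it assumes the percentages are ratios of the reported means, as the parenthesized fractions suggest.

```python
# Mean variance components as quoted in the abstract
# (probes with estimated narrow-sense heritability > 0).
h2 = 0.192        # narrow-sense heritability
h2_cojo = 0.060   # variance explained by all identified eQTL
h2_top = 0.052    # variance explained by the sentinel SNP of the largest eQTL
h2_g = 0.093      # variance tagged by common HapMap 3 SNPs


def share(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


heritable_probes = share(10_580, 15_966)  # expressed probes with h2 > 0
eqtl_share = share(h2_cojo, h2)           # heritability explained by eQTL
missing_share = 100 - eqtl_share          # heritability not explained by eQTL
sentinel_share = share(h2_top, h2_cojo)   # eQTL variance due to the top SNP
common_snp_share = share(h2_g, h2)        # heritability tagged by common SNPs

print(heritable_probes, eqtl_share, missing_share,
      sentinel_share, common_snp_share)
```

Under that assumption, the remainder `100 - common_snp_share` corresponds to the "approximately half" of genetic variance that the abstract describes as not tagged by common SNPs.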
Disruptive, damaging ultra-rare variants in highly constrained genes are enriched in individuals with neurodevelopmental disorders. In the general population, this class of variants was associated with a decrease in years of education (YOE). This effect was stronger among highly brain-expressed genes and explained more YOE variance than pathogenic copy number variation but less than common variants. Disruptive, damaging ultra-rare variants in highly constrained genes influence the determinants of YOE in the general population.
The study investigated differences in the Five-Factor Model (FFM) domains and facets across adulthood. The main questions were whether personality scales reflected coherent units of trait development and thereby coherent personality traits more generally. These questions were addressed by testing whether the components of the trait scales (items for facet scales and facets for domain scales) showed consistent age group differences. For this, a measurement invariance (MI) framework was used. In a sample of 2,711 Estonians who had completed the NEO Personality Inventory 3 (NEO PI-3), more than half of the facet scales and one domain scale did not meet the criterion for weak MI (factor loading equality) across 12 age groups spanning ages from 18 to 91 years. Furthermore, none of the facet and domain scales met the criterion for strong MI (intercept equality), suggesting that items of the same facets and facets of the same domains varied in age group differences. When items were residualized for their respective facets, 46% of them had significant (p < 0.0002) residual age-correlations. When facets were residualized for their domain scores, a majority had significant (p < 0.002) residual age-correlations. For each domain, a series of latent factors was specified using random quarters of its items: scores of such latent factors varied notably (within domains) in their correlations with age. We argue that manifestations of aetiologically coherent traits should show similar age group differences. Given this, the FFM domains and facets as embodied in the NEO PI-3 do not reflect aetiologically coherent traits.
Large-scale, population-based biobanks integrating health records and genomic profiles may provide a platform to identify individuals with disease-predisposing genetic variants. Here, we recall probands carrying familial hypercholesterolemia (FH)-associated variants, perform cascade screening of family members, and describe health outcomes affected by such a strategy.
The Estonian Biobank of the Estonian Genome Center, University of Tartu, comprises 52,274 individuals. Among 4,776 participants with exome or genome sequences, we identified 27 individuals who carried FH-associated variants in the LDLR, APOB, or PCSK9 genes. Cascade screening of 64 family members identified an additional 20 carriers of FH-associated variants.
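The screening yield implied by these counts can be worked out directly. This is a small illustrative calculation, not part of the study's methods; the variable names are hypothetical and only the counts stated above are used.

```python
# Counts as stated in the abstract.
sequenced = 4776          # participants with exome or genome sequences
proband_carriers = 27     # carriers found among sequenced participants
relatives_screened = 64   # family members reached by cascade screening
relative_carriers = 20    # additional carriers found among relatives

# Carrier frequency among sequenced participants, and its "1 in N" form.
carrier_frequency = proband_carriers / sequenced
one_in_n = round(sequenced / proband_carriers)

# Fraction of screened relatives who turned out to be carriers.
cascade_yield = relative_carriers / relatives_screened

total_carriers = proband_carriers + relative_carriers

print(f"carrier frequency: {carrier_frequency:.2%} (about 1 in {one_in_n})")
print(f"cascade screening yield: {cascade_yield:.1%}")
print(f"total carriers identified: {total_carriers}")
```

The comparison of the two rates illustrates why cascade screening is efficient: the carrier fraction among screened relatives is far higher than among sequenced biobank participants.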
Via genetic counseling and clinical management of carriers, we were able to reclassify 51% of the study participants from having previously established nonspecific hypercholesterolemia to having FH and identify 32% who were completely unaware of harboring a high-risk disease-associated genetic variant. Imaging-based risk stratification targeted 86% of the variant carriers for statin treatment recommendations.
Genotype-guided recall of probands and subsequent cascade screening for familial hypercholesterolemia is feasible within a population-based biobank and may facilitate more appropriate clinical management.
Most existing TWAS tools require individual-level eQTL reference data and thus are not applicable to summary-level reference eQTL datasets. The development of TWAS methods that can harness summary-level reference data is valuable to enable TWAS in broader settings and enhance power due to increased reference sample size. Thus, we develop a TWAS framework called OTTERS (Omnibus Transcriptome Test using Expression Reference Summary data) that adapts multiple polygenic risk score (PRS) methods to estimate eQTL weights from summary-level eQTL reference data and conducts an omnibus TWAS. We show that OTTERS is a practical and powerful TWAS tool by both simulations and application studies.
Genome-wide association studies (GWAS) have identified thousands of variants associated with complex traits, but their biological interpretation often remains unclear. Most of these variants overlap with expression QTLs, indicating their potential involvement in regulation of gene expression. Here, we propose a transcriptome-wide summary statistics-based Mendelian Randomization approach (TWMR) that uses multiple SNPs as instruments and multiple gene expression traits as exposures, simultaneously. Applied to 43 human phenotypes, it uncovers 3,913 putatively causal gene-trait associations, 36% of which have no genome-wide significant SNP nearby in previous GWAS. Using independent association summary statistics, we find that the majority of these loci were missed by GWAS due to power issues. Noteworthy among these links is educational attainment-associated BSCL2, known to carry mutations leading to a Mendelian form of encephalopathy. We also find pleiotropic causal effects suggestive of mechanistic connections. TWMR better accounts for pleiotropy and has the potential to identify biological mechanisms underlying complex traits.
Hernias are characterized by protrusion of an organ or tissue through its surrounding cavity and often require surgical repair. In this study we identify 65,492 cases for five hernia types in the UK Biobank and perform genome-wide association study scans for these five types and two combined groups. Our results show associated variants in all scans. Inguinal hernia has the most associations and we conduct a follow-up study with 23,803 additional cases from four study groups giving 84 independently associated variants. Identified variants from all scans are collapsed into 81 independent loci. Further testing shows that 26 loci are associated with more than one hernia type, suggesting substantial overlap between the underlying genetic mechanisms. Pathway analyses identify several genes with a strong link to collagen and/or elastin (ADAMTS6, ADAMTS16, ADAMTSL3, LOX, ELN) in the vicinity of associated loci for inguinal hernia, which substantiates an essential role of connective tissue morphology.
Pharmacogenomics aims to tailor pharmacological treatment to each individual by considering associations between genetic polymorphisms and adverse drug effects (ADEs). With technological advances, pharmacogenomic research has evolved from candidate gene analyses to genome-wide association studies. Here, we integrate deep whole-genome sequencing (WGS) information with drug prescription and ADE data from Estonian electronic health record (EHR) databases to evaluate genome- and pharmacome-wide associations on an unprecedented scale. We leveraged WGS data of 2240 Estonian Biobank participants and imputed all single-nucleotide variants (SNVs) with allele counts over 2 for 13,986 genotyped participants. Overall, we identified 41 (10 novel) loss-of-function and 567 (134 novel) missense variants in 64 very important pharmacogenes. The majority of the detected variants were very rare with frequencies below 0.05%, and 6 of the novel loss-of-function and 99 of the missense variants were only detected as single alleles (allele count = 1). We also validated documented pharmacogenetic associations and detected new independent variants in known gene-drug pairs. Specifically, we found that CTNNA3 was associated with myositis and myopathies among individuals taking nonsteroidal anti-inflammatory oxicams and replicated this finding in an extended cohort of 706 individuals. These findings illustrate that population-based WGS-coupled EHRs are a useful tool for biomarker discovery.
Using a genome-wide screen of 9.6 million genetic variants achieved through 1000 Genomes Project imputation in 62,166 samples, we identify association to lipid traits in 93 loci, including 79 previously identified loci with new lead SNPs and 10 new loci, 15 loci with a low-frequency lead SNP and 10 loci with a missense lead SNP, and 2 loci with an accumulation of rare variants. In six loci, SNPs with established function in lipid genetics (CELSR2, GCKR, LIPC and APOE) or candidate missense mutations with predicted damaging function (CD300LG and TM6SF2) explained the locus associations. The low-frequency variants increased the proportion of variance explained, particularly for low-density lipoprotein cholesterol and total cholesterol. Altogether, our results highlight the impact of low-frequency variants in complex traits and show that imputation offers a cost-effective alternative to resequencing.
Abstract
Background
Understanding the biological differences between sexes in cancer is essential for personalized treatment and prevention. We hypothesized that the extreme downregulation of chromosome Y gene expression (EDY) is a signature of cancer risk in men and the functional mediator of the reported association between the mosaic loss of chromosome Y (LOY) and cancer.
Methods
We advanced a method to measure EDY from transcriptomic data. We studied EDY across 47 nondiseased tissues from the Genotype Tissue-Expression Project (n = 371) and its association with cancer status across 12 cancer studies from The Cancer Genome Atlas (n = 1774) and seven other studies (n = 7562). Associations of EDY with cancer status and presence of loss-of-function mutations in chromosome X were tested with logistic regression models, and a Fisher’s test was used to assess genome-wide association of EDY with the proportion of copy number gains. All statistical tests were two-sided.
Results
EDY was likely to occur in multiple nondiseased tissues (P < .001) and was statistically significantly associated with the EGFR tyrosine kinase inhibitor resistance pathway (false discovery rate = 0.028). EDY strongly associated with cancer risk in men (odds ratio [OR] = 3.66, 95% confidence interval [CI] = 1.58 to 8.46, P = .002), adjusted by LOY and age, and its variability was largely explained by several genes of the nonrecombinant region whose chromosome X homologs showed loss-of-function mutations that co-occurred with EDY during cancer (OR = 2.82, 95% CI = 1.32 to 6.01, P = .007). EDY associated with a high proportion of EGFR amplifications (OR = 5.64, 95% CI = 3.70 to 8.59, false discovery rate < 0.001) and EGFR overexpression along with SRY hypomethylation and nonrecombinant region hypermethylation, indicating alternative causes of EDY in cancer other than LOY. EDY associations were independently validated for different cancers and exposure to smoking, and its status was accurately predicted from individual methylation patterns.
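The reported odds ratios, confidence intervals, and P values can be checked against one another. The sketch below assumes the intervals are Wald intervals from the logistic regression models, that is, symmetric on the log-odds scale; this is an assumption, not something stated in the abstract. The function name `wald_from_ci` and the dictionary labels are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_from_ci(odds_ratio, lower, upper):
    """Recover the log-OR standard error, z statistic, and two-sided P
    from an odds ratio and its 95% CI, assuming a Wald interval."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return se, z, p


# (OR, lower, upper) as reported in the Results paragraph above.
reported = {
    "EDY and cancer status": (3.66, 1.58, 8.46),
    "EDY and chrX loss-of-function mutations": (2.82, 1.32, 6.01),
}

results = {name: wald_from_ci(*vals) for name, vals in reported.items()}
for name, (se, z, p) in results.items():
    print(f"{name}: SE = {se:.3f}, z = {z:.2f}, P = {p:.3f}")
```

If the Wald assumption holds, the recovered P values should round to the reported .002 and .007, which is a quick consistency check a reader can apply to any abstract that reports an OR together with its interval.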
Conclusions
EDY is a male-specific signature of cancer susceptibility that supports the escape from X-inactivation tumor suppressor hypothesis for genes that protect women compared with men from cancer risk.