Many diseases exhibit population-specific causal effect sizes with trans-ethnic genetic correlations significantly less than 1, limiting trans-ethnic polygenic risk prediction. We develop a new method, S-LDXR, for stratifying squared trans-ethnic genetic correlation across genomic annotations, and apply S-LDXR to genome-wide summary statistics for 31 diseases and complex traits in East Asians (average N = 90K) and Europeans (average N = 267K) with an average trans-ethnic genetic correlation of 0.85. We determine that squared trans-ethnic genetic correlation is 0.82× (s.e. 0.01) depleted in the top quintile of background selection statistic, implying more population-specific causal effect sizes. Accordingly, causal effect sizes are more population-specific in functionally important regions, including conserved and regulatory regions. In regions surrounding specifically expressed genes, causal effect sizes are most population-specific for skin and immune genes, and least population-specific for brain genes. Our results could potentially be explained by stronger gene-environment interaction at loci impacted by selection, particularly positive selection.
Poor trans-ancestry portability of polygenic risk scores is a consequence of Eurocentric genetic studies and limited knowledge of shared causal variants. Leveraging regulatory annotations may improve portability by prioritizing functional over tagging variants. We constructed a resource of 707 cell-type-specific IMPACT regulatory annotations by aggregating 5,345 epigenetic datasets to predict binding patterns of 142 transcription factors across 245 cell types. We then partitioned the common SNP heritability of 111 genome-wide association study summary statistics of European (average n ≈ 189,000) and East Asian (average n ≈ 157,000) origin. IMPACT annotations captured consistent SNP heritability between populations, suggesting prioritization of shared functional variants. Variant prioritization using IMPACT resulted in increased trans-ancestry portability of polygenic risk scores from Europeans to East Asians across all 21 phenotypes analyzed (49.9% mean relative increase in R²). Our study identifies a crucial role for functional annotations such as IMPACT to improve the trans-ancestry portability of genetic data.
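To make the "mean relative increase in R²" concrete, here is a minimal sketch of the arithmetic. It assumes the figure is an average of per-phenotype relative changes, which the abstract does not spell out; the function names and the R² values are hypothetical and are not taken from the study.

```python
def relative_increase(r2_baseline, r2_annotated):
    """Relative change in R^2, expressed as a fraction of the baseline R^2."""
    if r2_baseline <= 0:
        raise ValueError("baseline R^2 must be positive")
    return (r2_annotated - r2_baseline) / r2_baseline


def mean_relative_increase(pairs):
    """Average the per-phenotype relative increases (assumed aggregation)."""
    values = [relative_increase(base, annot) for base, annot in pairs]
    return sum(values) / len(values)


# Hypothetical (baseline R^2, annotation-informed R^2) pairs, one per phenotype.
hypothetical_pairs = [(0.020, 0.030), (0.040, 0.060), (0.010, 0.015)]
mean_gain = mean_relative_increase(hypothetical_pairs)
print(f"mean relative increase: {100 * mean_gain:.1f}%")
```

A relative increase is scale-free, so a phenotype whose R² moves from 0.010 to 0.015 counts the same as one moving from 0.040 to 0.060.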
Summary
Rheumatoid arthritis (RA) risk has a large genetic component (~60%) that is still not fully understood. This has hampered the design of effective treatments that could promise lifelong remission. RA is a polygenic disease with 106 known genome‐wide significant associated loci and thousands of small effect causal variants. Our current understanding of RA risk has suggested cell‐type‐specific contexts for causal variants, implicating CD4+ effector memory T cells, as well as monocytes, B cells and stromal fibroblasts. While these cellular states and categories are still mechanistically broad, future studies may identify causal cell subpopulations. These efforts are propelled by advances in single cell profiling. Identification of causal cell subpopulations may accelerate therapeutic intervention to achieve lifelong remission.
Despite significant progress in annotating the genome with experimental methods, much of the regulatory noncoding genome remains poorly defined. Here we assert that regulatory elements may be characterized by leveraging local epigenomic signatures where specific transcription factors (TFs) are bound. To link these two features, we introduce IMPACT, a genome annotation strategy that identifies regulatory elements defined by cell-state-specific TF binding profiles, learned from 515 chromatin and sequence annotations. We validate IMPACT using multiple compelling applications. First, IMPACT distinguishes between bound and unbound TF motif sites with high accuracy (average AUPRC 0.81, SE 0.07; across 8 tested TFs) and outperforms state-of-the-art TF binding prediction methods, MocapG, MocapS, and Virtual ChIP-seq. Second, in eight tested cell types, RNA polymerase II IMPACT annotations capture more cis-eQTL variation than sequence-based annotations, such as promoters and TSS windows (25% average increase in enrichment). Third, integration with rheumatoid arthritis (RA) summary statistics from European (N = 38,242) and East Asian (N = 22,515) populations revealed that the top 5% of CD4+ Treg IMPACT regulatory elements capture 85.7% of RA h2, the most comprehensive explanation for RA h2 to date. In comparison, the average RA h2 captured by compared CD4+ T histone marks is 42.3% and by CD4+ T specifically expressed gene sets is 36.4%. Lastly, we find that IMPACT may be used in many different cell types to identify complex trait associated regulatory elements.
The morphology of cells is dynamic and mediated by genetic and environmental factors. Characterizing how genetic variation impacts cell morphology can provide an important link between disease association and cellular function. Here, we combine genomic sequencing and high-content imaging approaches on iPSCs from 297 unique donors to investigate the relationship between genetic variants and cellular morphology to map what we term cell morphological quantitative trait loci (cmQTLs). We identify novel associations between rare protein altering variants in WASF2, TSPAN15, and PRLR with several morphological traits related to cell shape, nucleic granularity, and mitochondrial distribution. Knockdown of these genes by CRISPRi confirms their role in cell morphology. Analysis of common variants yields one significant association and nominates over 300 variants with suggestive evidence (P < 10
) of association with one or more morphology traits. We then use these data to make predictions about sample size requirements for increasing discovery in cellular genetic studies. We conclude that, similar to molecular phenotypes, morphological profiling can yield insight about the function of genes and variants.
Of the 1.8 billion people worldwide infected with Mycobacterium tuberculosis, 5-15% will develop active tuberculosis (TB). Approximately half will progress to active TB within the first 18 months after infection, presumably because they fail to mount an effective initial immune response. Here, in a genome-wide genetic study of early TB progression, we genotype 4002 active TB cases and their household contacts in Peru. We quantify genetic heritability of early TB progression to be 21.2% (standard error 0.08). This suggests TB progression has a strong genetic basis, and is comparable to traits with well-established genetic bases. We identify a novel association between early TB progression and variants located in a putative enhancer region on chromosome 3q23 (rs73226617, OR = 1.18; P = 3.93 × 10
). With in silico and in vitro analyses we identify rs73226617 or rs148722713 as the likely functional variant and ATP1B3 as a potential causal target gene with monocyte specific function.
Cytokines are critical to human disease and are attractive therapeutic targets given their widespread influence on gene regulation and transcription. Defining the downstream regulatory mechanisms influenced by cytokines is central to defining drug and disease mechanisms. One promising strategy is to use interactions between expression quantitative trait loci (eQTLs) and cytokine levels to define target genes and mechanisms.
In a clinical trial for anti-IL-6 in patients with systemic lupus erythematosus, we measure interferon (IFN) status, anti-IL-6 drug exposure, and whole blood genome-wide gene expression at three time points. We show that repeated transcriptomic measurements increase the number of cis eQTLs identified compared to using a single time point. We observe a statistically significant enrichment of in vivo eQTL interactions with IFN status and anti-IL-6 drug exposure and find many novel interactions that have not been previously described. Finally, we find transcription factor binding motifs interrupted by eQTL interaction SNPs, which point to key regulatory mediators of these environmental stimuli and therefore potential therapeutic targets for autoimmune diseases. In particular, genes with IFN interactions are enriched for ISRE binding site motifs, while those with anti-IL-6 interactions are enriched for IRF4 motifs.
This study highlights the potential to exploit clinical trial data to discover in vivo eQTL interactions with therapeutically relevant environmental variables.
Significance
The anti-HIV drug KP1212 was designed to intentionally increase the mutation rate of HIV, thereby causing viral population collapse. Its mutagenicity and thus antiviral activity were proposed to be the result of tautomerization. We used 2D IR spectroscopy to identify rapidly interconverting tautomers under physiological conditions. The traditionally rare enol–imino tautomer for nucleobases was found to be the major species for KP1212, providing structural support for the tautomer hypothesis. We further found that KP1212 is significantly protonated at physiological pH with a pKₐ of 7. The protonated KP1212 was shown to be mutagenic, revealing a bimodal mutagenic property of KP1212. The results could prove instrumental in developing the next-generation antiviral treatments.
Antiviral drugs designed to accelerate viral mutation rates can drive a viral population to extinction in a process called lethal mutagenesis. One such molecule is 5,6-dihydro-5-aza-2′-deoxycytidine (KP1212), a selective mutagen that induces A-to-G and G-to-A mutations in the genome of replicating HIV. The mutagenic property of KP1212 was hypothesized to originate from its amino–imino tautomerism, which would explain its ability to base pair with either G or A. To test the multiple tautomer hypothesis, we used 2D IR spectroscopy, which offers subpicosecond time resolution and structural sensitivity to distinguish among rapidly interconverting tautomers. We identified several KP1212 tautomers and found that >60% of neutral KP1212 is present in the enol–imino form. The abundant proportion of this traditionally rare tautomer offers a compelling structure-based mechanism for pairing with adenine. Additionally, the pKₐ of KP1212 was measured to be 7.0, meaning a substantial population of KP1212 is protonated at physiological pH. Furthermore, the mutagenicity of KP1212 was found to increase dramatically at pH <7, suggesting a significant biological role for the protonated KP1212 molecules. Overall, our data reveal that the bimodal mutagenic properties of KP1212 result from its unique shape shifting ability that utilizes both tautomerization and protonation.
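The link between a pKₐ of 7.0 and "a substantial population" of protonated molecules follows from the Henderson-Hasselbalch relation. The sketch below applies it under two assumptions that are not stated in the abstract: a single protonation site, and a physiological pH of 7.4 (a commonly quoted value). The function name is illustrative.

```python
def fraction_protonated(ph, pka):
    """Fraction of molecules in the protonated form at a given pH.

    Henderson-Hasselbalch for a single site: [BH+]/[B] = 10 ** (pKa - pH),
    so the protonated fraction is 1 / (1 + 10 ** (pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


PKA_KP1212 = 7.0          # value reported in the abstract
ASSUMED_PHYSIOLOGICAL_PH = 7.4  # assumption, not from the abstract

for ph in (6.5, 7.0, ASSUMED_PHYSIOLOGICAL_PH):
    share = fraction_protonated(ph, PKA_KP1212)
    print(f"pH {ph:.1f}: {100 * share:.1f}% protonated")
```

Under these assumptions roughly a quarter to a third of the molecules are protonated near pH 7.4, and the share passes one half once the pH drops below the pKₐ.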
Recent improvements in quantitative proteomics approaches, including Sequential Window Acquisition of all Theoretical Mass Spectra (SWATH-MS), permit reproducible large-scale protein measurements across diverse cohorts. Together with genomics, transcriptomics, and other technologies, transomic data sets can be generated that permit detailed analyses across broad molecular interaction networks. Here, we examine mitochondrial links to liver metabolism through the genome, transcriptome, proteome, and metabolome of 386 individuals in the BXD mouse reference population. Several links were validated between genetic variants toward transcripts, proteins, metabolites, and phenotypes. Among these, sequence variants in Cox7a2l alter its protein's activity, which in turn leads to downstream differences in mitochondrial supercomplex formation. This data set demonstrates that the proteome can now be quantified comprehensively, serving as a key complement to transcriptomics, genomics, and metabolomics, a combination moving us forward in complex trait analysis.
Integrative analyses of genome-wide association studies and gene expression data have implicated many disease-critical tissues. However, co-regulation of genetic effects on gene expression across tissues impedes distinguishing biologically causal tissues from tagging tissues. In the present study, we introduce tissue co-regulation score regression (TCSC), which disentangles causal tissues from tagging tissues by regressing gene-disease association statistics (from transcriptome-wide association studies) on tissue co-regulation scores, reflecting correlations of predicted gene expression across genes and tissues. We applied TCSC to 78 diseases/traits (average n = 302,000) and gene expression prediction models for 48 GTEx tissues. TCSC identified 21 causal tissue-trait pairs at a 5% false discovery rate (FDR), including well-established findings, biologically plausible new findings (for example, aorta artery and glaucoma) and increased specificity of known tissue-trait associations (for example, subcutaneous adipose, but not visceral adipose, and high-density lipoprotein). TCSC also identified 17 causal tissue-trait covariance pairs at 5% FDR. In conclusion, TCSC is a precise method for distinguishing causal tissues from tagging tissues.
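The idea of regressing association statistics jointly on per-tissue co-regulation scores can be illustrated with a toy simulation. This is not the TCSC implementation: it is a plain ordinary least squares fit on simulated data, and the function name, the score distributions, and the effect sizes are all hypothetical. Only tissue 0 contributes to the simulated statistics, while the other tissues share co-regulation with it.

```python
import numpy as np


def fit_tissue_coefficients(assoc_stats, coreg_scores):
    """Joint OLS of association statistics on co-regulation scores.

    coreg_scores has one row per gene-tissue pair and one column per tissue.
    Returns (intercept, per-tissue coefficients).
    """
    n_rows = coreg_scores.shape[0]
    design = np.column_stack([np.ones(n_rows), coreg_scores])
    coef, *_ = np.linalg.lstsq(design, assoc_stats, rcond=None)
    return coef[0], coef[1:]


rng = np.random.default_rng(0)
n_pairs, n_tissues = 2000, 3

# A shared component makes the scores correlated across tissues,
# a stand-in for co-regulation of predicted expression.
shared = rng.gamma(2.0, 1.0, size=(n_pairs, 1))
coreg = shared + rng.gamma(2.0, 1.0, size=(n_pairs, n_tissues))

true_tau = np.array([0.5, 0.0, 0.0])  # only tissue 0 is causal (simulated)
assoc = 1.0 + coreg @ true_tau + rng.normal(0.0, 0.5, size=n_pairs)

intercept, tau_hat = fit_tissue_coefficients(assoc, coreg)
# Marginal correlation of a non-causal ("tagging") tissue with the statistics.
tagging_corr = np.corrcoef(coreg[:, 1], assoc)[0, 1]

print("joint coefficients:", np.round(tau_hat, 3))
print("marginal correlation, tissue 1:", round(float(tagging_corr), 3))
```

In this simulation the non-causal tissue is clearly correlated with the association statistics on its own, but its coefficient shrinks toward zero once all tissues are fitted together, which is the intuition behind separating causal from tagging tissues.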