Transcriptome-wide association studies using predicted expression have identified thousands of genes whose locally regulated expression is associated with complex traits and diseases. In this work, ...we show that linkage disequilibrium induces significant gene-trait associations at non-causal genes as a function of the expression quantitative trait loci weights used in expression prediction. We introduce a probabilistic framework that models correlation among transcriptome-wide association study signals to assign a probability for every gene in the risk region to explain the observed association signal. Importantly, our approach remains accurate when expression data for causal genes are not available in the causal tissue by leveraging expression prediction from other tissues. Our approach yields credible sets of genes containing the causal gene at a nominal confidence level (for example, 90%) that can be used to prioritize genes for functional assays. We illustrate our approach by using an integrative analysis of lipid traits, where our approach prioritizes genes with strong evidence for causality.
Genome-wide association studies (GWAS) have identified over 100 risk loci for schizophrenia, but the causal mechanisms remain largely unknown. We performed a transcriptome-wide association study ...(TWAS) integrating a schizophrenia GWAS of 79,845 individuals from the Psychiatric Genomics Consortium with expression data from brain, blood, and adipose tissues across 3,693 primarily control individuals. We identified 157 TWAS-significant genes, of which 35 did not overlap a known GWAS locus. Of these 157 genes, 42 were associated with specific chromatin features measured in independent samples, thus highlighting potential regulatory targets for follow-up. Suppression of one identified susceptibility gene, mapk3, in zebrafish showed a significant effect on neurodevelopmental phenotypes. Expression and splicing from the brain captured most of the TWAS effect across all genes. This large-scale connection of associations to target genes, tissues, and regulatory features is an essential step in moving toward a mechanistic understanding of GWAS.
Tissue-specific regulatory regions harbor substantial genetic risk for disease. Because brain development is a critical epoch for neuropsychiatric disease susceptibility, we characterized the genetic ...control of the transcriptome in 201 mid-gestational human brains, identifying 7,962 expression quantitative trait loci (eQTL) and 4,635 spliceQTL (sQTL), including several thousand prenatal-specific regulatory regions. We show that significant genetic liability for neuropsychiatric disease lies within prenatal eQTL and sQTL. Integration of eQTL and sQTL with genome-wide association studies (GWAS) via transcriptome-wide association identified dozens of novel candidate risk genes, highlighting shared and stage-specific mechanisms in schizophrenia (SCZ). Gene network analysis revealed that SCZ and autism spectrum disorder (ASD) affect distinct developmental gene co-expression modules. Yet, in each disorder, common and rare genetic variation converges within modules, which in ASD implicates superficial cortical neurons. More broadly, these data, available as a web browser and our analyses, demonstrate the genetic mechanisms by which developmental events have a widespread influence on adult anatomical and behavioral phenotypes.
Display omitted
•We identify eQTLs and sQTLs specific to human prenatal brain development•Regulation of expression and splicing differs substantially across development•Distinct relationships exist between expression, splicing, and disease risk•Variation in splicing carries as much, if not more, disease risk than expression
An atlas of expression and splice quantitative trait loci from mid-gestational human brain is integrated with genetic risk for schizophrenia, suggesting additional causal genes and highlighting the importance of QTL datasets derived from developmental stages most relevant to disease initiation.
Transcriptome-wide association studies (TWAS) test the association between traits and genetically predicted gene expression levels. The power of a TWAS depends in part on the strength of the ...correlation between a genetic predictor of gene expression and the causally relevant gene expression values. Consequently, TWAS power can be low when expression quantitative trait locus (eQTL) data used to train the genetic predictors have small sample sizes, or when data from causally relevant tissues are not available. Here, we propose to address these issues by integrating multiple tissues in the TWAS using sparse canonical correlation analysis (sCCA). We show that sCCA-TWAS combined with single-tissue TWAS using an aggregate Cauchy association test (ACAT) outperforms traditional single-tissue TWAS. In empirically motivated simulations, the sCCA+ACAT approach yielded the highest power to detect a gene associated with phenotype, even when expression in the causal tissue was not directly measured, while controlling the Type I error when there is no association between gene expression and phenotype. For example, when gene expression explains 2% of the variability in outcome, and the GWAS sample size is 20,000, the average power difference between the ACAT combined test of sCCA features and single-tissue, versus single-tissue combined with Generalized Berk-Jones (GBJ) method, single-tissue combined with S-MultiXcan, UTMOST, or summarizing cross-tissue expression patterns using Principal Component Analysis (PCA) approaches was 5%, 8%, 5% and 38%, respectively. The gain in power is likely due to sCCA cross-tissue features being more likely to be detectably heritable. When applied to publicly available summary statistics from 10 complex traits, the sCCA+ACAT test was able to increase the number of testable genes and identify on average an additional 400 additional gene-trait associations that single-trait TWAS missed. Our results suggest that aggregating eQTL data across multiple tissues using sCCA can improve the sensitivity of TWAS while controlling for the false positive rate.
Parasitic nematodes infect over 1 billion people worldwide and cause some of the most common neglected tropical diseases. Despite their prevalence, our understanding of the biology of parasitic ...nematodes has been limited by the lack of tools for genetic intervention. In particular, it has not yet been possible to generate targeted gene disruptions and mutant phenotypes in any parasitic nematode. Here, we report the development of a method for introducing CRISPR-Cas9-mediated gene disruptions in the human-parasitic threadworm Strongyloides stercoralis. We disrupted the S. stercoralis twitchin gene unc-22, resulting in nematodes with severe motility defects. Ss-unc-22 mutations were resolved by homology-directed repair when a repair template was provided. Omission of a repair template resulted in deletions at the target locus. Ss-unc-22 mutations were heritable; we passed Ss-unc-22 mutants through a host and successfully recovered mutant progeny. Using a similar approach, we also disrupted the unc-22 gene of the rat-parasitic nematode Strongyloides ratti. Our results demonstrate the applicability of CRISPR-Cas9 to parasitic nematodes, and thereby enable future studies of gene function in these medically relevant but previously genetically intractable parasites.
Juvenile idiopathic arthritis (JIA) is one of the most prevalent rheumatic disorders in children and is classified as an autoimmune disease (AID). While a robust genetic contribution to JIA etiology ...has been established, the exact pathogenesis remains unclear.
To prioritize biologically interpretable susceptibility genes and proteins for JIA, we conducted transcriptome-wide and proteome-wide association studies (TWAS/PWAS). Then, to understand the genetic architecture of JIA, we systematically analyzed single-nucleotide polymorphism (SNP)-based heritability, a signature of natural selection, and polygenicity. Next, we conducted HLA typing using multi-ethnicity RNA sequencing data. Additionally, we examined the T cell receptor (TCR) repertoire at a single-cell level to explore the potential links between immunity and JIA risk.
We have identified 19 TWAS genes and two PWAS proteins associated with JIA risks. Furthermore, we observe that the heritability and cell type enrichment analysis of JIA are enriched in T lymphocytes and HLA regions and that JIA shows higher polygenicity compared to other AIDs. In multi-ancestry HLA typing, B*45:01 is more prevalent in African JIA patients than in European JIA patients, whereas DQA1*01:01, DQA1*03:01, and DRB1*04:01 exhibit a higher frequency in European JIA patients. Using single-cell immune repertoire analysis, we identify clonally expanded T cell subpopulations in JIA patients, including CXCL13
BHLHE40
T
cells which are significantly associated with JIA risks.
Our findings shed new light on the pathogenesis of JIA and provide a strong foundation for future mechanistic studies aimed at uncovering the molecular drivers of JIA.
Latent factor models, like principal component analysis (PCA), provide a statistical framework to infer low-rank representation in various biological contexts. However, feature selection is ...challenging when this low-rank structure manifests from a sparse subspace. We introduce SuSiE PCA, a scalable sparse latent factor approach that evaluates uncertainty in contributing variables through posterior inclusion probabilities. We validate our model in extensive simulations and demonstrate that SuSiE PCA outperforms other approaches in signal detection and model robustness. We apply SuSiE PCA to multi-tissue expression quantitative trait loci (eQTLs) data from GTEx v8 and identify tissue-specific factors and their contributing eGenes. We further investigate its performance on the large-scale perturbation data and find that SuSiE PCA identifies modules with a higher enrichment of ribosome-related genes than sparse PCA (false discovery rate FDR =9.2×10−82 vs. 1.4×10−33), while being ∼ 18x faster. Overall, SuSiE PCA provides an efficient tool to identify relevant features in high-dimensional biological data.
Display omitted
•Efficient PCA feature selection via posterior inclusion probabilities•Learned model prior enhances inferential robustness•Seamless CPU/GPU/TPU implementation enables efficient inference
Biocomputational method; Classification of bioinformatical subject; Data processing in systems biology; Algorithms
Expression Quantitative Trait Loci (eQTLs) are critical to understanding the mechanisms underlying disease-associated genomic loci. Nearly all protein-coding genes in the human genome have been ...associated with one or more eQTLs. Here we introduce a multi-variant generalization of allelic Fold Change (aFC), aFC-n, to enable quantification of the cis-regulatory effects in multi-eQTL genes under the assumption that all eQTLs are known and conditionally independent. Applying aFC-n to 458,465 eQTLs in the Genotype-Tissue Expression (GTEx) project data, we demonstrate significant improvements in accuracy over the original model in estimating the eQTL effect sizes and in predicting genetically regulated gene expression over the current tools. We characterize some of the empirical properties of the eQTL data and use this framework to assess the current state of eQTL data in terms of characterizing cis-regulatory landscape in individual genomes. Notably, we show that 77.4% of the genes with an allelic imbalance in a sample show 0.5 log
fold or more of residual imbalance after accounting for the eQTL data underlining the remaining gap in characterizing regulatory landscape in individual genomes. We further contrast this gap across tissue types, and ancestry backgrounds to identify its correlates and guide future studies.
Although genome-wide association studies (GWAS) for prostate cancer (PrCa) have identified more than 100 risk regions, most of the risk genes at these regions remain largely unknown. Here we ...integrate the largest PrCa GWAS (N = 142,392) with gene expression measured in 45 tissues (N = 4458), including normal and tumor prostate, to perform a multi-tissue transcriptome-wide association study (TWAS) for PrCa. We identify 217 genes at 84 independent 1 Mb regions associated with PrCa risk, 9 of which are regions with no genome-wide significant SNP within 2 Mb. 23 genes are significant in TWAS only for alternative splicing models in prostate tumor thus supporting the hypothesis of splicing driving risk for continued oncogenesis. Finally, we use a Bayesian probabilistic approach to estimate credible sets of genes containing the causal gene at a pre-defined level; this reduced the list of 217 associations to 109 genes in the 90% credible set. Overall, our findings highlight the power of integrating expression with PrCa GWAS to identify novel risk loci and prioritize putative causal genes at known risk loci.