In order to discover quantitative trait loci, multi-dimensional genomic datasets combining DNA-seq and ChiP-/RNA-seq require methods that rapidly correlate tens of thousands of molecular phenotypes ...with millions of genetic variants while appropriately controlling for multiple testing.
We have developed FastQTL, a method that implements a popular cis-QTL mapping strategy in a user- and cluster-friendly tool. FastQTL also proposes an efficient permutation procedure to control for multiple testing. The outcome of permutations is modeled using beta distributions trained from a few permutations and from which adjusted P-values can be estimated at any level of significance with little computational cost. The Geuvadis & GTEx pilot datasets can be now easily analyzed an order of magnitude faster than previous approaches.
Source code, binaries and comprehensive documentation of FastQTL are freely available to download at http://fastqtl.sourceforge.net/
emmanouil.dermitzakis@unige.ch or olivier.delaneau@unige.ch
Supplementary data are available at Bioinformatics online.
Population scale studies combining genetic information with molecular phenotypes (for example, gene expression) have become a standard to dissect the effects of genetic variants onto organismal ...phenotypes. These kinds of data sets require powerful, fast and versatile methods able to discover molecular Quantitative Trait Loci (molQTL). Here we propose such a solution, QTLtools, a modular framework that contains multiple new and well-established methods to prepare the data, to discover proximal and distal molQTLs and, finally, to integrate them with GWAS variants and functional annotations of the genome. We demonstrate its utility by performing a complete expression QTL study in a few easy-to-perform steps. QTLtools is open source and available at https://qtltools.github.io/qtltools/.
We apply integrative approaches to expression quantitative loci (eQTLs) from 44 tissues from the Genotype-Tissue Expression project and genome-wide association study data. About 60% of known ...trait-associated loci are in linkage disequilibrium with a cis-eQTL, over half of which were not found in previous large-scale whole blood studies. Applying polygenic analyses to metabolic, cardiovascular, anthropometric, autoimmune, and neurodegenerative traits, we find that eQTLs are significantly enriched for trait associations in relevant pathogenic tissues and explain a substantial proportion of the heritability (40-80%). For most traits, tissue-shared eQTLs underlie a greater proportion of trait associations, although tissue-specific eQTLs have a greater contribution to some traits, such as blood pressure. By integrating information from biological pathways with eQTL target genes and applying a gene-based approach, we validate previously implicated causal genes and pathways, and propose new variant and gene associations for several complex traits, which we replicate in the UK BioBank and BioVU.
How to interpret the biological causes underlying the predisposing markers identified through genome-wide association studies (GWAS) remains an open question. One direct and powerful way to assess ...the genetic causality behind GWAS is through analysis of expression quantitative trait loci (eQTLs). Here we describe a new approach to estimate the tissues behind the genetic causality of a variety of GWAS traits, using the cis-eQTLs in 44 tissues from the Genotype-Tissue Expression (GTEx) Consortium. We have adapted the regulatory trait concordance (RTC) score to measure the probability of eQTLs being active in multiple tissues and to calculate the probability that a GWAS-associated variant and an eQTL tag the same functional effect. By normalizing the GWAS-eQTL probabilities by the tissue-sharing estimates for eQTLs, we generate relative tissue-causality profiles for GWAS traits. Our approach not only implicates the gene likely mediating individual GWAS signals, but also highlights tissues where the genetic causality for an individual trait is likely manifested.
The Human Leukocyte Antigen (HLA) is a critical genetic system for different outcomes after solid organ and hematopoietic cell transplantation. Its polymorphism is usually determined by molecular ...technologies at the DNA level. A potential role of HLA allelic expression remains under investigation in the context of the allogenic immune response between donors and recipients. In this study, we quantified the allelic expression of all three HLA class I loci (HLA-A, B and C) by RNA sequencing and conducted an analysis of expression quantitative traits loci (eQTL) to investigate whether HLA expression regulation could be associated with non-coding gene variations. HLA-B alleles exhibited the highest expression levels followed by HLA-C and HLA-A alleles. The max fold expression variation was observed for HLA-C alleles. The expression of HLA class I loci of distinct individuals demonstrated a coordinated and paired expression of both alleles of the same locus. Expression of conserved HLA-A~B~C haplotypes differed in distinct PBMC's suggesting an individual regulated expression of both HLA class I alleles and haplotypes. Cytokines TNFα /IFNβ, which induced a very similar upregulation of HLA class I RNA and cell surface expression across alleles did not modify the individually coordinated expression at the three HLA class I loci. By identifying cis eQTLs for the HLA class I genes, we show that the non-coding eQTLs explain 29%, 13%, and 31% of the respective HLA-A, B, C expression variance in unstimulated cells, and 9%, 23%, and 50% of the variance in cytokine-stimulated cells. The eQTLs have significantly higher effect sizes in stimulated cells compared to unstimulated cells for HLA-B and HLA-C genes expression. Our data also suggest that the identified eQTLs are independent from the coding variation which defines HLA alleles and thus may be influential on intra-allele expression variability although they might not represent the causal eQTLs.
With the advent of RNA-sequencing technology, we can detect different types of alternative splicing and determine how DNA variation regulates splicing. However, given the short read lengths used in ...most population-based RNA-sequencing experiments, quantifying transcripts accurately remains a challenge. Here we present a method, Altrans, for discovery of alternative splicing quantitative trait loci (asQTLs). To assess the performance of Altrans, we compared it to Cufflinks and MISO in simulations and Cufflinks for asQTL discovery. Simulations show that in the presence of unannotated transcripts, Altrans performs better in quantifications than Cufflinks and MISO. We have applied Altrans and Cufflinks to the Geuvadis dataset, which comprises samples from European and African populations, and discovered (FDR = 1%) 1,427 and 166 asQTLs with Altrans and 1,737 and 304 asQTLs with Cufflinks for Europeans and Africans, respectively. We show that, by discovering a set of asQTLs in a smaller subset of European samples and replicating these in the remaining larger subset of Europeans, both methods achieve similar replication levels (95% for both methods). We find many Altrans-specific asQTLs, which replicate to a high degree (93%). This is mainly due to junctions absent from the annotations and hence not tested with Cufflinks. The asQTLs are significantly enriched for biochemically active regions of the genome, functional marks, and variants in splicing regions, highlighting their biological relevance. We present an approach for discovering asQTLs that is a more direct assessment of splicing compared to other methods and is complementary to other transcript quantification methods.
DNA methylation is an essential epigenetic mark whose role in gene regulation and its dependency on genomic sequence and environment are not fully understood. In this study we provide novel insights ...into the mechanistic relationships between genetic variation, DNA methylation and transcriptome sequencing data in three different cell-types of the GenCord human population cohort. We find that the association between DNA methylation and gene expression variation among individuals are likely due to different mechanisms from those establishing methylation-expression patterns during differentiation. Furthermore, cell-type differential DNA methylation may delineate a platform in which local inter-individual changes may respond to or act in gene regulation. We show that unlike genetic regulatory variation, DNA methylation alone does not significantly drive allele specific expression. Finally, inferred mechanistic relationships using genetic variation as well as correlations with TF abundance reveal both a passive and active role of DNA methylation to regulatory interactions influencing gene expression. DOI:http://dx.doi.org/10.7554/eLife.00523.001.
Elucidating the pathophysiology and molecular attributes of common disorders as well as developing targeted and effective treatments hinges on the study of the relevant cell type and tissues. ...Pancreatic beta cells within the islets of Langerhans are centrally involved in the pathogenesis of both type 1 and type 2 diabetes. Describing the differentiated state of the human beta cell has been hampered so far by technical (low resolution microarrays) and biological limitations (whole islet preparations rather than isolated beta cells). We circumvent these by deep RNA sequencing of purified beta cells from 11 individuals, presenting here the first characterization of the human beta cell transcriptome. We perform the first comparison of gene expression profiles between beta cells, whole islets, and beta cell depleted islet preparations, revealing thus beta-cell-specific expression and splicing signatures. Further, we demonstrate that genes with consistent increased expression in beta cells have neuronal-like properties, a signal previously hypothesized. Finally, we find evidence for extensive allelic imbalance in expression and uncover genetic regulatory variants (eQTLs) active in beta cells. This first molecular blueprint of the human beta cell offers biological insight into its differentiated function, including expression of key genes associated with both major types of diabetes.
Transposable elements (TEs) are prevalent repeats in the human genome, play a significant role in the regulome, and their disruption can contribute to tumorigenesis. However, TE influence on gene ...expression in cancer remains unclear. Here, we analyze 275 normal colon and 276 colorectal cancer samples from the SYSCOL cohort, discovering 10,231 and 5,199 TE-expression quantitative trait loci (eQTLs) in normal and tumor tissues, respectively, of which 376 are colorectal cancer specific eQTLs, likely due to methylation changes. Tumor-specific TE-eQTLs show greater enrichment of transcription factors, compared to shared TE-eQTLs suggesting specific regulation of their expression in tumor. Bayesian networks reveal 1,766 TEs as mediators of genetic effects, altering the expression of 1,558 genes, including 55 known cancer driver genes and show that tumor-specific TE-eQTLs trigger the driver capability of TEs. These insights expand our knowledge of cancer drivers, deepening our understanding of tumorigenesis and presenting potential avenues for therapeutic interventions.
Understanding how genetic variation affects distinct cellular phenotypes, such as gene expression levels, alternative splicing and DNA methylation levels, is essential for better understanding of ...complex diseases and traits. Furthermore, how inter-individual variation of DNA methylation is associated to gene expression is just starting to be studied. In this study, we use the GenCord cohort of 204 newborn Europeans' lymphoblastoid cell lines, T-cells and fibroblasts derived from umbilical cords. The samples were previously genotyped for 2.5 million SNPs, mRNA-sequenced, and assayed for methylation levels in 482,421 CpG sites. We observe that methylation sites associated to expression levels are enriched in enhancers, gene bodies and CpG island shores. We show that while the correlation between DNA methylation and gene expression can be positive or negative, it is very consistent across cell-types. However, this epigenetic association to gene expression appears more tissue-specific than the genetic effects on gene expression or DNA methylation (observed in both sharing estimations based on P-values and effect size correlations between cell-types). This predominance of genetic effects can also be reflected by the observation that allele specific expression differences between individuals dominate over tissue-specific effects. Additionally, we discover genetic effects on alternative splicing and interestingly, a large amount of DNA methylation correlating to alternative splicing, both in a tissue-specific manner. The locations of the SNPs and methylation sites involved in these associations highlight the participation of promoter proximal and distant regulatory regions on alternative splicing. Overall, our results provide high-resolution analyses showing how genome sequence variation has a broad effect on cellular phenotypes across cell-types, whereas epigenetic factors provide a secondary layer of variation that is more tissue-specific. Furthermore, the details of how this tissue-specificity may vary across inter-relations of molecular traits, and where these are occurring, can yield further insights into gene regulation and cellular biology as a whole.