Population scale studies combining genetic information with molecular phenotypes (for example, gene expression) have become a standard to dissect the effects of genetic variants onto organismal phenotypes. These kinds of data sets require powerful, fast and versatile methods able to discover molecular Quantitative Trait Loci (molQTL). Here we propose such a solution, QTLtools, a modular framework that contains multiple new and well-established methods to prepare the data, to discover proximal and distal molQTLs and, finally, to integrate them with GWAS variants and functional annotations of the genome. We demonstrate its utility by performing a complete expression QTL study in a few easy-to-perform steps. QTLtools is open source and available at https://qtltools.github.io/qtltools/.
The identification of genetic variants affecting gene expression, namely expression quantitative trait loci (eQTLs), has contributed to the understanding of mechanisms underlying human traits and diseases. The majority of these variants map in non-coding regulatory regions of the genome and their identification remains challenging. Here, we use natural genetic variation and CAGE transcriptomes from 154 EBV-transformed lymphoblastoid cell lines, derived from unrelated individuals, to map 5376 and 110 regulatory variants associated with promoter usage (puQTLs) and enhancer activity (eaQTLs), respectively. We characterize five categories of genes associated with puQTLs, distinguishing single from multi-promoter genes. Among multi-promoter genes, we find puQTL effects either specific to a single promoter or to multiple promoters with variable effect orientations. Regulatory variants associated with opposite effects on different mRNA isoforms suggest compensatory mechanisms occurring between alternative promoters. Our analyses identify differential promoter usage and modulation of enhancer activity as molecular mechanisms underlying eQTLs related to regulatory elements.
The study of gene expression in mammalian single cells via genomic technologies now provides the possibility to investigate the patterns of allelic gene expression. We used single-cell RNA sequencing to detect the allele-specific mRNA level in 203 single human primary fibroblasts over 133,633 unique heterozygous single-nucleotide variants (hetSNVs). We observed that at the snapshot of analyses, each cell contained mostly transcripts from one allele from the majority of genes; indeed, 76.4% of the hetSNVs displayed stochastic monoallelic expression in single cells. Remarkably, adjacent hetSNVs exhibited a haplotype-consistent allelic ratio; in contrast, distant sites located in two different genes were independent of the haplotype structure. Moreover, the allele-specific expression in single cells correlated with the abundance of the cellular transcript. We observed that genes expressing both alleles in the majority of the single cells at a given time point were rare and enriched with highly expressed genes. The relative abundance of each allele in a cell was controlled by some regulatory mechanisms given that we observed related single-cell allelic profiles according to genes. Overall, these results have direct implications in cellular phenotypic variability.
Large genomic datasets combining genotype and sequence data, such as for expression quantitative trait loci (eQTL) detection, require perfect matching between both data types.
We describe here MBV (Match BAM to VCF), a method to quickly solve sample mislabeling and detect cross-sample contamination and PCR amplification bias.
MBV is implemented in C++ as an independent component of the QTLtools software package; the binary and source code are freely available at https://qtltools.github.io/qtltools/.
olivier.delaneau@unige.ch or emmanouil.dermitzakis@unige.ch.
Supplementary data are available at Bioinformatics online.
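To make the idea of matching sequence data to genotypes concrete, the following is a minimal Python sketch of one plausible concordance check. It is an illustration under stated assumptions, not the MBV implementation: the function names (`concordance`, `best_match`), the dictionary-based inputs, and the `min_cov` coverage threshold are all hypothetical, and a real tool would read the alleles from BAM and VCF files rather than from in-memory dictionaries.

```python
def concordance(genotypes, read_counts, min_cov=10):
    """Score each genotyped sample against allele counts from one sequenced sample.

    genotypes   : {sample: {site: (allele1, allele2)}}   (hypothetical layout)
    read_counts : {site: {allele: number_of_reads}}      (hypothetical layout)
    min_cov     : assumed minimum read depth for a site to be scored

    Returns {sample: (het_fraction, hom_fraction)}, the fraction of
    heterozygous and homozygous sites whose observed alleles agree with
    the genotype. A fraction is None when no site of that type was scored.
    """
    scores = {}
    for sample, sites in genotypes.items():
        het_ok = het_n = hom_ok = hom_n = 0
        for site, (a1, a2) in sites.items():
            counts = read_counts.get(site, {})
            if sum(counts.values()) < min_cov:
                continue  # too few reads to judge this site
            seen = {allele for allele, n in counts.items() if n > 0}
            if a1 != a2:
                het_n += 1
                het_ok += seen == {a1, a2}  # both alleles must be observed
            else:
                hom_n += 1
                hom_ok += seen == {a1}      # only the genotyped allele observed
        scores[sample] = (
            het_ok / het_n if het_n else None,
            hom_ok / hom_n if hom_n else None,
        )
    return scores


def best_match(scores):
    """Return the sample with the highest combined het + hom concordance."""
    def total(item):
        het, hom = item[1]
        return (het or 0.0) + (hom or 0.0)
    return max(scores.items(), key=total)[0]


# Toy data: the reads carry both A and G at the first site and only C at the second.
genotypes = {
    "S1": {"chr1:100": ("A", "G"), "chr1:200": ("C", "C")},
    "S2": {"chr1:100": ("A", "A"), "chr1:200": ("C", "T")},
}
read_counts = {"chr1:100": {"A": 6, "G": 5}, "chr1:200": {"C": 12}}

scores = concordance(genotypes, read_counts)
matched = best_match(scores)
```

In this toy setting a correctly labeled sample scores high on both fractions, whereas a mislabeled one scores low on both. Intermediate patterns could in principle hint at contamination or amplification bias, but interpreting them is beyond this sketch.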
An increasing number of noncoding RNAs (ncRNAs) have been implicated in various human diseases including cancer; however, the ncRNA transcriptome of hepatocellular carcinoma (HCC) is largely unexplored. We used CAGE to map transcription start sites across various types of human and mouse HCCs with emphasis on ncRNAs distant from protein-coding genes. Here, we report that retroviral LTR promoters, expressed in healthy tissues such as testis and placenta but not liver, are widely activated in liver tumors. Despite HCC heterogeneity, a subset of LTR-derived ncRNAs were more than 10-fold up-regulated in the vast majority of samples. HCCs with a high LTR activity mostly had a viral etiology, were less differentiated, and showed higher risk of recurrence. ChIP-seq data show that MYC and MAX are associated with ncRNA deregulation. Globally, CAGE enabled us to build a mammalian promoter map for HCC, which uncovers a new layer of complexity in HCC genomics.
Sertoli cells (SCs) are the central, essential coordinators of spermatogenesis, without which germ cell development cannot occur. We previously showed that Dicer, an RNaseIII endonuclease required for microRNA (miRNA) biogenesis, is absolutely essential for Sertoli cells to mature, survive, and ultimately sustain germ cell development. Here, using isotope-coded protein labeling, a technique for protein relative quantification by mass spectrometry, we investigated the impact of Sertoli cell-Dicer and subsequent miRNA loss on the testicular proteome. We found that a large proportion of proteins (50 out of 130) are up-regulated by more than 1.3-fold in testes lacking Sertoli cell-Dicer, yet that this protein up-regulation is mild, never exceeding a 2-fold change, and is not preceded by alterations of the corresponding mRNAs. Of note, the expression levels of six proteins of interest were further validated using the Absolute Quantification (AQUA) peptide technology. Furthermore, through 3′UTR luciferase assays we identified one up-regulated protein, SOD-1, a Cu/Zn superoxide dismutase whose overexpression has been linked to enhanced cell death through apoptosis, as a likely direct target of three Sertoli cell-expressed miRNAs, miR-125a-3p, miR-872 and miR-24. Altogether, our study, which is one of the few in vivo analyses of miRNA effects on protein output, suggests that, at least in our system, miRNAs play a significant role in translation control.
Identification of functionally relevant differences between induced pluripotent stem cells (iPSC) and reference embryonic stem cells (ESC) remains a central question for therapeutic applications. Differences in gene expression between iPSC and ESC have been examined by microarray and more recently with RNA-SEQ technologies. We here report an in-depth analysis of nuclear and cytoplasmic transcriptomes, using the CAGE (cap analysis of gene expression) technology, for 5 iPSC clones derived from mouse B lymphocytes and 3 ESC lines. This approach reveals nuclear transcriptomes significantly more complex in ESC than in iPSC. Hundreds of not yet annotated putative non-coding RNAs and enhancer-associated transcripts specifically transcribed in ESC have been detected and supported with epigenetic and chromatin-chromatin interactions data. We identified super-enhancers transcriptionally active specifically in ESC and associated with genes implicated in the maintenance of pluripotency. Similarly, we detected non-coding transcripts of yet unknown function being regulated by ESC specific super-enhancers. Taken together, these results demonstrate that current protocols of iPSC reprogramming do not trigger activation of numerous cis-regulatory regions. It thus reinforces the need for already suggested deeper monitoring of the non-coding transcriptome when characterizing iPSC clones. Such differences in regulatory transcript expression may indeed impact their potential for clinical applications.
The impact of mammalian RNA interference components, particularly, Argonaute proteins, on chromatin organization is unexplored. Recent reports indicate that AGO1 association with chromatin appears to influence gene expression. To uncover the role of AGO1 in the nucleus, we used a combination of genome-wide approaches in control and AGO1-depleted HepG2 cells. We found that AGO1 strongly associates with active enhancers and RNA being produced at those sites. Hi-C analysis revealed AGO1 enrichment at the boundaries of topologically associated domains (TADs). By Hi-C in AGO1 knockdown cells, we observed changes in chromatin organization, including TADs and A/B compartment mixing, specifically in AGO1-bound regions. Distinct groups of genes and especially eRNA transcripts located within differentially interacting loci showed altered expression upon AGO1 depletion. Moreover, AGO1 association with enhancers is dependent on eRNA transcription. Collectively, our data suggest that enhancer-associated AGO1 contributes to the fine-tuning of chromatin architecture and gene expression in human cells.
• AGO1 enrichment on chromatin and transcriptional enhancers is mediated by eRNAs
• AGO1 depletion alters global transcriptional output including eRNA transcripts
• Enhancer-associated AGO1 contributes to the maintenance of 3D chromatin organization
Shuaib et al. used genome-wide approaches to investigate the role of nuclear AGO1 in chromatin organization and gene expression regulation in human cells. Depletion of AGO1 causes differential gene expression and disorganization of 3D chromatin structure such as loss and gain of chromosomal interactions, changes in the compartment, and eRNA, primarily in the AGO1 binding regions.
The HSA21 encoded Single-minded 2 (SIM2) transcription factor has key neurological functions and is a good candidate to be involved in the cognitive impairment of Down syndrome. We aimed to explore the functional capacity of SIM2 by mapping its DNA binding sites in mouse embryonic stem cells. ChIP-sequencing revealed 1229 high-confidence SIM2-binding sites. Analysis of the SIM2 target genes confirmed the importance of SIM2 in developmental and neuronal processes and indicated that SIM2 may be a master transcription regulator. Indeed, SIM2 DNA binding sites share sequence specificity and overlapping domains of occupancy with master transcription factors such as SOX2, OCT4 (Pou5f1), NANOG or KLF4. The association between SIM2 and these pioneer factors is supported by co-immunoprecipitation of SIM2 with SOX2, OCT4, NANOG or KLF4. Furthermore, the binding of SIM2 marks a particular sub-category of enhancers known as super-enhancers. These regions are characterized by typical DNA modifications and Mediator co-occupancy (MED1 and MED12). Altogether, we provide evidence that SIM2 binds a specific set of enhancer elements thus explaining how SIM2 can regulate its gene network in neuronal features.
The plasma concentration of fibrinogen varies in the healthy human population between 1.5 and 3.5 g/L. Understanding the basis of this variability has clinical importance because elevated fibrinogen levels are associated with increased cardiovascular disease risk. To identify novel regulatory elements involved in the control of fibrinogen expression, we used sequence conservation and in silico–predicted regulatory potential to select 14 conserved noncoding sequences (CNCs) within the conserved block of synteny containing the fibrinogen locus. The regulatory potential of each CNC was tested in vitro using a luciferase reporter gene assay in fibrinogen-expressing hepatoma cell lines (HuH7 and HepG2). Four potential enhancers were tested for their ability to direct enhanced green fluorescent protein expression in zebrafish embryos. CNC12, a sequence equidistant from the human fibrinogen alpha and beta chain genes, activates strong liver enhanced green fluorescent protein expression in injected embryos and their transgenic progeny. A transgenic assay in embryonic day 14.5 mouse embryos confirmed the ability of CNC12 to activate transcription in the liver. While additional experiments are necessary to prove the role of CNC12 in the regulation of fibrinogen, our study reveals a novel regulatory element in the fibrinogen locus that is active in the liver and may contribute to variable fibrinogen expression in humans.