Transcription factors are key cellular components that control gene expression: their activities determine how cells function and respond to the environment. Currently, there is great interest in research into human transcriptional regulation. However, surprisingly little is known about these regulators themselves. For example, how many transcription factors does the human genome contain? How are they expressed in different tissues? Are they evolutionarily conserved? Here, we present an analysis of 1,391 manually curated sequence-specific DNA-binding transcription factors, their functions, genomic organization and evolutionary conservation. Much remains to be explored, but this study provides a solid foundation for future investigations to elucidate regulatory mechanisms underlying diverse mammalian biological processes.
CLIP technologies are now widely used to study RNA-protein interactions, and many data sets are publicly available. An important first step in exploring CLIP data is to visually inspect and assess processed genomic data on selected genes or regions and to perform comparisons, either across conditions within a particular project or against publicly available data. However, the output files produced by data processing pipelines, or the preprocessed files available to download from data repositories, are often not suitable for direct comparison and usually need further processing. Furthermore, deriving biological insight usually requires visualizing a CLIP signal alongside other data, such as annotations or orthogonal functional genomic data (e.g., RNA-seq). We have developed a simple but powerful command-line tool that facilitates these visual comparative and integrative analyses, with normalization and smoothing options for CLIP data and the ability to show these alongside reference annotation tracks and functional genomic data. These data can be supplied to the tool in a range of file formats, and it outputs a publication-quality figure. It is written in R and can either run independently on a laptop computer or be integrated into computational workflows on a high-performance cluster. Releases, source code, and documentation are freely available at https://github.com/ulelab/clipplotr.
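To illustrate the kind of normalization and smoothing mentioned above, the Python sketch below scales per-nucleotide crosslink counts to counts per million (CPM) and then applies a centred moving average. It is a hypothetical stand-in rather than the tool's R implementation: the function names, the choice of CPM and the window size are assumptions, and the tool's actual normalization and smoothing options may differ.

```python
def cpm_normalise(counts, library_size):
    """Scale raw crosslink counts to counts per million (CPM) so that tracks
    from libraries of different depth can be compared (assumed normalization)."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return [c * 1_000_000 / library_size for c in counts]


def rolling_mean(values, window):
    """Centred moving average over an odd-sized window; the window shrinks at
    the edges of the region. The window size is a hypothetical parameter."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Usage: two replicates sequenced to different depths over the same region.
rep1 = cpm_normalise([0, 4, 10, 4, 0], library_size=2_000_000)
rep2 = cpm_normalise([0, 2, 5, 2, 0], library_size=1_000_000)
print(rolling_mean(rep1, window=3))
print(rolling_mean(rep2, window=3))
```

Because both replicates reduce to the same CPM profile, their smoothed tracks coincide; normalizing before smoothing is what makes the two conditions directly comparable.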
Transcriptional regulation is one of the most important processes for modulating gene expression. Though much of this control is attributed to transcription factors, histones, and associated enzymes, it is increasingly apparent that the spatial organization of chromosomes within the nucleus has a profound effect on transcriptional activity. Studies in yeast indicate that the nuclear pore complex might promote transcription by recruiting chromatin to the nuclear periphery. In higher eukaryotes, however, it is not known whether such regulation has global significance. Here we establish nucleoporins as a major class of global regulators of gene expression in Drosophila melanogaster. Using chromatin immunoprecipitation combined with microarray hybridisation, we show that Nup153 and Megator (Mtor) bind to 25% of the genome in continuous domains extending from 10 kb to 500 kb. These Nucleoporin-Associated Regions (NARs) are dominated by markers of active transcription, including high RNA polymerase II occupancy and histone H4K16 acetylation. RNAi-mediated knock-down of Nup153 alters the expression of approximately 5,700 genes, with a pronounced down-regulatory effect within NARs. We find that nucleoporins play a central role in coordinating dosage compensation, an organism-wide process that doubles expression from the male X chromosome. NARs are enriched on the male X chromosome and occupy 75% of it. Furthermore, Nup153 depletion abolishes the normal function of the male-specific dosage compensation complex. Finally, by extensive 3D imaging, we demonstrate that NARs contribute to gene expression control irrespective of their sub-nuclear localization. We therefore suggest that NAR binding is used for a form of chromosomal organization that enables gene expression control.
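The step from probe-level ChIP-on-chip calls to "continuous domains" and a genome fraction can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' domain-calling procedure: probes are assumed to be already classified as bound or unbound, bound probes separated by at most a hypothetical `max_gap` are merged, and only domains of at least 10 kb (the lower bound quoted above) are kept.

```python
def call_domains(probes, max_gap=1_000, min_length=10_000):
    """Merge bound probes into contiguous domains on one chromosome.

    probes: iterable of (start, end, bound) tuples in half-open coordinates,
    with bound=True for enriched probes (hypothetical input format).
    Unbound probes are skipped; a domain is closed only when the distance to
    the next bound probe exceeds max_gap (an assumed tolerance).
    """
    domains = []
    current = None
    for start, end, bound in sorted(probes):
        if not bound:
            continue
        if current is not None and start - current[1] <= max_gap:
            current[1] = max(current[1], end)  # extend the open domain
        else:
            if current is not None:
                domains.append(tuple(current))
            current = [start, end]
    if current is not None:
        domains.append(tuple(current))
    # Keep only domains long enough to count as continuous binding domains.
    return [(s, e) for s, e in domains if e - s >= min_length]


def genome_fraction(domains, chrom_length):
    """Fraction of the chromosome covered by non-overlapping domains."""
    return sum(e - s for s, e in domains) / chrom_length


probes = [(0, 5_000, True), (5_500, 12_000, True),
          (12_000, 20_000, False), (30_000, 31_000, True)]
domains = call_domains(probes)
print(domains, genome_fraction(domains, chrom_length=48_000))
```

In this toy input the first two bound probes merge into one 12 kb domain, while the isolated 1 kb probe is discarded as too short, giving a covered fraction of 0.25.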
Mutations causing amyotrophic lateral sclerosis (ALS) often affect the condensation properties of RNA-binding proteins (RBPs). However, the role of RBP condensation in the specificity and function of protein-RNA complexes remains unclear. We created a series of TDP-43 C-terminal domain (CTD) variants that exhibited a gradient of low to high condensation propensity, as observed in vitro and by nuclear mobility and foci formation. Notably, a capacity for condensation was required for efficient TDP-43 assembly on subsets of RNA-binding regions, which contain unusually long clusters of motifs of characteristic types and density. These “binding-region condensates” are promoted by homomeric CTD-driven interactions and required for efficient regulation of a subset of bound transcripts, including autoregulation of TDP-43 mRNA. We establish that RBP condensation can occur in a binding-region-specific manner to selectively modulate transcriptome-wide RNA regulation, which has implications for remodeling RNA networks in the context of signaling, disease, and evolution.
• TDP-43 mutants affect condensation properties to a similar extent at multiple scales
• Binding-region condensates form on long RNA regions with dispersed UG-rich motifs
• RBPchimera-CLIP indicates homomeric interactions promote molecular-scale condensates
• Condensation selectively tunes the regulatory capacity of TDP-43; e.g., autoregulation
The condensation propensity of an RNA-binding protein tunes its binding to specific RNA regions across the transcriptome and affects its RNA processing functions. Formation of these “binding-region condensates,” promoted by specific motif types that are dispersed across long RNA regions, expands the ways in which RNA binding can be selectively controlled beyond canonical RNA-binding domains.
RNA molecules undergo a vast array of chemical post-transcriptional modifications (PTMs) that can affect their structure and interaction properties. In recent years, a growing number of PTMs have been successfully mapped to the transcriptome using experimental approaches relying on high-throughput sequencing. Oxford Nanopore direct-RNA sequencing has been shown to be sensitive to RNA modifications. We developed and validated Nanocompore, a robust analytical framework that identifies modifications from these data. Our strategy compares an RNA sample of interest against a non-modified control sample; it does not require a training set and allows the use of replicates. We show that Nanocompore can detect different RNA modifications with positional accuracy in vitro, and we apply it to profile m6A in vivo in yeast and human RNAs, as well as in targeted non-coding RNAs. We confirm our results with orthogonal methods and provide novel insights into the co-occurrence of multiple modified residues on individual RNA molecules.
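The core idea of comparing a sample of interest against a non-modified control, position by position and without a trained model, can be illustrated with a simplified per-position test. The sketch below computes a two-sample Kolmogorov-Smirnov (KS) statistic on per-read current intensities and ranks positions by it. This is a deliberately minimal, swapped-in technique: Nanocompore's actual statistics, its use of other signal features and its handling of replicates are not reproduced here, and the dictionary input layout and `min_reads` cutoff are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample, control):
    """Two-sample KS statistic D: the largest vertical distance between the
    empirical CDFs. Both ECDFs are step functions that change only at observed
    values, so checking every observed value finds the maximum."""
    a, b = sorted(sample), sorted(control)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def rank_positions(sample, control, min_reads=3):
    """sample/control: dict mapping reference position -> list of per-read
    current intensities (hypothetical layout). Returns (position, D) pairs,
    most different first; positions with fewer than min_reads (>= 1) reads in
    either condition are skipped."""
    scores = {}
    for pos in sample.keys() & control.keys():
        if len(sample[pos]) >= min_reads and len(control[pos]) >= min_reads:
            scores[pos] = ks_statistic(sample[pos], control[pos])
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


sample = {10: [80.1, 81.0, 82.3], 11: [90.2, 91.5, 92.0]}
control = {10: [80.4, 81.2, 82.0], 11: [70.3, 71.1, 72.8]}
print(rank_positions(sample, control))
```

Position 11, where the sample's intensity distribution is shifted relative to the control, ranks first; in a real analysis a p-value and multiple-testing correction would be needed on top of the raw statistic.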
Hi-C is one of the main methods for investigating spatial co-localisation of DNA in the nucleus. However, the raw sequencing data obtained from Hi-C experiments suffer from large biases and spurious contacts, making it difficult to identify true interactions. Existing methods use complex models to account for biases and do not provide a significance threshold for detecting interactions. Here we introduce a simple binomial probabilistic model that resolves complex biases and distinguishes between true and false interactions. The model corrects biases of known and unknown origin and yields a p-value for each interaction, providing a reliable threshold based on significance. We demonstrate this experimentally by testing the method against a random ligation dataset. Our method outperforms previous methods and provides a statistical framework for further data analysis, such as comparisons of Hi-C interactions between different conditions. GOTHiC is available as a Bioconductor package (http://www.bioconductor.org/packages/release/bioc/html/GOTHiC.html).
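The binomial idea can be sketched in a few lines of Python (GOTHiC itself is an R/Bioconductor package, and this is not a transcription of it). As an assumption, each fragment's probability of appearing at a read end is estimated from its relative coverage, so that coverage absorbs biases of known and unknown origin; the probability that one read pair links fragments i and j is taken as 2·p_i·p_j (either orientation); and the p-value is the upper tail of a binomial distribution over all read pairs. The input format is hypothetical, and multiple-testing correction is omitted.

```python
import math
from collections import Counter


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summing log-space terms for stability."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for x in range(k, n + 1):
        log_pmf = (log_n_fact - math.lgamma(x + 1) - math.lgamma(n - x + 1)
                   + x * log_p + (n - x) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def interaction_pvalues(read_pairs):
    """read_pairs: list of (fragment_id, fragment_id) tuples, one per read pair
    (hypothetical input). Returns {(frag_a, frag_b): p_value}."""
    n_pairs = len(read_pairs)
    observed = Counter(tuple(sorted(pair)) for pair in read_pairs)
    coverage = Counter()
    for a, b in read_pairs:
        coverage[a] += 1
        coverage[b] += 1
    total_ends = 2 * n_pairs
    pvalues = {}
    for (a, b), count in observed.items():
        p_a = coverage[a] / total_ends  # relative coverage as a bias estimate
        p_b = coverage[b] / total_ends
        # Two orientations can link distinct fragments; a self-pair has one.
        p_ab = p_a * p_b * (1 if a == b else 2)
        pvalues[(a, b)] = binom_sf(count, n_pairs, p_ab)
    return pvalues


pairs = [("A", "B")] * 10 + [("C", "D"), ("A", "C"), ("B", "D"),
                             ("C", "E"), ("D", "E"), ("E", "A"), ("B", "E")]
for pair, pval in sorted(interaction_pvalues(pairs).items(), key=lambda kv: kv[1]):
    print(pair, f"{pval:.3g}")
```

The repeatedly observed A-B contact receives a far smaller p-value than single, coverage-explained contacts such as C-D, which is the kind of significance-based threshold the model provides.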
Mutations causing amyotrophic lateral sclerosis (ALS) clearly implicate ubiquitously expressed and predominantly nuclear RNA binding proteins, which form pathological cytoplasmic inclusions in this context. However, the possibility that wild-type RNA binding proteins mislocalize without necessarily becoming constituents of cytoplasmic inclusions themselves remains relatively unexplored. We hypothesized that nuclear-to-cytoplasmic mislocalization of the RNA binding protein fused in sarcoma (FUS), in an unaggregated state, may occur more widely in ALS than previously recognized. To address this hypothesis, we analysed motor neurons from a human induced pluripotent stem cell model of ALS caused by VCP mutation. Additionally, we examined mouse transgenic models and post-mortem tissue from human sporadic ALS cases. We report nuclear-to-cytoplasmic mislocalization of FUS both in VCP mutation-related ALS and, crucially, in sporadic ALS spinal cord tissue from multiple cases. Furthermore, we provide evidence that FUS protein binds to an aberrantly retained intron within the SFPQ transcript, which is exported from the nucleus into the cytoplasm. Collectively, these data support a model for ALS pathogenesis in which aberrant intron retention in SFPQ transcripts contributes to FUS mislocalization through their direct interaction and nuclear export. In summary, we report widespread mislocalization of the FUS protein in ALS and propose a putative underlying mechanism for this process.
Neural induction in vertebrates generates a CNS that extends the rostral-caudal length of the body. The prevailing view is that neural cells are initially induced with anterior (forebrain) identity; caudalizing signals then convert a proportion to posterior fates (spinal cord). To test this model, we used chromatin accessibility to define how cells adopt region-specific neural fates. Together with genetic and biochemical perturbations, this identified a developmental time window in which genome-wide chromatin-remodeling events preconfigure epiblast cells for neural induction. Contrary to the established model, this revealed that cells commit to a regional identity before acquiring neural identity. This “primary regionalization” allocates cells to anterior or posterior regions of the nervous system, explaining how cranial and spinal neurons are generated at appropriate axial positions. These findings prompt a revision to models of neural induction and support the proposed dual evolutionary origin of the vertebrate CNS.
• Chromatin accessibility defines neural progenitor identity
• A limited developmental window exists to establish spinal cord competency
• Cells acquire axial identity prior to neural identity
Regional identity precedes neural identity of the anterior and posterior nervous systems.
Eukaryotic genomes undergo pervasive transcription, leading to the production of many types of stable and unstable RNAs. Transcription is not restricted to regions with annotated gene features but includes almost any genomic context. Currently, the source and function of most RNAs originating from intergenic regions in the human genome remain unclear.
We hypothesize that many intergenic RNAs can be ascribed to the presence of as-yet unannotated genes or the "fuzzy" transcription of known genes that extends beyond the annotated boundaries. To elucidate the contributions of these two sources, we assemble a dataset of more than 2.5 billion publicly available RNA-seq reads across 5 human cell lines and multiple cellular compartments to annotate transcriptional units in the human genome. About 80% of transcripts from unannotated intergenic regions can be attributed to the fuzzy transcription of existing genes; the remaining transcripts originate mainly from putative long non-coding RNA loci that are rarely spliced. We validate the transcriptional activity of these intergenic RNAs using independent measurements, including transcriptional start sites, chromatin signatures, and genomic occupancies of RNA polymerase II in various phosphorylation states. We also analyze the nuclear localization and sensitivities of intergenic transcripts to nucleases to illustrate that they tend to be rapidly degraded either on-chromatin by XRN2 or off-chromatin by the exosome.
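One way to picture the attribution step is a strand-aware proximity rule: an intergenic transcript is attributed to fuzzy transcription of a known gene if it lies on the same chromosome and strand within some distance of that gene's annotated boundaries, and is otherwise treated as a candidate independent transcriptional unit. The sketch below is hypothetical throughout: the 5 kb cutoff, the `Feature` record and the function names are illustrative assumptions, the study's actual attribution procedure is not described here, and the validation evidence listed above (TSSs, chromatin signatures, Pol II occupancy) is not modelled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Feature:
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def gap(a: Feature, b: Feature) -> int:
    """Bases separating two intervals; 0 if they overlap or touch."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def attribute(tx: Feature, genes: Sequence[Feature],
              max_gap: int = 5_000) -> Tuple[str, Optional[str]]:
    """Label a transcript 'fuzzy' (attributed to the closest same-strand gene
    within max_gap, a hypothetical cutoff) or 'independent'."""
    candidates = [g for g in genes
                  if g.chrom == tx.chrom and g.strand == tx.strand
                  and gap(tx, g) <= max_gap]
    if not candidates:
        return "independent", None
    nearest = min(candidates, key=lambda g: gap(tx, g))
    return "fuzzy", nearest.name


gene = Feature("GENE1", "chr1", 1_000, 5_000, "+")
print(attribute(Feature("tx1", "chr1", 6_000, 8_000, "+"), [gene]))    # ('fuzzy', 'GENE1')
print(attribute(Feature("tx2", "chr1", 20_000, 21_000, "+"), [gene]))  # ('independent', None)
print(attribute(Feature("tx3", "chr1", 6_000, 8_000, "-"), [gene]))    # ('independent', None)
```

Requiring the same strand matters: an antisense transcript near a gene cannot be a read-through extension of it, so `tx3` is not attributed to `GENE1`.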
We provide a curated atlas of intergenic RNAs that distinguishes alternative processing of well-annotated genes from independent transcriptional units based on the combined analysis of chromatin signatures, nuclear RNA localization, and degradation pathways.
The start and end sites of messenger RNAs (TSSs and TESs) are highly regulated, often in a cell-type-specific manner. Yet the contribution of transcript diversity to regulating gene expression remains largely elusive. We perform an integrative analysis of multiple highly synchronized cell-fate transitions in Saccharomyces cerevisiae, using quantitative genomic techniques, to identify regulatory functions associated with transcribing alternative isoforms.
Cell-fate transitions feature a widespread increase in alternative TSS usage and, to a lesser degree, in alternative TES usage. These dynamically regulated alternative TSSs are located mostly upstream of canonical TSSs, but also within gene bodies, where they may encode protein isoforms. Increased upstream alternative TSS usage is linked to various effects on canonical TSS levels, ranging from co-activation to repression. We identified two key features linked to these outcomes: the interplay between alternative and canonical promoter strengths, and the distance between alternative and canonical TSSs. These two regulatory properties offer a plausible explanation of how locally transcribed alternative TSSs control gene transcription. Additionally, we find that the chromatin modifiers Set2, Set3, and FACT play an important role in mediating gene repression via alternative TSSs, further supporting the idea that the act of upstream transcription drives local changes in gene transcription.
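Both "upstream" and "distance" in the features above are defined relative to the direction of transcription, so they depend on strand. A minimal sketch, assuming single-base TSS coordinates on a common reference and a hypothetical gene-end coordinate, computes the signed offset of an alternative TSS from the canonical TSS and labels it as upstream, canonical, within the gene body or beyond the gene. Promoter strength, the other feature, is not modelled, and the function names are illustrative.

```python
def signed_offset(position: int, canonical_tss: int, strand: str) -> int:
    """Offset from the canonical TSS in the direction of transcription:
    negative = upstream, positive = downstream."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    delta = position - canonical_tss
    return delta if strand == "+" else -delta


def classify_alt_tss(alt_tss: int, canonical_tss: int, gene_end: int, strand: str):
    """Return (label, distance in bases) for an alternative TSS."""
    offset = signed_offset(alt_tss, canonical_tss, strand)
    gene_length = signed_offset(gene_end, canonical_tss, strand)
    if offset < 0:
        return "upstream", -offset
    if offset == 0:
        return "canonical", 0
    if offset <= gene_length:
        return "gene body", offset
    return "downstream of gene", offset


# '+' strand gene with canonical TSS at 1,000 and end at 3,000.
print(classify_alt_tss(800, 1_000, 3_000, "+"))    # ('upstream', 200)
# '-' strand gene transcribed from 5,000 toward 2,000.
print(classify_alt_tss(5_300, 5_000, 2_000, "-"))  # ('upstream', 300)
print(classify_alt_tss(4_000, 5_000, 2_000, "-"))  # ('gene body', 1000)
```

Computing offsets in transcription direction lets upstream-TSS distances from genes on both strands be pooled, for example to relate distance to co-activation versus repression of the canonical TSS.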
The integrative analysis of multiple cell-fate transitions suggests the presence of a regulatory control system of alternative TSSs that is important for dynamic tuning of gene expression. Our work provides a framework for understanding how TSS heterogeneity governs eukaryotic gene expression, particularly during cell-fate changes.