The reality of pervasive transcription Clark, Michael B; Amaral, Paulo P; Schlesinger, Felix J ...
PLoS biology,
07/2011, Letnik:
9, Številka:
7
Journal Article
Recenzirano
Odprti dostop
In parallel, whole chromosome tiling array interrogation of the RNA content of a variety of human tissues and cell lines revealed that, collectively, at least 93% of genomic bases are transcribed in ...one cell type or another 1,10-13. Since it is well established that highly expressed mRNAs dominate the non-ribosomal portion of the polyA+ transcriptome 7,8,10,14-19, normalization approaches were used to reduce the quantity of highly expressed transcripts in these cDNA analyses 7,8, and are implicit in tiling array approaches. ...any estimate of the pervasiveness of transcription requires inclusion of all data sources, and less than exhaustive analyses can only provide lower bounds for transcriptional complexity.
To assess the impact of genetic variation in regulatory loci on human health, we constructed a high-resolution map of allelic imbalances in DNA methylation, histone marks, and gene transcription in ...71 epigenomes from 36 distinct cell and tissue types from 13 donors. Deep whole-genome bisulfite sequencing of 49 methylomes revealed sequence-dependent CpG methylation imbalances at thousands of heterozygous regulatory loci. Such loci are enriched for stochastic switching, which is defined as random transitions between fully methylated and unmethylated states of DNA. The methylation imbalances at thousands of loci are explainable by different relative frequencies of the methylated and unmethylated states for the two alleles. Further analyses provided a unifying model that links sequence-dependent allelic imbalances of the epigenome, stochastic switching at gene regulatory loci, and disease-associated genetic variation.
While sequencing of the human genome surprised us with how many protein-coding genes there are, it did not fundamentally change our perspective on what a gene is. In contrast, the complex patterns of ...dispersed regulation and pervasive transcription uncovered by the ENCODE project, together with non-genic conservation and the abundance of noncoding RNA genes, have challenged the notion of the gene. To illustrate this, we review the evolution of operational definitions of a gene over the past century--from the abstract elements of heredity of Mendel and Morgan to the present-day ORFs enumerated in the sequence databanks. We then summarize the current ENCODE findings and provide a computational metaphor for the complexity. Finally, we propose a tentative update to the definition of a gene: A gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products. Our definition side-steps the complexities of regulation and transcription by removing the former altogether from the definition and arguing that final, functional gene products (rather than intermediate transcripts) should be used to group together entities associated with a single gene. It also manifests how integral the concept of biological function is in defining genes.
The immense intercellular and intracellular heterogeneity of the CNS presents major challenges for high-throughput omic analyses. Transcriptional, translational and post-translational regulatory ...events are localized to specific neuronal cell types or subcellular compartments, resulting in discrete patterns of protein expression and activity. A spatial and quantitative knowledge of the neuroproteome is therefore critical to understanding both normal and pathological aspects of the functional genomics and anatomy of the CNS. Improvements in mass spectrometry allow the profiling of proteins at a sufficient depth to complement results from high-throughput genomic and transcriptomic assays. However, there are challenges in integrating proteomic data with other data modalities and even greater challenges in obtaining comprehensive neuroproteomic data with cell-type specificity. Here we discuss how proteomics should be exploited to enhance high-throughput functional genomic analysis by tighter integration of data analyses. We also discuss experimental strategies to achieve finer cellular and subcellular resolution in transcriptomic and proteomic studies of neural tissues.
Statistical models have been used to quantify the relationship between gene expression and transcription factor (TF) binding signals. Here we apply the models to the large-scale data generated by the ...ENCODE project to study transcriptional regulation by TFs. Our results reveal a notable difference in the prediction accuracy of expression levels of transcription start sites (TSSs) captured by different technologies and RNA extraction protocols. In general, the expression levels of TSSs with high CpG content are more predictable than those with low CpG content. For genes with alternative TSSs, the expression levels of downstream TSSs are more predictable than those of the upstream ones. Different TF categories and specific TFs vary substantially in their contributions to predicting expression. Between two cell lines, the differential expression of TSS can be precisely reflected by the difference of TF-binding signals in a quantitative manner, arguing against the conventional on-and-off model of TF binding. Finally, we explore the relationships between TF-binding signals and other chromatin features such as histone modifications and DNase hypersensitivity for determining expression. The models imply that these features regulate transcription in a highly coordinated manner.
There is growing appreciation for the importance of non-protein-coding genes in development and disease. Although much is known about microRNAs, limitations in bioinformatic analyses of RNA ...sequencing have precluded broad assessment of other forms of small-RNAs in humans. By analysing sequencing data from plasma-derived RNA from 40 individuals, here we identified over a thousand human extracellular RNAs including microRNAs, piwi-interacting RNA (piRNA), and small nucleolar RNAs. Using a targeted quantitative PCR with reverse transcription approach in an additional 2,763 individuals, we characterized almost 500 of the most abundant extracellular transcripts including microRNAs, piRNAs and small nucleolar RNAs. The presence in plasma of many non-microRNA small-RNAs was confirmed in an independent cohort. We present comprehensive data to demonstrate the broad and consistent detection of diverse classes of circulating non-cellular small-RNAs from a large population.
Supplements are increasingly important to the scientific record, particularly in genomics. However, they are often underutilized. Optimally, supplements should make results findable, accessible, ...interoperable, and reusable (i.e., "FAIR"). Moreover, properly off-loading to them the data and detail in a paper could make the main text more readable. We propose a hierarchical organization for supplements, with some parts paralleling and "shadowing" the main text and other elements branching off from it, and we suggest a specific formatting to make this structure explicit. Furthermore, sections of the supplement could be presented in multiple scientific "dialects", including machine-readable and lay-friendly formats.
Enhancers are important non-coding elements, but they have traditionally been hard to characterize experimentally. The development of massively parallel assays allows the characterization of large ...numbers of enhancers for the first time. Here, we developed a framework using Drosophila STARR-seq to create shape-matching filters based on meta-profiles of epigenetic features. We integrated these features with supervised machine-learning algorithms to predict enhancers. We further demonstrated that our model could be transferred to predict enhancers in mammals. We comprehensively validated the predictions using a combination of in vivo and in vitro approaches, involving transgenic assays in mice and transduction-based reporter assays in human cell lines (153 enhancers in total). The results confirmed that our model can accurately predict enhancers in different species without re-parameterization. Finally, we examined the transcription factor binding patterns at predicted enhancers versus promoters. We demonstrated that these patterns enable the construction of a secondary model that effectively distinguishes enhancers and promoters.
Evolving interest in comprehensively profiling the full range of small RNAs present in small tissue biopsies and in circulating biofluids, and how the profile differs with disease, has launched small ...RNA sequencing (RNASeq) into more frequent use. However, known biases associated with small RNASeq, compounded by low RNA inputs, have been both a significant concern and a hurdle to widespread adoption. As RNASeq is becoming a viable choice for the discovery of small RNAs in low input samples and more labs are employing it, there should be benchmark datasets to test and evaluate the performance of new sequencing protocols and operators. In a recent publication from the National Institute of Standards and Technology, Pine et al., 2018, the investigators used a commercially available set of three tissues and tested performance across labs and platforms.
In this paper, we further tested the performance of low RNA input in three commonly used and commercially available RNASeq library preparation kits; NEB Next, NEXTFlex, and TruSeq small RNA library preparation. We evaluated the performance of the kits at two different sites, using three different tissues (brain, liver, and placenta) with high (1 μg) and low RNA (10 ng) input from tissue samples, or 5.0, 3.0, 2.0, 1.0, 0.5, and 0.2 ml starting volumes of plasma. As there has been a lack of robust validation platforms for differentially expressed miRNAs, we also compared low input RNASeq data with their expression profiles on three different platforms (Abcam Fireplex, HTG EdgeSeq, and Qiagen miRNome).
The concordance of RNASeq results on these three platforms was dependent on the RNA expression level; the higher the expression, the better the reproducibility. The results provide an extensive analysis of small RNASeq kit performance using low RNA input, and replication of these data on three downstream technologies.
Differences in gene expression may play a major role in speciation and phenotypic diversity. We examined genome-wide differences in transcription factor (TF) binding in several humans and a single ...chimpanzee by using chromatin immunoprecipitation followed by sequencing. The binding sites of RNA polymerase II (Polli) and a key regulator of immune responses, nuclear factor κB (p65), were mapped in 10 lymphoblastoid cell lines, and 25 and 7.5% of the respective binding regions were found to differ between individuals. Binding differences were frequently associated with single-nucleotide polymorphisms and genomic structural variants, and these differences were often correlated with differences in gene expression, suggesting functional consequences of binding variation. Furthermore, comparing Polli binding between humans and chimpanzee suggests extensive divergence in TF binding. Our results indicate that many differences in individuals and species occur at the level of TF binding, and they provide insight into the genetic events responsible for these differences.