Most human genes have multiple sites at which RNA 3' end cleavage and polyadenylation can occur, enabling the expression of distinct transcript isoforms under different conditions. Novel methods to sequence RNA 3' ends have generated comprehensive catalogues of polyadenylation (poly(A)) sites; their analysis using innovative computational methods has revealed how poly(A) site choice is regulated by core RNA 3' end processing factors, such as cleavage factor I and cleavage and polyadenylation specificity factor, as well as by other RNA-binding proteins, particularly splicing factors. Here, we review the experimental and computational methods that have enabled the global mapping of mRNA and of long non-coding RNA 3' ends, quantification of the resulting isoforms and the discovery of regulators of alternative cleavage and polyadenylation (APA). We highlight the different types of APA-derived isoforms and their functional differences, and illustrate how APA contributes to human diseases, including cancer and haematological, immunological and neurological diseases.
Alternative polyadenylation (APA) is a general mechanism of transcript diversification in mammals, which has been recently linked to proliferative states and cancer. Different 3' untranslated region (3' UTR) isoforms interact with different RNA-binding proteins (RBPs), which modify the stability, translation, and subcellular localization of the corresponding transcripts. Although the heterogeneity of pre-mRNA 3' end processing has been established with high-throughput approaches, the mechanisms that underlie systematic changes in 3' UTR lengths remain to be characterized. Through a uniform analysis of a large number of 3' end sequencing data sets, we have uncovered 18 signals, six of which are novel, whose positioning with respect to pre-mRNA cleavage sites indicates a role in pre-mRNA 3' end processing in both mouse and human. With 3' end sequencing we have demonstrated that the heterogeneous nuclear ribonucleoprotein C (HNRNPC), which binds the poly(U) motif whose frequency also peaks in the vicinity of polyadenylation (poly(A)) sites, has a genome-wide effect on poly(A) site usage. HNRNPC-regulated 3' UTRs are enriched in ELAV-like RBP 1 (ELAVL1) binding sites and include those of the CD47 gene, which participate in the recently discovered mechanism of 3' UTR-dependent protein localization (UDPL). Our study thus establishes an up-to-date, high-confidence catalog of 3' end processing sites and poly(A) signals, and it uncovers an important role of HNRNPC in regulating 3' end processing. It further suggests that U-rich elements mediate interactions with multiple RBPs that regulate different stages in a transcript's life cycle.
Understanding the regulation of gene expression, including transcription start site usage, alternative splicing, and polyadenylation, requires accurate quantification of expression levels down to the level of individual transcript isoforms. To comparatively evaluate the accuracy of the many methods that have been proposed for estimating transcript isoform abundance from RNA sequencing data, we have used both synthetic data and an independent experimental method for quantifying the abundance of transcript ends at the genome-wide level.
We found that many tools have good accuracy and yield better estimates of gene-level expression compared to commonly used count-based approaches, but they vary widely in memory and runtime requirements. Nucleotide composition and intron/exon structure have comparatively little influence on the accuracy of expression estimates, which correlates most strongly with transcript/gene expression levels. To facilitate the reproduction and further extension of our study, we provide datasets, source code, and an online analysis tool on a companion website, where developers can upload expression estimates obtained with their own tool to compare them to those inferred by the methods assessed here.
As many methods for quantifying isoform abundance with comparable accuracy are available, a user's choice will likely be determined by factors such as the memory and runtime requirements, as well as the availability of methods for downstream analyses. Sequencing-based methods to quantify the abundance of specific transcript regions could complement validation schemes based on synthetic data and quantitative PCR in future or ongoing assessments of RNA-seq analysis methods.
The length of 3' untranslated regions (3' UTRs) is regulated in relation to cellular state. To uncover key regulators of poly(A) site use in specific conditions, we have developed PAQR, a method for quantifying poly(A) site use from RNA sequencing data, and KAPAC, an approach that infers activities of oligomeric sequence motifs on poly(A) site choice. Application of PAQR and KAPAC to RNA sequencing data from normal and tumor tissue samples uncovers motifs that can explain changes in cleavage and polyadenylation in specific cancers. In particular, our analysis points to polypyrimidine tract binding protein 1 as a regulator of poly(A) site choice in glioblastoma.
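The two quantities named above, relative poly(A) site use and the activity of a sequence motif on site choice, can be illustrated with a deliberately simplified sketch. It is not the PAQR or KAPAC algorithm: it assumes per-site read counts are already available, and it summarizes "activity" as the least-squares slope relating motif counts near each site to the change in that site's relative use between two conditions. The function names `relative_usage` and `motif_activity` are hypothetical.

```python
def relative_usage(counts):
    """Fraction of a gene's reads assigned to each poly(A) site.

    `counts` holds one read count per site of the same gene
    (assumed input; how the counts are obtained is out of scope here).
    """
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def motif_activity(motif_counts, usage_change):
    """Least-squares slope of usage change on motif count.

    `motif_counts[i]` is the number of motif occurrences near site i and
    `usage_change[i]` is the change in relative use of site i between two
    conditions. A positive slope means that sites carrying more copies of
    the motif tend to gain usage. This is an illustrative stand-in for a
    motif activity, not the published model.
    """
    n = len(motif_counts)
    mean_x = sum(motif_counts) / n
    mean_y = sum(usage_change) / n
    sxx = sum((x - mean_x) ** 2 for x in motif_counts)
    if sxx == 0:
        return 0.0  # motif count does not vary, so no slope can be estimated
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(motif_counts, usage_change))
    return sxy / sxx
```

In this toy setting, a gene with 30 reads at its proximal site and 10 at its distal site has relative usages of 0.75 and 0.25; comparing such values between normal and tumor samples gives the `usage_change` input.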
Through alternative polyadenylation, human mRNAs acquire longer or shorter 3′ untranslated regions, the latter typically associated with higher transcript stability and increased protein production. To understand the dynamics of polyadenylation site usage, we performed transcriptome-wide mapping of both binding sites of 3′ end processing factors CPSF-160, CPSF-100, CPSF-73, CPSF-30, Fip1, CstF-64, CstF-64τ, CF Im25, CF Im59, and CF Im68 and 3′ end processing sites in HEK293 cells. We found that although binding sites of these factors generally cluster around the poly(A) sites most frequently used in cleavage, CstF-64/CstF-64τ and CFIm proteins have much higher positional specificity compared to CPSF components. Knockdown of CF Im68 induced a systematic use of proximal polyadenylation sites, indicating that changes in relative abundance of a single 3′ end processing factor can modulate the length of 3′ untranslated regions across the transcriptome and suggesting a mechanism behind the previously observed increase in tumor cell invasiveness upon CF Im68 knockdown.
► Thousands of poly(A) sites are identified by the A-seq method in a human cell line
► Binding sites of pre-mRNA 3′ end processing factors are mapped by PAR-CLIP
► CstF-64 and CF Im68 exhibit the highest positional binding specificity
► CF Im68 siRNA treatment causes a global shift toward proximal poly(A) sites
Alternative cleavage and polyadenylation generates mRNAs with 3′ untranslated regions of different lengths. Keller, Zavolan, and colleagues mapped the binding sites of ten 3′ end processing proteins by PAR-CLIP and identified the 3′ end processing sites by A-seq. CstF-64 and CF Im68 proteins showed the highest positional specificity. Knockdown of CF Im68 induced a systematic shift toward proximal polyadenylation sites, indicating that changes in the relative abundance of a single 3′ end processing factor can modulate the length of 3′ untranslated regions transcriptome-wide.
Alternative polyadenylation is a cellular mechanism that generates mRNA isoforms differing in their 3' untranslated regions (3' UTRs). Changes in polyadenylation site usage have been described upon induction of proliferation in resting cells, but the underlying mechanism and functional significance of this phenomenon remain largely unknown. To understand the functional consequences of shortened 3' UTR isoforms in a physiological setting, we used 3' end sequencing and quantitative mass spectrometry to determine polyadenylation site usage, mRNA and protein levels in murine and human naive and activated T cells. Although 3' UTR shortening in proliferating cells is conserved between human and mouse, orthologous genes do not exhibit similar expression of alternative 3' UTR isoforms. We generally find that 3' UTR shortening is not accompanied by a corresponding change in mRNA and protein levels. This suggests that although 3' UTR shortening may lead to changes in the RNA-binding protein interactome, it has limited effects on protein output.
Generated by 3′ end cleavage and polyadenylation at alternative polyadenylation (poly(A)) sites, alternative terminal exons account for much of the variation between human transcript isoforms. More than a dozen protocols have been developed so far for capturing and sequencing RNA 3′ ends from a variety of cell types and species. In previous studies, we have used these data to uncover novel regulatory signals and cell type-specific isoforms. Here we present an update of the PolyASite (https://polyasite.unibas.ch) resource of poly(A) sites, constructed from publicly available human, mouse and worm 3′ end sequencing datasets by enforcing uniform quality measures, including the flagging of putative internal priming sites. Through integrated processing of all data, we identified and clustered sites that are closely spaced and share polyadenylation signals, as these are likely the result of stochastic variations in processing. For each cluster, we identified the representative (most frequently processed) site and estimated the relative use in the transcriptome across all samples. We have established a modern web portal for efficient finding, exploration and export of data. Database generation is fully automated, greatly facilitating incorporation of new datasets and the updating of underlying genome resources.
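The clustering step described above can be sketched as follows. This is a minimal illustration, not the PolyASite pipeline: it merges sites purely by distance on the same chromosome and strand, ignores the shared-signal criterion, and reports each cluster's share of all reads as a stand-in for relative use. The `Site` type, the function names and the `max_gap` default of 12 nucleotides are assumptions made for the example.

```python
from typing import NamedTuple


class Site(NamedTuple):
    chrom: str
    strand: str
    pos: int    # genomic coordinate of the cleavage site
    reads: int  # number of 3' end reads supporting the site


def cluster_sites(sites, max_gap=12):
    """Group sites on the same chromosome and strand whose neighbours
    lie at most `max_gap` nucleotides apart (hypothetical threshold)."""
    clusters = []
    for site in sorted(sites, key=lambda s: (s.chrom, s.strand, s.pos)):
        last = clusters[-1] if clusters else None
        if (last
                and last[-1].chrom == site.chrom
                and last[-1].strand == site.strand
                and site.pos - last[-1].pos <= max_gap):
            last.append(site)       # extend the current cluster
        else:
            clusters.append([site])  # start a new cluster
    return clusters


def summarize(clusters):
    """Pick the most frequently processed site of each cluster as its
    representative and compute the cluster's fraction of all reads."""
    total = sum(s.reads for c in clusters for s in c)
    summary = []
    for c in clusters:
        rep = max(c, key=lambda s: s.reads)
        reads = sum(s.reads for s in c)
        summary.append({
            "chrom": rep.chrom,
            "strand": rep.strand,
            "start": c[0].pos,
            "end": c[-1].pos,
            "representative": rep.pos,
            "reads": reads,
            "fraction": reads / total if total else 0.0,
        })
    return summary
```

A real implementation would additionally need to handle the internal priming flags and per-sample normalization mentioned in the abstract, which this sketch leaves out.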
Sequencing of RNA 3' ends has uncovered numerous sites that do not correspond to the termination sites of known transcripts. Through their 3' untranslated regions, protein-coding RNAs interact with RNA-binding proteins and microRNAs, which regulate many properties, including RNA stability and subcellular localization. We developed the terminal exon characterization (TEC) tool (http://tectool.unibas.ch), which can be used with RNA-sequencing data from any species for which a genome annotation that includes sites of RNA cleavage and polyadenylation is available. We discovered hundreds of previously unknown isoforms and cell-type-specific terminal exons in human cells. Ribosome profiling data revealed that many of these isoforms were translated. By applying TECtool to single-cell sequencing data, we found that the newly identified isoforms were expressed in subpopulations of cells. Thus, TECtool enables the identification of previously unknown isoforms in well-studied cell systems and in rare cell types.
The prediction of a consensus structure for a set of related RNAs is an important first step for subsequent analyses. RNAalifold, which computes the minimum energy structure that is simultaneously formed by a set of aligned sequences, is one of the oldest and most widely used tools for this task. In recent years, several alternative approaches have been advocated, pointing to several shortcomings of the original RNAalifold approach.
We show that the accuracy of RNAalifold predictions can be improved substantially by introducing a different, more rational handling of alignment gaps, and by replacing the rather simplistic model of covariance scoring with more sophisticated RIBOSUM-like scoring matrices. These improvements are achieved without compromising the computational efficiency of the algorithm. We show here that the new version of RNAalifold not only outperforms the old one, but also several other tools recently developed, on different datasets.
The new version of RNAalifold can not only replace the old one for almost any application but is also competitive with other approaches, including those based on SCFGs, maximum expected accuracy, or hierarchical nearest neighbor classifiers.
Accurate reconstruction of the regulatory networks that control gene expression is one of the key current challenges in molecular biology. Although gene expression and chromatin state dynamics are ultimately encoded by constellations of binding sites recognized by regulators such as transcription factors (TFs) and microRNAs (miRNAs), our understanding of this regulatory code and its context-dependent read-out remains very limited. Given that there are thousands of potential regulators in mammals, it is not practical to use direct experimentation to identify which of these play a key role for a particular system of interest. We developed a methodology that models gene expression or chromatin modifications in terms of genome-wide predictions of regulatory sites and completely automated it into a web-based tool called ISMARA (Integrated System for Motif Activity Response Analysis). Given only gene expression or chromatin state data across a set of samples as input, ISMARA identifies the key TFs and miRNAs driving expression/chromatin changes and makes detailed predictions regarding their regulatory roles. These include predicted activities of the regulators across the samples, their genome-wide targets, enriched gene categories among the targets, and direct interactions between the regulators. Applying ISMARA to data sets from well-studied systems, we show that it consistently identifies known key regulators ab initio. We also present a number of novel predictions including regulatory interactions in innate immunity, a master regulator of mucociliary differentiation, TFs consistently dysregulated in cancer, and TFs that mediate specific chromatin modifications.
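A model of this kind, in which expression is explained by predicted regulatory sites, is commonly written as a linear relation between site counts and per-sample regulator activities. The form below is offered as an illustrative assumption rather than as the exact formulation used by the tool; all symbols are introduced here for the example.

```latex
% E_{ps}: (log) expression or chromatin signal of promoter p in sample s
% N_{pm}: predicted number of sites for regulatory motif m at promoter p
% A_{ms}: unknown activity of motif m in sample s, inferred from the data
% c_p, \tilde{c}_s: promoter- and sample-specific offsets
% \varepsilon_{ps}: noise term
\[
  E_{ps} = c_p + \tilde{c}_s + \sum_{m} N_{pm} A_{ms} + \varepsilon_{ps}
\]
```

Under this reading, the activities $A_{ms}$ are the quantities fitted across samples, and the motifs whose activities vary most strongly are the candidate key regulators.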