RNA sequencing has opened new avenues for the study of transcriptome composition. Significant evidence has accumulated showing that the human transcriptome contains more than a hundred thousand different transcripts. However, it is still not clear to what extent this diversity prevails when the relative abundances of different transcripts from the same gene are considered.
Here we show three things. First, in a given condition, most protein-coding genes have one major transcript expressed at a significantly higher level than the others. Second, in human tissues the major transcripts contribute almost 85 percent of the total mRNA from protein-coding loci. Third, the same major transcript is often expressed in many tissues. We detect a high degree of overlap between the set of major transcripts and a recently published set of alternatively spliced transcripts that are predicted, from proteomic data, to be translated. Thus, we hypothesize that although some minor transcripts may play a functional role, the major ones are likely to be the main contributors to the proteome. However, we still detect a non-negligible fraction of protein-coding genes for which the major transcript does not code for a protein.
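The "major transcript" notion can be made concrete with a small sketch. It assumes that per-transcript abundances (for example, TPM values) are already available for each gene. The snippet picks the most abundant transcript per gene and reports what share of total expression the major transcripts account for. The gene and transcript names and the numbers are hypothetical. The study's test for "significantly higher" expression is not reproduced: here ties simply keep the first transcript seen.

```python
def major_transcripts(abundances):
    """Return {gene: (major_transcript, abundance)} given {(gene, transcript): abundance}."""
    best = {}
    for (gene, tx), value in abundances.items():
        # Keep the highest-abundance transcript seen so far for this gene
        if gene not in best or value > best[gene][1]:
            best[gene] = (tx, value)
    return best

def major_fraction(abundances):
    """Share of total abundance contributed by the major transcript of each gene."""
    total = sum(abundances.values())
    return sum(v for _, v in major_transcripts(abundances).values()) / total

# Hypothetical toy abundances (TPM-like units), not data from the study
toy = {
    ("GENE_A", "A-001"): 80.0,
    ("GENE_A", "A-002"): 15.0,
    ("GENE_A", "A-003"): 5.0,
    ("GENE_B", "B-001"): 20.0,
    ("GENE_B", "B-002"): 80.0,
}
print(major_transcripts(toy))  # {'GENE_A': ('A-001', 80.0), 'GENE_B': ('B-002', 80.0)}
print(major_fraction(toy))     # 0.8
```

On real data the same computation would run per sample or per tissue. Because abundance units differ between quantification tools, that choice affects the resulting fraction.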
Overall, our findings suggest that the transcriptome from protein-coding loci is dominated by one transcript per gene, and that not all the transcripts that contribute to transcriptome diversity are equally likely to contribute to protein diversity. This observation can help to prioritize candidate targets in proteomics research and to predict the functional impact of changes detected in variation studies.
Alternative splicing is a critical determinant of genome complexity and, by implication, is assumed to engender proteomic diversity. This notion has not been experimentally tested in a targeted, quantitative manner. Here, we have developed an integrative approach to ask whether perturbations in mRNA splicing patterns alter the composition of the proteome. We integrate RNA sequencing (RNA-seq) (to comprehensively report intron retention, differential transcript usage, and gene expression) with a data-independent acquisition (DIA) method, SWATH-MS (sequential window acquisition of all theoretical spectra-mass spectrometry). Together, these capture an unbiased, quantitative snapshot of the impact of constitutive and alternative splicing events on the proteome. Whereas intron retention is accompanied by decreased protein abundance, changes in differential transcript usage and gene expression alter protein abundance in proportion to transcript levels. Our findings illustrate how RNA splicing links isoform expression in the human transcriptome with proteomic diversity and provide a foundation for studying perturbations associated with human diseases.
• Integrative approach to study contribution of alternative splicing to proteome
• Changes in isoform usage alter protein abundance proportionate to transcript levels
• Intron retention is accompanied by decreased protein abundance
• Differential gene expression functionally tunes the human proteome
Liu et al. have developed an integrative approach to ask whether perturbations in mRNA splicing patterns alter the composition of the proteome. Their findings illustrate how RNA splicing links isoform expression in the human transcriptome with proteomic diversity and provide a foundation for studying perturbations associated with human diseases.
DNA arrays have been widely used to perform transcriptome-wide analysis of gene expression. Many methods have been developed to measure gene expression variability and to compare gene expression between conditions. Because RNA-seq is becoming increasingly popular for transcriptome characterization, it is now possible to quantify individual alternative transcript isoforms, and therefore to estimate the relative ratios of alternative splice forms within a given gene. Changes in splicing ratios, even without changes in overall gene expression, may have important phenotypic effects. Here we have developed statistical methodology to measure variability in splicing ratios within conditions, to compare it between conditions, and to identify genes with condition-specific splicing ratios. Furthermore, we have developed methodology to deconvolute the relative contributions of variability in gene expression and variability in splicing ratios to the overall variability of transcript abundances. As a proof of concept, we have applied this methodology to estimates of transcript abundances obtained from RNA-seq experiments in lymphoblastoid cells from Caucasian and Yoruban individuals. We found that protein-coding genes exhibit low splicing variability within populations, with many genes exhibiting constant ratios across individuals. When comparing the two populations, we found that up to 10% of the studied protein-coding genes exhibit population-specific splicing ratios. We estimate that ~60% of the total variability observed in the abundance of transcript isoforms can be explained by variability in transcription. A large fraction of the remaining variability likely results from variability in splicing. Finally, we also found that variability in splicing is uncommon without variability in transcription.
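The idea of separating expression variability from splicing-ratio variability can be illustrated with a deliberately simple model. This is not the statistical methodology described above. It is a minimal sketch that assumes each isoform's abundance factors as t = g × r (gene expression times splicing ratio), so that log t = log g + log r. Under that assumption, var(log g) / var(log t) gives a rough share of isoform variability attributable to expression. The function names and the toy abundances are hypothetical.

```python
import math
from statistics import pvariance

def splicing_ratios(isoform_abundances):
    """Per individual, divide each isoform's abundance by the gene total."""
    return [[x / sum(sample) for x in sample] for sample in isoform_abundances]

def expression_share(isoform_abundances, isoform=0):
    """
    Rough share of the log-scale variance of one isoform's abundance that is
    attributable to variance in total gene expression.

    Assumes t = g * r, so log t = log g + log r. The covariance between log g
    and log r is not partitioned, so values above 1 are possible when the two
    are negatively correlated.
    """
    log_t = [math.log(sample[isoform]) for sample in isoform_abundances]
    log_g = [math.log(sum(sample)) for sample in isoform_abundances]
    return pvariance(log_g) / pvariance(log_t)

# Hypothetical abundances: rows are individuals, columns are two isoforms of one gene
constant_ratios = [[30.0, 10.0], [60.0, 20.0], [90.0, 30.0]]
constant_expression = [[50.0, 50.0], [70.0, 30.0], [30.0, 70.0]]

print(splicing_ratios(constant_ratios))       # [[0.75, 0.25], [0.75, 0.25], [0.75, 0.25]]
print(expression_share(constant_ratios))      # ~1.0: only expression varies
print(expression_share(constant_expression))  # 0.0: only splicing ratios vary
```

The two toy genes mark the extremes. When ratios are constant, all variability comes from transcription. When the gene total is constant, all of it comes from splicing.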
MicroRNAs (miRNAs) constitute an important class of gene regulators. While models have been proposed to explain their appearance and expansion, validating these models has been difficult due to the lack of comparative studies. Here, we analyze miRNA evolutionary patterns in two mammals, human and mouse, in relation to the age of miRNA families. In this comparative framework, we confirm some predictions of previously proposed models of miRNA evolution. For example, miRNAs arise de novo more frequently than by duplication, and the number of protein-coding genes targeted by miRNAs decreases with evolutionary time. We also corroborate that miRNA expression levels increase with evolutionary time. However, we show that this relation is largely tissue-dependent and is especially weak in embryonic and nervous tissues. We identify a bias of tag-sequencing techniques in the assessment of breadth of expression, which leads us, contrary to predictions, to find more tissue-specific expression of older miRNAs. Together, our results refine the models used so far to depict the evolution of miRNA genes. They underline the role of tissue-specific selective forces in the evolution of miRNAs, as well as potential co-evolution patterns between miRNAs and the protein-coding genes they target.
Sequential assembly of the human spliceosome on RNA transcripts regulates splicing across the human transcriptome. The core spliceosome component PRPF8 is essential for spliceosome assembly through its participation in ribonucleoprotein (RNP) complexes for splice-site recognition, branch-point formation and catalysis. PRPF8 deficiency is linked to human diseases such as retinitis pigmentosa and myeloid neoplasia, but its genome-wide effects on constitutive and alternative splicing remain unclear.
Here, we show that alterations in RNA splicing patterns across the human transcriptome that occur in conditions of restricted cellular PRPF8 abundance are defined by the altered splicing of introns with weak 5' splice sites. iCLIP of spliceosome components reveals that PRPF8 depletion decreases RNP complex formation at most splice sites in exon-intron junctions throughout the genome. However, impaired splicing affects only a subset of human transcripts, enriched for mitotic cell cycle factors, leading to mitotic arrest. Preferentially retained introns and differentially used exons in the affected genes contain weak 5' splice sites, but are otherwise indistinguishable from adjacent spliced introns. Experimental enhancement of splice-site strength in mini-gene constructs overcomes the effects of PRPF8 depletion on the kinetics and fidelity of splicing during transcription.
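The passage above does not say how 5' splice-site strength was scored. Widely used tools such as MaxEntScan rely on probabilistic models. As a much simpler and purely illustrative stand-in, the sketch below counts how many positions of a 9-nt 5' splice site (exon −3 to intron +6) match the common consensus CAG|GUAAGU. It then labels a site weak below an arbitrary, hypothetical cutoff. Both the scoring rule and the threshold are assumptions for illustration only.

```python
# Hypothetical reference: the common 5' splice-site consensus CAG|GTAAGT,
# covering exon positions -3..-1 and intron positions +1..+6 (DNA alphabet).
CONSENSUS_5SS = "CAGGTAAGT"

def consensus_match_score(site: str) -> int:
    """Number of positions (0-9) at which a 9-nt 5' splice site matches the consensus."""
    site = site.upper().replace("U", "T")  # accept RNA or DNA input
    if len(site) != len(CONSENSUS_5SS):
        raise ValueError("expected 9 nt spanning exon -3 to intron +6")
    return sum(a == b for a, b in zip(site, CONSENSUS_5SS))

def is_weak(site: str, threshold: int = 6) -> bool:
    """Call a site weak if it matches fewer than `threshold` positions (arbitrary cutoff)."""
    return consensus_match_score(site) < threshold

print(consensus_match_score("CAGGTAAGT"))  # 9
print(consensus_match_score("AAGGTGAGT"))  # 7
print(is_weak("TTCGTATAA"))                # True
```

A position-agnostic match count ignores that some positions (notably the +1/+2 GT) matter far more than others. This is why model-based scores are preferred in practice.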
Competition for PRPF8 availability alters the transcription-coupled splicing of RNAs in which weak 5' splice sites predominate, enabling diversification of human gene expression during biological processes like mitosis. Our findings exemplify the regulatory potential of changes in the core spliceosome machinery, which may be relevant to slow-onset human genetic diseases linked to PRPF8 deficiency.
As exome sequencing gives way to genome sequencing, the need to interpret the function of regulatory DNA becomes increasingly important. To test whether evolutionary conservation of cis-regulatory modules (CRMs) gives insight into human gene regulation, we determined the binding locations of four liver-essential transcription factors (TFs) in liver tissue from human, macaque, mouse, rat, and dog. Approximately two thirds of the TF-bound regions fell into CRMs. Fewer than half of the human CRMs were also found as a CRM in the orthologous region of a second species. Shared CRMs were associated with liver pathways and with disease loci identified by genome-wide association studies. Recurrent, rare, human disease-causing mutations at the promoters of several blood coagulation and lipid metabolism genes were also identified within CRMs shared in multiple species. This suggests that multi-species analyses of experimentally determined combinatorial TF binding will help identify genomic regions critical for tissue-specific gene control.
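The statement that about two thirds of TF-bound regions fall into CRMs depends on how a CRM is defined, and that definition is not given here. As a hypothetical rule, the sketch below treats a binding peak as part of a CRM if it overlaps a peak from at least one other TF. It then reports the fraction of peaks satisfying that rule. The TF names and coordinates are made up.

```python
from typing import NamedTuple

class Peak(NamedTuple):
    tf: str
    chrom: str
    start: int  # 0-based, half-open interval
    end: int

def overlaps(a: Peak, b: Peak) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

def fraction_in_crms(peaks: list[Peak]) -> float:
    """Fraction of peaks overlapping a peak of at least one *other* TF (hypothetical CRM rule)."""
    in_crm = sum(
        any(p.tf != q.tf and overlaps(p, q) for q in peaks)
        for p in peaks
    )
    return in_crm / len(peaks)

peaks = [
    Peak("TF1", "chr1", 100, 200),
    Peak("TF2", "chr1", 150, 250),    # overlaps the TF1 peak
    Peak("TF3", "chr1", 1000, 1100),  # bound alone
]
print(fraction_in_crms(peaks))  # 0.6666666666666666
```

The quadratic loop is fine for a toy example. Genome-scale peak sets would call for sorted-interval sweeps or a dedicated tool such as BEDTools.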
Cytochrome P450 2D6 (CYP2D6) plays a crucial role in metabolizing approximately 20% of clinically prescribed medications. This enzyme is encoded by the CYP2D6 gene. The gene is known for its extensive polymorphism, with over 170 catalogued haplotypes (star alleles), and these variants can have a profound impact on drug efficacy and safety. Despite its importance, global genomic databases are predominantly representative of European ancestries, which limits comprehensive knowledge of CYP2D6 variation in ethnically diverse populations. To bridge this knowledge gap, we focused on elucidating the CYP2D6 variation landscape within a multi-ethnic Asian cohort encompassing individuals of Chinese, Malay, and Indian descent. We analyzed 1850 whole genomes from the SG10K_Health dataset using an in-house consensus algorithm that integrates the capabilities of Cyrius, Aldy, and StellarPGx. This analysis revealed distinct population-specific star-allele distribution trends, highlighting the unique genetic makeup of the Singaporean population. Notably, 46% of our cohort harbored actionable CYP2D6 variants, that is, variants with direct implications for drug dosing and treatment strategies. Furthermore, we identified 14 potential novel CYP2D6 star alleles, 7 of which were observed in multiple individuals, suggesting broader relevance. Overall, our study contributes novel data on CYP2D6 genetic variation specific to the Southeast Asian context. The findings are instrumental for the advancement of pharmacogenomics and personalized medicine, not only in Southeast Asia but also in other regions with comparable genetic diversity.
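The in-house consensus algorithm is not described in this passage. One common way to combine several callers is a majority vote, and the sketch below shows that hypothetical scheme. A diplotype is accepted only if at least two of the three callers report it after haplotype order is normalized; otherwise the sample is left uncalled. The tool names appear only as dictionary keys. No Cyrius, Aldy, or StellarPGx interface is invoked, and the per-sample outputs are invented.

```python
from collections import Counter
from typing import Optional

def normalize(diplotype: str) -> str:
    """Order the two haplotypes so that '*10/*1' and '*1/*10' compare equal."""
    return "/".join(sorted(diplotype.split("/")))

def consensus_call(calls: dict[str, Optional[str]], min_agree: int = 2) -> Optional[str]:
    """Return the diplotype reported by at least `min_agree` callers, else None."""
    votes = Counter(normalize(c) for c in calls.values() if c)
    if not votes:
        return None
    diplotype, n = votes.most_common(1)[0]
    return diplotype if n >= min_agree else None

# Hypothetical per-sample outputs; the caller names are only dictionary keys here.
calls = {"Cyrius": "*1/*10", "Aldy": "*10/*1", "StellarPGx": "*1/*36+*10"}
print(consensus_call(calls))  # *1/*10
print(consensus_call({"Cyrius": "*1/*2", "Aldy": "*1/*4", "StellarPGx": None}))  # None
```

The string sort in `normalize` only establishes a canonical order for comparison. It is not a clinically meaningful ordering of star alleles.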
The incidence of renal cell carcinoma (RCC) is increasing worldwide, and its prevalence is particularly high in some parts of Central Europe. Here we undertake whole-genome and transcriptome sequencing of clear cell RCC (ccRCC), the most common form of the disease. The patients come from four European countries with contrasting disease incidence, allowing us to explore the genomic architecture underlying RCC. Our findings support previous reports of frequent aberrations in the epigenetic machinery and PI3K/mTOR signalling. They also uncover novel pathways and genes affected by recurrent mutations and abnormal transcriptome patterns, including focal adhesion, components of the extracellular matrix (ECM) and genes encoding FAT cadherins. Furthermore, a large majority of patients from Romania have an unexpectedly high frequency of A:T>T:A transversions, consistent with exposure to aristolochic acid (AA). These results show that the processes underlying ccRCC tumorigenesis may vary between populations and suggest that AA may be an important ccRCC carcinogen in Romania, a finding with major public health implications.
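In the pyrimidine-centred convention commonly used for mutational spectra, an A:T>T:A transversion is written T>A. The sketch below shows how such an enrichment could be tallied for one tumor. It is a simple count of substitution classes, not a full mutational-signature analysis, and the SNV list is hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def substitution_class(ref: str, alt: str) -> str:
    """Express a single-base substitution relative to the pyrimidine (C or T) reference base."""
    ref, alt = ref.upper(), alt.upper()
    if ref in ("A", "G"):  # purine reference: report the complementary strand instead
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"

def fraction_of_class(variants, target="T>A"):
    """Fraction of (ref, alt) SNVs in `target`; T>A is the same class as A:T>T:A."""
    classes = Counter(substitution_class(ref, alt) for ref, alt in variants)
    return classes[target] / sum(classes.values())

# Hypothetical SNVs from one tumor
snvs = [("A", "T"), ("T", "A"), ("C", "T"), ("G", "A")]
print(substitution_class("A", "T"))  # T>A
print(fraction_of_class(snvs))       # 0.5
```

Collapsing to the pyrimidine reference leaves six substitution classes. Full signature analyses usually also record the flanking bases, giving 96 trinucleotide contexts.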
Microarrays are a well-established and widely adopted technology capable of interrogating hundreds of thousands of loci across the human genome. Combined with imputation to cover common variants not included in the chip design, they offer a cost-effective solution for large-scale genetic studies. Beyond research applications, this technology can be used for pharmacogenomic and nutrigenetic testing and for complex disease risk prediction. However, establishing clinical reporting workflows requires a thorough evaluation of the assay's performance, which is achieved through validation studies. In this study, we performed a pre-clinical validation of a genetic testing workflow based on the Illumina Global Screening Array for 25 pharmacogenomic-related genes.
To evaluate the accuracy of our workflow, we conducted multiple pre-clinical validation studies. Here, we present the results of accuracy and precision assessments, involving a total of 73 cell lines. These assessments encompass reference materials from the Genome-In-A-Bottle (GIAB), the Genetic Testing Reference Material Coordination Program (GeT-RM) projects, as well as additional samples from the 1000 Genomes project (1KGP). We conducted an accuracy assessment of genotype calls for target loci in each indication against established truth sets.
In our per-sample analysis, we observed a mean analytical sensitivity of 99.39% and a specificity of 99.98%. We further assessed the accuracy of star-allele calls against established diplotypes in the GeT-RM catalogue or against calls derived from 1KGP genotypes. On average, we observed a diplotype concordance rate of 96.47% across the 14 pharmacogenomic-related genes with star-allele calls. Lastly, we evaluated reproducibility across replicates and observed 99.48% diplotype and 100% phenotype inter-run concordance.
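The metrics above follow standard definitions: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and concordance = the fraction of matching calls. The sketch below computes per-sample values, averages them, and measures inter-run diplotype concordance. The study's exact counting rules (for example, how no-calls are treated) are not stated here. All counts and diplotypes are hypothetical.

```python
from statistics import mean

def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: correctly called variant genotypes / all truth-set variant genotypes."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """True-negative rate: correctly called reference genotypes / all truth-set reference genotypes."""
    return tn / (tn + fp)

def concordance(run_a: dict, run_b: dict) -> float:
    """Fraction of samples present in both runs whose calls are identical."""
    shared = run_a.keys() & run_b.keys()
    return sum(run_a[s] == run_b[s] for s in shared) / len(shared)

# Hypothetical per-sample counts (tp, fn, tn, fp) for two samples
per_sample = [(995, 5, 9998, 2), (990, 10, 10000, 0)]
mean_sens = mean(sensitivity(tp, fn) for tp, fn, _, _ in per_sample)
mean_spec = mean(specificity(tn, fp) for _, _, tn, fp in per_sample)
print(mean_sens, mean_spec)

# Hypothetical diplotype calls from two replicate runs
run1 = {"S1": "*1/*2", "S2": "*1/*4", "S3": "*2/*3"}
run2 = {"S1": "*1/*2", "S2": "*1/*1", "S3": "*2/*3"}
print(concordance(run1, run2))  # 0.6666666666666666
```

Averaging per-sample rates, as the "per-sample analysis" wording suggests, can give a slightly different number than pooling all counts before dividing. Which one is reported matters when samples differ in the number of target loci.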
Our comprehensive validation study demonstrates the robustness and reliability of the developed workflow, supporting its readiness for further development for applied testing.