Tumour samples containing distinct sub-populations of cancer and normal cells present challenges in the development of reproducible biomarkers, as these biomarkers are based on bulk signals from mixed tumour profiles. ISOpure is the only mRNA computational purification method to date that does not require a paired tumour-normal sample, provides a personalized cancer profile for each patient, and has been tested on clinical data. Replacing mixed tumour profiles with ISOpure-preprocessed cancer profiles led to better prognostic gene signatures for lung and prostate cancer.
To simplify the integration of ISOpure into standard R-based bioinformatics analysis pipelines, the algorithm has been implemented as an R package. The ISOpureR package performs analogously to the original code in estimating the fraction of cancer cells and the patient cancer mRNA abundance profile from tumour samples in four cancer datasets.
The ISOpureR package estimates the fraction of cancer cells and personalized patient cancer mRNA abundance profile from a mixed tumour profile. This open-source R implementation enables integration into existing computational pipelines, as well as easy testing, modification and extension of the model.
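The idea of separating a mixed tumour profile into cancer and normal components can be illustrated with a toy linear mixture. The sketch below is not the ISOpure algorithm and does not use the ISOpureR API; it assumes, purely for illustration, that a tumour profile is a weighted sum of one known cancer profile and one known normal profile, and all function names and numbers are hypothetical.

```python
# Toy two-component mixture: tumour ≈ alpha * cancer + (1 - alpha) * normal.
# This is an illustrative assumption, NOT the ISOpure statistical model.

def estimate_cancer_fraction(tumour, cancer, normal):
    """Least-squares estimate of alpha, clipped to the interval [0, 1]."""
    num = sum((t - n) * (c - n) for t, c, n in zip(tumour, cancer, normal))
    den = sum((c - n) ** 2 for c, n in zip(cancer, normal))
    if den == 0:
        raise ValueError("cancer and normal profiles are identical")
    return min(1.0, max(0.0, num / den))


def purify(tumour, normal, alpha):
    """Subtract the normal contribution and rescale to get a cancer profile."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    return [(t - (1 - alpha) * n) / alpha for t, n in zip(tumour, normal)]


# Made-up abundance values for four hypothetical genes.
normal = [10.0, 5.0, 2.0, 8.0]
cancer = [4.0, 9.0, 6.0, 8.0]
true_alpha = 0.7
tumour = [true_alpha * c + (1 - true_alpha) * n for c, n in zip(cancer, normal)]

alpha_hat = estimate_cancer_fraction(tumour, cancer, normal)
recovered = purify(tumour, normal, alpha_hat)
```

In this noise-free toy case the estimated fraction matches the simulated one and the recovered profile matches the simulated cancer profile. The method described in the abstract works without a paired normal sample and infers a personalized cancer profile per patient, which this sketch does not attempt.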
MicroRNAs (miRNAs) regulate a large proportion of mammalian genes by hybridizing to targeted messenger RNAs (mRNAs) and down-regulating their translation into protein. Although much work has been done in the genome-wide computational prediction of miRNA genes and their target mRNAs, an open question is how to efficiently obtain functional miRNA targets from a large number of candidate miRNA targets predicted by existing computational algorithms. In this paper, we propose a novel Bayesian model and learning algorithm, GenMiR++ (Generative model for miRNA regulation), that accounts for patterns of gene expression using miRNA expression data and a set of candidate miRNA targets. A set of high-confidence functional miRNA targets is then obtained from the data using a Bayesian learning algorithm. Our model scores 467 high-confidence miRNA targets out of 1,770 targets obtained from TargetScanS in mouse at a false detection rate of 2.5%: several confirmed miRNA targets appear in our high-confidence set, such as the interactions between miR-92 and the signal transduction gene MAP2K4, as well as the relationship between miR-16 and BCL2, an anti-apoptotic gene which has been implicated in chronic lymphocytic leukemia. We present results on the robustness of our model showing that our learning algorithm is not sensitive to various perturbations of the data. Our high-confidence targets represent a significant increase in the number of miRNA targets and represent a starting point for a global understanding of gene regulation.
We describe the application of a microarray platform, which combines information from exon body and splice-junction probes, to perform a quantitative analysis of tissue-specific alternative splicing (AS) for thousands of exons in mammalian cells. Through this system, we have analyzed global features of AS in major mouse tissues. The results provide numerous inferences for the functions of tissue-specific AS, insights into how the evolutionary history of exons can impact on their inclusion levels, and also information on how global regulatory properties of AS define tissue type. Like global transcription profiles, global AS profiles reflect tissue identity. Interestingly, we find that transcription and AS act independently on different sets of genes in order to define tissue-specific expression profiles. These results demonstrate the utility of our quantitative microarray platform and data for revealing important global regulatory features of AS.
Vertebrates share the same general body plan and organs, possess related sets of genes, and rely on similar physiological mechanisms, yet show great diversity in morphology, habitat and behavior. Alteration of gene regulation is thought to be a major mechanism in phenotypic variation and evolution, but relatively little is known about the broad patterns of conservation in gene expression in non-mammalian vertebrates.
We measured expression of all known and predicted genes across twenty tissues in chicken, frog and pufferfish. By combining the results with human and mouse data and considering only ten common tissues, we have found evidence of conserved expression for more than a third of unique orthologous genes. We find that, on average, transcription factor gene expression is neither more nor less conserved than that of other genes. Strikingly, conservation of expression correlates poorly with the amount of conserved nonexonic sequence, even using a sequence alignment technique that accounts for non-collinearity in conserved elements. Many genes show conserved human/fish expression despite having almost no nonexonic conserved primary sequence.
There are clearly strong evolutionary constraints on tissue-specific gene expression. A major challenge will be to understand the precise mechanisms by which many gene expression patterns remain similar despite extensive cis-regulatory restructuring.
Proper regulation of germline gene expression is essential for fertility and maintaining species integrity. In the C. elegans germline, a diverse repertoire of regulatory pathways promotes the expression of endogenous germline genes and limits the expression of deleterious transcripts to maintain genome homeostasis. Here we show that the conserved TRIM-NHL protein, NHL-2, plays an essential role in the C. elegans germline, modulating germline chromatin and meiotic chromosome organization. We uncover a role for NHL-2 as a co-factor in both positively (CSR-1) and negatively (HRDE-1) acting germline 22G-small RNA pathways and the somatic nuclear RNAi pathway. Furthermore, we demonstrate that NHL-2 is a bona fide RNA binding protein and, along with RNA-seq data, point to a small RNA-independent role for NHL-2 in regulating transcripts at the level of RNA stability. Collectively, our data implicate NHL-2 as an essential hub of gene regulatory activity in both the germline and soma.
Mutations of EXOSC3 have been linked to the rare neurological disorder known as Pontocerebellar Hypoplasia type 1B (PCH1B). EXOSC3 is one of three putative RNA-binding structural cap proteins that guide RNA into the RNA exosome, the cellular machinery that degrades RNA. Using RNAcompete, we identified a G-rich RNA motif binding to EXOSC3. Surface plasmon resonance (SPR) and microscale thermophoresis (MST) indicated an affinity in the low micromolar range of EXOSC3 for long and short G-rich RNA sequences. Although several PCH1B-causing mutations in EXOSC3 did not engage a specific RNA motif as shown by RNAcompete, they exhibited lower binding affinity to G-rich RNA as demonstrated by MST. To test the hypothesis that modification of the RNA–protein interface in EXOSC3 mutants may be phenocopied by small molecules, we performed an in-silico screen of 50 000 small molecules and used enzyme-linked immunosorbent assays (ELISAs) and MST to assess the ability of the molecules to inhibit RNA-binding by EXOSC3. We identified a small molecule, EXOSC3-RNA disrupting (ERD) compound 3 (ERD03), which (i) bound specifically to EXOSC3 in saturation transfer difference nuclear magnetic resonance (STD-NMR), (ii) disrupted the EXOSC3–RNA interaction in a concentration-dependent manner, and (iii) produced a PCH1B-like phenotype with a 50% reduction in the cerebellum and an abnormally curved spine in zebrafish embryos. This compound also induced modification of zebrafish RNA expression levels similar to that observed with a morpholino against EXOSC3. To our knowledge, this is the first example of a small molecule obtained by rational design that models the abnormal developmental effects of a neurodegenerative disease in a whole organism.
Identifying genes in the genomic context is central to a cell's ability to interpret the genome. Yet, in general, the signals used to define eukaryotic genes are poorly described. Here, we derived simple classifiers that identify where transcription will initiate and terminate using nucleic acid sequence features detectable by the yeast cell, which we integrate into a Unified Model (UM) that models transcription as a whole. The cis-elements that denote where transcription initiates function primarily through nucleosome depletion, and, using a synthetic promoter system, we show that most of these elements are sufficient to initiate transcription in vivo. Hrp1 binding sites are the major characteristic of terminators; these binding sites are often clustered in terminator regions and can terminate transcription bidirectionally. The UM predicts global transcript structure by modeling transcription of the genome using a hidden Markov model whose emissions are the outputs of the initiation and termination classifiers. We validated the novel predictions of the UM with available RNA-seq data and tested it further by directly comparing the transcript structure predicted by the model to the transcription generated by the cell for synthetic DNA segments of random design. We show that the UM identifies transcription start sites more accurately than the initiation classifier alone, indicating that the relative arrangement of promoter and terminator elements influences their function. Our model presents a concrete description of how the cell defines transcript units, explains the existence of nongenic transcripts, and provides insight into genome evolution.
MOTIVATION: Lung cancer is often discovered long after its onset, making identifying genes important in its initiation and progression a challenge. By the time the tumors are discovered, we only observe the final sum of changes of the few genes that initiated cancer and thousands of genes that they have influenced. Gene interactions and heterogeneity of samples make it difficult to identify genes consistent between different cohorts. Using gene and gene–product interaction networks, we propose a principled approach to identify a small subset of genes whose network neighbors exhibit consistently high expression change (in cancerous tissue versus normal) regardless of their own expression. We hypothesize that these genes can shed light on the larger scale perturbations in the overall landscape of expression levels. RESULTS: We benchmark our method on simulated data, and show that we can recover a true gene list in noisy measurement data. We then apply our method to four non-small cell lung cancer and two pancreatic cancer cohorts, finding several genes that are consistent within all cohorts of the same cancer type. CONCLUSION: Our model is flexible, robust and identifies gene sets that are more consistent across cohorts than several other approaches. Additionally, our method can be applied on a per-patient basis not requiring large cohorts of patients to find genes of influence. Our approach is generally applicable to gene expression studies where the goal is to identify a small set of influential genes that may in turn explain the much larger set of genome-wide expression changes. AVAILABILITY: The code is available at http://morrislab.med.utoronto.ca/~anna/cannet.zip CONTACT: anna.goldenberg@utoronto.ca Supplementary Information: Supplementary data are available at Bioinformatics online.
Motivation: We address the problem of multi-way clustering of microarray data using a generative model. Our algorithm, probabilistic sparse matrix factorization (PSMF), is a probabilistic extension of a previous hard-decision algorithm for this problem. PSMF allows for varying levels of sensor noise in the data, uncertainty in the hidden prototypes used to explain the data and uncertainty as to the prototypes selected to explain each data vector. Results: We present experimental results demonstrating that our method can better recover functionally-relevant clusterings in mRNA expression data than standard clustering techniques, including hierarchical agglomerative clustering, and we show that by computing probabilities instead of point estimates, our method avoids converging to poor solutions. Contact: delbert@psi.toronto.edu
Large-scale quantitative analysis of transcriptional co-expression has been used to dissect regulatory networks and to predict the functions of new genes discovered by genome sequencing in model organisms such as yeast. Although the idea that tissue-specific expression is indicative of gene function in mammals is widely accepted, it has not been objectively tested nor compared with the related but distinct strategy of correlating gene co-expression as a means to predict gene function.
We generated microarray expression data for nearly 40,000 known and predicted mRNAs in 55 mouse tissues, using custom-built oligonucleotide arrays. We show that quantitative transcriptional co-expression is a powerful predictor of gene function. Hundreds of functional categories, as defined by Gene Ontology 'Biological Processes', are associated with characteristic expression patterns across all tissues, including categories that bear no overt relationship to the tissue of origin. In contrast, simple tissue-specific restriction of expression is a poor predictor of which genes are in which functional categories. As an example, the highly conserved mouse gene PWP1 is widely expressed across different tissues but is co-expressed with many RNA-processing genes; we show that the uncharacterized yeast homolog of PWP1 is required for rRNA biogenesis.
We conclude that 'functional genomics' strategies based on quantitative transcriptional co-expression will be as fruitful in mammals as they have been in simpler organisms, and that transcriptional control of mammalian physiology is more modular than is generally appreciated. Our data and analyses provide a public resource for mammalian functional genomics.
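The co-expression strategy described above can be sketched as ranking genes by how strongly their expression across tissues correlates with that of a query gene. This is a minimal illustration, assuming Pearson correlation as the similarity measure (the abstract does not specify one); the gene names and expression values are made up, and the helper names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sx * sy)


def most_coexpressed(query, profiles, k=2):
    """Return the k genes whose profiles correlate best with the query."""
    scores = [
        (pearson(profiles[query], profile), gene)
        for gene, profile in profiles.items()
        if gene != query
    ]
    scores.sort(reverse=True)
    return [gene for _, gene in scores[:k]]


# Hypothetical genes measured across five hypothetical tissues.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 2.5, 2.5, 4.5, 5.0],
}

neighbours = most_coexpressed("geneA", profiles, k=2)
```

Under a guilt-by-association reading, the functional annotations of the top-ranked neighbours would then suggest a function for the query gene, even when the query is expressed broadly rather than in one tissue.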