Methodological breakthroughs over the past four decades have repeatedly revolutionized transcriptome profiling. Using RNA sequencing (RNA-seq), it has now become possible to sequence and quantify the transcriptional outputs of individual cells or thousands of samples. These transcriptomes provide a link between cellular phenotypes and their molecular underpinnings, such as mutations. In the context of cancer, this link represents an opportunity to dissect the complexity and heterogeneity of tumours and to discover new biomarkers or therapeutic strategies. Here, we review the rationale, methodology and translational impact of transcriptome profiling in cancer.
Renal cell carcinomas (RCCs) are a heterogeneous group of neoplasms. Recent sequencing studies revealed various molecular features associated with histologic RCC subtypes, including chromophobe renal cell carcinoma (ChRCC).
To characterize the gene expression and biomarker signatures associated with ChRCC.
We performed integrative analysis on RNA sequencing data available from 1049 RCC specimens from The Cancer Genome Atlas and in-house studies. Our workflow identified genes relatively enriched in ChRCC, including Forkhead box I1 (FOXI1), Rh family C glycoprotein (RHCG), and LINC01187. We assessed the expression pattern of FOXI1 and RHCG protein by immunohistochemistry (IHC) and LINC01187 mRNA by RNA in situ hybridization (RNA-ISH) in whole tissue sections representing a cohort of 197 RCC cases, including both primary and metastatic tumors.
The FOXI1 and RHCG IHC staining, as well as the LINC01187 RNA-ISH staining, was evaluated in each case for intensity, pattern, and localization of expression.
All primary and metastatic classic ChRCCs demonstrated homogeneous positive labeling for FOXI1, RHCG proteins, and LINC01187 transcript. Unclassified RCC with oncocytic features, oncocytoma, and hybrid oncocytic tumor, as well as all but two cases of eosinophilic ChRCC also stained positive. Importantly, metastatic and primary RCC of all other subtypes did not demonstrate any unequivocal staining for FOXI1, RHCG, or LINC01187. In normal kidney, FOXI1, RHCG, and LINC01187 were detected in the distal nephron segment, specifically in intercalated cells. Two cases of eosinophilic ChRCC with focal expression of FOXI1 and LINC01187, and Golgi-like RHCG staining were found to contain MTOR gene mutations upon DNA sequencing.
We demonstrate a pipeline for the identification and validation of RCC subtype–specific biomarkers that can aid in the confirmation of cell of origin and may facilitate accurate classification and diagnosis of renal tumors.
FOXI1, RHCG, and LINC01187 are lineage-specific signature genes for chromophobe renal cell carcinoma.
We performed integrative RNA sequencing analysis from >1000 renal cell carcinoma (RCC) specimens, identified genes enriched in chromophobe RCC, experimentally validated the expression pattern of the top three genes (FOXI1, RHCG, and LINC01187), and revealed that the cell of origin of chromophobe RCC is intercalated cells.
We have devised a method for isolating virtually pure and comprehensive libraries of restriction fragments that contained replication initiation sites (bubbles) in vivo. We have now sequenced and mapped the bubble-containing fragments from GM06990, a near-normal EBV-transformed lymphoblastoid cell line, and have compared origin distributions with a comprehensive replication timing study recently published for this cell line. We find that early-firing origins, which represent ∼32% of all origins, overwhelmingly represent zones, associate only marginally with active transcription units, are localized within large domains of open chromatin, and are significantly associated with DNase I hypersensitivity. Origin "density" falls from early- to mid-S-phase, but rises again in late S-phase to levels only 17% lower than in early S-phase. Unexpectedly, late origin density calculated on the 1-Mb scale increases as a function of increasing chromatin compaction. Furthermore, the median efficiency of origins in late-replicating, heterochromatic domains is only 25% lower than in early-replicating euchromatic loci. Thus, the activation of early- and late-firing origins must be regulated by quintessentially different mechanisms. The aggregate data can be unified into a model in which initiation site selection is driven almost entirely by epigenetic factors that fashion both the long-range and local chromatin environments, with underlying DNA sequence and local transcriptional activity playing only minor roles. Importantly, the comprehensive origin map we have prepared for GM06990 overlaps moderately well with origin maps recently reported for the genomes of four different human cell lines based on the distributions of small nascent strands.
Chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) is the most widely used method for characterizing the epigenetic states of chromatin on a genomic scale. With the recent availability of large genome-wide data sets, often comprising several epigenetic marks, novel approaches are required to explore functionally relevant interactions between histone modifications. Computational discovery of "chromatin states" defined by such combinatorial interactions enabled descriptive annotations of genomes, but more quantitative approaches are needed to progress towards predictive models.
We propose non-negative matrix factorization (NMF) as a new unsupervised method to discover combinatorial patterns of epigenetic marks that frequently co-occur in subsets of genomic regions. We show that this small set of combinatorial "codes" can be effectively displayed and interpreted. NMF codes enable dimensionality reduction and have desirable statistical properties for regression and classification tasks. We demonstrate the utility of codes in the quantitative prediction of Pol2-binding and the discrimination between Pol2-bound promoters and enhancers. Finally, we show that specific codes can be linked to molecular pathways and targets of pluripotency genes during differentiation.
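As a rough illustration of the factorization idea described above, the following sketch decomposes a small regions-by-marks signal matrix into non-negative "codes" using the standard Lee-Seung multiplicative updates. It is not the epicode implementation: the function names, the toy signal values, and the choice of update rule are all assumptions made for illustration, and the code is written without external libraries only to keep it self-contained.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def nmf(v, rank, n_iter=300, seed=0, eps=1e-9):
    """Approximate non-negative V (regions x marks) as W @ H.

    W (regions x rank) holds per-region code weights and H (rank x marks)
    holds the mark composition of each code. Uses Lee-Seung multiplicative
    updates for the squared-error objective (an assumption, not necessarily
    the solver used by the authors).
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


def frobenius_error(v, w, h):
    """Frobenius norm of the residual V - W @ H."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for rv, rw in zip(v, wh) for a, b in zip(rv, rw)) ** 0.5


if __name__ == "__main__":
    # Hypothetical signal for five regions over three marks
    # (columns: H3K4me3, H3K27ac, H3K4me1); the values are invented.
    signal = [[8, 4, 0], [4, 2, 0], [0, 2, 6], [0, 1, 3], [4, 3, 3]]
    w, h = nmf(signal, rank=2)
    print("codes (rows of H):", [[round(x, 2) for x in row] for row in h])
    print("reconstruction error:", round(frobenius_error(signal, w, h), 4))
```

Each row of `H` can be read as one combinatorial code, and each row of `W` gives a region's loading on those codes, which is the kind of low-dimensional quantitative variable the abstract proposes for regression and classification.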
We have introduced and evaluated a new computational approach to represent combinatorial patterns of epigenetic marks as quantitative variables suitable for predictive modeling and supervised machine learning. To foster widespread adoption of this method, we make it available as an open-source software package, epicode, at https://github.com/mcieslik-mctp/epicode.
Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts.
To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, allowing one to tune the trade-off between parallelism and lazy evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain-specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats).
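The batching trade-off described above can be sketched with the Python standard library alone. This is not PaPy's API: `in_batches`, `run_pipeline`, and the two example stages are hypothetical names, and a linear chain of stages stands in for the general directed acyclic graph. Larger batches expose more parallelism per step, while smaller batches keep fewer items in memory at once.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def in_batches(iterable, size):
    """Yield successive lists of up to `size` items, consuming input lazily."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def run_pipeline(items, stages, batch_size=4, workers=4):
    """Lazily push items through a linear chain of stage functions.

    Each batch is mapped through every stage in parallel before the next
    batch is read, so at most `batch_size` input items are held at a time.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in in_batches(items, batch_size):
            for stage in stages:
                # pool.map preserves the order of items within the batch.
                batch = list(pool.map(stage, batch))
            yield from batch


def reverse_complement(seq):
    """Example stage: reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Example stage: fraction of G and C bases in a DNA string."""
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    reads = ["AACG", "GGCC", "ATAT"]  # invented example sequences
    for value in run_pipeline(reads, [reverse_complement, gc_fraction], batch_size=2):
        print(value)
```

Because `run_pipeline` is a generator, nothing is read from the input until a result is requested, and setting `batch_size=1` degenerates to fully lazy, item-by-item evaluation.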
PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples.
Inducible transcription factors (TFs) mediate transcriptional responses to environmental cues. In response to multiple inflammatory signals, active NF-κB dimers enter the nucleus and trigger cell-type- and stimulus-specific transcriptional programs. Although much is known about NF-κB-inducing pathways and about locus-specific mechanisms of transcriptional control, it is poorly understood how the pre-existing chromatin landscape determines NF-κB target selection and activation. Specifically, it is not known which epigenetic marks and pre-bound TFs serve genome-wide as positive or negative cues for active NF-κB.
We applied multivariate and combinatorial data mining techniques to a comprehensive dataset of DNA methylation, DNase I hypersensitivity, eight epigenetic marks, and 34 TFs to arrive at genome-wide patterns that predict NF-κB binding. Strikingly, we observed NF-κB recruitment to both accessible and nucleosome-bound sites. Within nucleosomal DNA, NF-κB binding was primed by H3K4me1 and H2A.Z, but also by hyper-methylated DNA outside of promoters and CpG islands. Many of these predictors showed combinatorial cooperativity and statistically significant interactions. Recruitment to pre-accessible sites was more frequent and influenced by chromatin-associated TFs. We observed that specific TF combinations are greatly enriched for (or depleted of) NF-κB binding events.
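The abstract does not state which statistic was used to score TF combinations, so the following is only one common way to quantify such enrichment: a fold enrichment together with a one-sided hypergeometric (Fisher-type) tail probability. The site annotations, the placeholder factor names `TF_A` and `TF_B`, and both function names are invented for illustration.

```python
from math import comb


def hypergeom_tail(k, n_draw, n_success, n_total):
    """P(X >= k) when n_draw sites are drawn from n_total, n_success of which are bound."""
    upper = min(n_draw, n_success)
    favourable = sum(
        comb(n_success, i) * comb(n_total - n_success, n_draw - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(n_total, n_draw)


def combination_enrichment(sites, combo):
    """Score how strongly sites carrying every TF in `combo` are enriched for binding.

    `sites` is a list of (set_of_prebound_TFs, is_bound) pairs.
    Returns (fold_enrichment, one_sided_p_value).
    """
    with_combo = [bound for tfs, bound in sites if combo <= tfs]
    n_total = len(sites)
    n_bound = sum(bound for _, bound in sites)
    n_combo = len(with_combo)
    if n_combo == 0 or n_bound == 0:
        raise ValueError("need at least one site with the combination and one bound site")
    k = sum(with_combo)
    fold = (k / n_combo) / (n_bound / n_total)
    return fold, hypergeom_tail(k, n_combo, n_bound, n_total)


if __name__ == "__main__":
    # Ten hypothetical sites: four carry both TF_A and TF_B, three of those are bound.
    sites = [
        ({"TF_A", "TF_B"}, True), ({"TF_A", "TF_B"}, True),
        ({"TF_A", "TF_B"}, True), ({"TF_A", "TF_B"}, False),
        ({"TF_A"}, True), ({"TF_A"}, False),
        ({"TF_B"}, False), ({"TF_B"}, False),
        (set(), False), (set(), False),
    ]
    fold, p = combination_enrichment(sites, {"TF_A", "TF_B"})
    print(f"fold enrichment = {fold:.3f}, one-sided p = {p:.4f}")
```

A fold value below 1 with the complementary tail would indicate depletion; at genome scale, p-values from many combinations would also need multiple-testing correction, which this sketch omits.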
We provide evidence of NF-κB binding within genomic regions that lack classical marks of activity. These pioneer binding events are relatively often associated with transcriptional regulation. Further, our predictive models indicate that specific combinations of epigenetic marks and transcription factors predetermine the NF-κB cistrome, supporting the feasibility of using statistical approaches to identify "histone codes".