Immediate-early genes (IEGs) can be activated and transcribed within minutes after stimulation, without the need for de novo protein synthesis, and they are stimulated in response to both cell-extrinsic and cell-intrinsic signals. Extracellular signals are transduced from the cell surface through receptors that activate a chain of proteins in the cell, in particular extracellular-signal-regulated kinases (ERKs), mitogen-activated protein kinases (MAPKs) and members of the RhoA-actin pathway. These communicate through a signaling cascade by adding phosphate groups to neighboring proteins, which eventually activates transcription factors (TFs) and translocates them to the nucleus, thereby inducing gene expression. Gene activation also involves proximal and distal enhancers that interact with promoters to stimulate gene expression. The immediate-early genes have essential biological roles, in particular in stress responses, such as those of the immune system, and in differentiation. They therefore also have important roles in various diseases, including cancer development. In this paper we summarize some recent advances on key aspects of the activation and regulation of immediate-early genes.
Prostate cancer (PCa) has the highest incidence rate of all cancers in men in western countries. Unlike several other types of cancer, PCa has few genetic drivers, which has led researchers to look for additional epigenetic and transcriptomic contributors to PCa development and progression. In particular, datasets on DNA methylation, the most commonly studied epigenetic marker, have recently been generated and analysed in several PCa patient cohorts. DNA methylation is most commonly associated with downregulation of gene expression. However, positive associations of DNA methylation to gene expression have also been reported, suggesting a more diverse mechanism of epigenetic regulation. Such additional complexity could have important implications for understanding prostate cancer development but has not been studied at a genome-wide scale.
In this study, we have compared three sets of genome-wide single-site DNA methylation data from 870 PCa and normal tissue samples with multi-cohort gene expression data from 1117 samples, including 532 samples where DNA methylation and gene expression have been measured on the exact same samples. Genes were classified according to their corresponding methylation and expression profiles. A large group of hypermethylated genes was robustly associated with increased gene expression (UPUP group) in all three methylation datasets. These genes demonstrated distinct patterns of correlation between DNA methylation and gene expression compared to the genes showing the canonical negative association between methylation and expression (UPDOWN group). This indicates a more diversified role of DNA methylation in regulating gene expression than previously appreciated. Moreover, UPUP and UPDOWN genes were associated with different compartments: UPUP genes were related to structures in the nucleus, while UPDOWN genes were linked to extracellular features.
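The grouping of genes by methylation and expression profile can be illustrated with a minimal Python sketch. It assumes that each gene is summarised by its mean change between tumor and normal samples; the function name `classify_gene` and both cutoffs are hypothetical and are not taken from the study, which may have used a different statistical procedure.

```python
from statistics import mean


def classify_gene(meth_tumor, meth_normal, expr_tumor, expr_normal,
                  meth_cutoff=0.1, expr_cutoff=0.5):
    """Label a gene by the direction of its methylation and expression change.

    The cutoffs are illustrative assumptions, not values from the study.
    """
    d_meth = mean(meth_tumor) - mean(meth_normal)
    d_expr = mean(expr_tumor) - mean(expr_normal)

    # Only hypermethylated genes are assigned to the two named groups.
    if d_meth < meth_cutoff:
        return "OTHER"
    if d_expr >= expr_cutoff:
        return "UPUP"      # hypermethylated and upregulated
    if d_expr <= -expr_cutoff:
        return "UPDOWN"    # hypermethylated and downregulated
    return "OTHER"


# Hypothetical values: methylation as fractions, expression on a log scale.
label = classify_gene([0.8, 0.9], [0.2, 0.3], [5.0, 6.0], [2.0, 3.0])
```

In this toy call the gene gains both methylation and expression in tumor samples, so it falls in the UPUP group.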
We identified a robust association between hypermethylation and upregulation of gene expression when comparing samples from prostate cancer and normal tissue. These results challenge the classical view where DNA methylation is always associated with suppression of gene expression, which underlines the importance of considering corresponding expression data when assessing the downstream regulatory effect of DNA methylation.
Sequencing technologies have changed not only our approaches to classical genetics, but also the field of epigenetics. Specific methods allow scientists to identify novel genome-wide epigenetic patterns of DNA methylation down to single-nucleotide resolution. DNA methylation is the most researched epigenetic mark involved in various processes in the human cell, including gene regulation and development of diseases, such as cancer. Increasing numbers of DNA methylation sequencing datasets from the human genome are produced using various platforms, from methylated DNA precipitation to whole-genome bisulfite sequencing. Many of those datasets are fully accessible for repeated analyses. Sequencing experiments have become routine in laboratories around the world, while analysis of the resulting data is still a challenge for the majority of scientists, since in many cases it requires advanced computational skills. Even though various tools are being created and published, guidelines for their selection are often not clear, especially to non-bioinformaticians with limited experience in computational analyses. Separate tools are often used for individual steps in the analysis, and these can be challenging to manage and integrate. However, in some instances, tools are combined into pipelines that are capable of completing all the essential steps to achieve the result. In the case of DNA methylation sequencing analysis, the goal of such a pipeline is to map sequencing reads, calculate methylation levels, and distinguish differentially methylated positions and/or regions. The objective of this review is to describe basic principles and steps in the analysis of DNA methylation sequencing data that in particular have been used for mammalian genomes, and more importantly to present and discuss the most prominent computational pipelines that can be used to analyze such data.
We aim to provide a good starting point for scientists with limited experience in computational analyses of DNA methylation and hydroxymethylation data, and recommend a few tools that are powerful, but still easy enough to use for their own data analysis.
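Two of the pipeline steps named above, calculating methylation levels and flagging differentially methylated positions, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the input is a hypothetical mapping from position to (methylated, unmethylated) read counts, and positions are flagged by a simple difference cutoff, whereas published pipelines typically apply a statistical test. The function names and default thresholds are invented for the example.

```python
def methylation_level(methylated, unmethylated):
    """Fraction of reads supporting methylation at one position."""
    total = methylated + unmethylated
    return methylated / total if total else None


def differential_positions(sample_a, sample_b, min_coverage=10, min_delta=0.25):
    """Return (position, delta) for positions whose level differs between samples.

    sample_a and sample_b map position -> (methylated_reads, unmethylated_reads).
    Both thresholds are illustrative assumptions.
    """
    hits = []
    for pos in sorted(sample_a.keys() & sample_b.keys()):
        (ma, ua), (mb, ub) = sample_a[pos], sample_b[pos]
        # Skip positions with too few reads for a stable estimate.
        if ma + ua < min_coverage or mb + ub < min_coverage:
            continue
        delta = methylation_level(mb, ub) - methylation_level(ma, ua)
        if abs(delta) >= min_delta:
            hits.append((pos, delta))
    return hits


normal = {("chr1", 100): (18, 2), ("chr1", 200): (10, 10), ("chr1", 300): (1, 2)}
tumor = {("chr1", 100): (4, 16), ("chr1", 200): (11, 9), ("chr1", 300): (3, 0)}
hits = differential_positions(normal, tumor)
```

With these toy counts only the first position is reported: the second changes too little, and the third has too few reads.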
Upon the first publication of the fifth iteration of the Functional Annotation of Mammalian Genomes collaborative project, FANTOM5, we gathered a series of primary data and database systems into the FANTOM web resource (http://fantom.gsc.riken.jp) to help researchers explore transcriptional regulation and cellular states. In the course of the collaboration, primary data and analysis results have been expanded, and functionalities of the database systems enhanced. We believe that our data and web systems are invaluable resources, and we think the scientific community will benefit from this recent update to deepen their understanding of mammalian cellular organization. We introduce the contents of FANTOM5 here, report recent updates in the web resource and provide future perspectives.
Almost 16,000 human long non-coding RNA (lncRNA) genes have been identified in the GENCODE project. However, the function of most of them remains to be discovered. The function of lncRNAs and other novel genes can be predicted by identifying significantly enriched annotation terms in already annotated genes that are co-expressed with the lncRNAs. However, such approaches are sensitive to the methods that are used to estimate the level of co-expression.
We have tested and compared two well-known statistical metrics (Pearson and Spearman) and two geometrical metrics (Sobolev and Fisher) for identification of the co-expressed genes, using experimental expression data across 19 normal human tissues. We have also used a benchmarking approach based on semantic similarity to evaluate how well these methods are able to predict annotation terms, using a well-annotated set of protein-coding genes.
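The two statistical metrics can be sketched in plain Python to show how they score co-expression differently. This is a minimal illustration, not the software used in the study: the helper names are assumptions, and the geometrical metrics (Sobolev and Fisher) are not reproduced here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical expression of two genes across five tissues.
gene_a = [1, 2, 3, 4, 5]
gene_b = [1, 4, 9, 16, 25]
r_pearson = pearson(gene_a, gene_b)
r_spearman = spearman(gene_a, gene_b)
```

For this monotonic but non-linear pair, the Spearman score reaches 1 while the Pearson score stays slightly below it, which illustrates why the choice of metric affects which genes are called co-expressed.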
This work shows that geometrical metrics, in particular in combination with the statistical metrics, will predict annotation terms more efficiently than traditional approaches. Tests on selected lncRNAs confirm that it is possible to predict the function of these genes given a reliable set of expression data. The software used for this investigation is freely available.
• Novel database for cell cycle-regulated DNA repair and chromatin remodeling genes.
• Analyses of genome-wide regulation of DNA repair and chromatin remodeling genes.
• Correlated expression of DNA repair genes with similar functions.
Maintenance of a genome requires DNA repair integrated with chromatin remodeling. We have analyzed six transcriptome data sets and one data set on translational regulation of known DNA repair and remodeling genes in synchronized human cells. These data are available through our new database: www.dnarepairgenes.com. Genes that have similar transcription profiles in at least two of our data sets generally agree well with known protein profiles. In brief, long patch base excision repair (BER) is enriched for S phase genes, whereas short patch BER uses genes essentially equally expressed in all cell cycle phases. Furthermore, most genes related to DNA mismatch repair, Fanconi anemia and homologous recombination have their highest expression in the S phase. In contrast, genes specific for direct repair, nucleotide excision repair, as well as non-homologous end joining do not show cell cycle-related expression. Cell cycle regulated chromatin remodeling genes were most frequently confined to G1/S and S. These include e.g. genes for chromatin assembly factor 1 (CAF-1) major subunits CHAF1A and CHAF1B; the putative helicases HELLS and ATAD2 that both co-activate E2F transcription factors central in G1/S-transition and recruit DNA repair and chromatin-modifying proteins and DNA double strand break repair proteins; and RAD54L and RAD54B involved in double strand break repair. TOP2A was consistently most highly expressed in G2, but also expressed in late S phase, supporting a role in regulating entry into mitosis. Translational regulation complements transcriptional regulation and appears to be a relatively common cell cycle regulatory mechanism for DNA repair genes. Our results identify cell cycle phases in which different pathways have highest activity, and demonstrate that periodically expressed genes in a pathway are frequently co-expressed. 
Furthermore, the data suggest that S phase expression and over-expression of some multifunctional chromatin remodeling proteins may set up feedback loops driving cancer cell proliferation.
Mitochondrial activity in cancer cells has been central to cancer research since Otto Warburg first published his thesis on the topic in 1956. Although Warburg proposed that oxidative phosphorylation in the tricarboxylic acid (TCA) cycle was perturbed in cancer, later research has shown that oxidative phosphorylation is activated in most cancers, including prostate cancer (PCa). However, more detailed knowledge on mitochondrial metabolism and metabolic pathways in cancers is still lacking. In this study we expand our previously developed method for analyzing functional homologous proteins (FunHoP), which can provide a more detailed view of metabolic pathways. FunHoP uses results from differential expression analysis of RNA-Seq data to improve pathway analysis. By adding information on subcellular localization based on experimental data and computational predictions we can use FunHoP to differentiate between mitochondrial and non-mitochondrial processes in cancerous and normal prostate cell lines. Our results show that mitochondrial pathways are upregulated in PCa and that splitting metabolic pathways into mitochondrial and non-mitochondrial counterparts using FunHoP adds to the interpretation of the metabolic properties of PCa cells.
The k-Nearest Neighbor (kNN) classifier represents a simple and very general approach to classification. Still, the performance of kNN classifiers can often compete with more complex machine-learning algorithms. The core of kNN depends on a “guilt by association” principle where classification is performed by measuring the similarity between a query and a set of training patterns, often computed as distances. The relative performance of kNN classifiers is closely linked to the choice of distance or similarity measure, and it is therefore relevant to investigate the effect of using different distance measures when comparing biomedical data. In this study on classification of cancer data sets, we have used both common and novel distance measures, including the novel distance measures Sobolev and Fisher, and we have evaluated the performance of kNN with these distances on four cancer data sets of different types. We find that the performance when using the novel distance measures is comparable to the performance with more well-established measures, in particular for the Sobolev distance. We define a robust ranking of all the distance measures according to overall performance. Several distance measures show robust performance in kNN over several data sets, in particular the Hassanat, Sobolev, and Manhattan measures. Some of the other measures show good performance on selected data sets but seem to be more sensitive to the nature of the classification data. It is therefore important to benchmark distance measures on similar data prior to classification to identify the most suitable measure in each case.
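How the distance measure plugs into kNN can be shown with a minimal Python sketch in which the distance is passed as a function. Only the Manhattan and Euclidean distances are implemented; the Hassanat, Sobolev, and Fisher measures are not reproduced here, and the function names and toy data are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(query, train_points, train_labels, k=3, distance=manhattan):
    """Predict the majority label among the k training points nearest to query."""
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: distance(query, pair[0]),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature samples.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]
prediction = knn_predict((0.5, 0.5), points, labels, k=3, distance=euclidean)
```

Because the distance is an argument, benchmarking several measures on the same data amounts to looping over a list of distance functions and comparing the resulting accuracies.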
Uracil in DNA results from deamination of cytosine, resulting in mutagenic U : G mispairs, and misincorporation of dUMP, which gives a less harmful U : A pair. At least four different human DNA glycosylases may remove uracil and thus generate an abasic site, which is itself cytotoxic and potentially mutagenic. These enzymes are UNG, SMUG1, TDG and MBD4. The base excision repair process is completed either by a short-patch or a long-patch pathway, which largely use different proteins. UNG2 is a major nuclear uracil-DNA glycosylase central in removal of misincorporated dUMP in replication foci, but recent evidence also indicates an important role in repair of U : G mispairs and possibly U in single-stranded DNA. SMUG1 has broader specificity than UNG2 and may serve as a relatively efficient backup for UNG in repair of U : G mismatches and single-stranded DNA. TDG and MBD4 may have specialized roles in the repair of U and T in mismatches in CpG contexts. Recently, a role for UNG2, together with activation induced deaminase (AID) which generates uracil, has been demonstrated in immunoglobulin diversification. Studies are now underway to examine whether mice deficient in Ung develop lymphoproliferative malignancies and have a different life span.
Detection of copy number variation (CNV) in genes associated with disease is important in genetic diagnostics, and next generation sequencing (NGS) technology provides data that can be used for CNV detection. However, CNV detection based on NGS data is not often used in diagnostic labs, as the data analysis is challenging, especially with data from targeted gene panels. Wet lab methods like MLPA (MRC Holland) are widely used, but are expensive, time consuming and have gene-specific limitations. Our aim has been to develop a bioinformatic tool for CNV detection from NGS data in medical genetic diagnostic samples.
Our computational pipeline for detection of CNVs in NGS data from targeted gene panels utilizes coverage depth of the captured regions and calculates a copy number ratio score for each region. This is computed by comparing the mean coverage of the sample with the mean coverage of the same region in other samples, defined as a pool. The pipeline selects pools for comparison dynamically from previously sequenced samples, using the pool with an average coverage depth that is nearest to that of the sample. A sliding window-based approach is used to analyze each region, where the length of the sliding window and the sliding distance can be chosen dynamically to increase or decrease the resolution. This helps in detecting CNVs in small or partial exons. With this pipeline we have correctly identified the CNVs in 36 positive control samples, with sensitivity of 100% and specificity of 91%. We have detected whole gene level deletion/duplication, single/multi exonic level deletion/duplication, partial exonic deletion and mosaic deletion. Since its implementation in mid-2018 it has proven its diagnostic value with more than 45 CNV findings in routine tests.
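The pool selection and sliding-window ratio described above can be sketched in Python as follows. This is a simplified illustration rather than the pipeline itself: it assumes per-base coverage lists for one region, represents each pool by a single mean coverage profile, and omits any normalization the actual tool may apply. The names `nearest_pool` and `window_ratios` and the toy depths are hypothetical.

```python
from statistics import mean


def nearest_pool(sample_depth, pools):
    """Name of the pool whose average coverage is closest to the sample's."""
    target = mean(sample_depth)
    return min(pools, key=lambda name: abs(mean(pools[name]) - target))


def window_ratios(sample_depth, pool_depth, window=4, step=2):
    """Copy number ratio (sample mean / pool mean) for each sliding window.

    Returns (window_start, ratio) pairs; the ratio is None where the pool
    has no coverage. Window length and step set the resolution.
    """
    ratios = []
    for start in range(0, len(sample_depth) - window + 1, step):
        sample_mean = mean(sample_depth[start:start + window])
        pool_mean = mean(pool_depth[start:start + window])
        ratios.append((start, sample_mean / pool_mean if pool_mean else None))
    return ratios


# Hypothetical region of 8 bases in which the second half has lost coverage.
sample = [100, 100, 100, 100, 50, 50, 50, 50]
pools = {"low": [40] * 8, "mid": [100] * 8, "high": [300] * 8}
chosen = nearest_pool(sample, pools)
ratios = window_ratios(sample, pools[chosen], window=4, step=4)
```

In this toy region the second window has a ratio of about 0.5 relative to the pool, the kind of drop that would be followed up as a possible deletion; a smaller window and step would localise the breakpoint more finely at the cost of noisier ratios.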
With this pipeline as part of our diagnostic practices it is now possible to detect partial, single or multi-exonic, and intragenic CNVs in all genes in our target panel. This has helped our diagnostic lab to expand the portfolio of genes where we offer CNV detection, which previously was limited by the availability of MLPA kits.