Alzheimer's disease (AD) is the most prevalent dementia disorder globally, and there are still no effective interventions for slowing or stopping the underlying pathogenic mechanisms. There is strong ...evidence implicating neural oxidative stress (OS) and ensuing neuroinflammation in the progressive neurodegeneration observed in the AD brain both during and prior to symptom emergence. Thus, OS-related biomarkers may be valuable for prognosis and provide clues to therapeutic targets during the early presymptomatic phase. In the current study, we gathered brain RNA-seq data of AD patients and matched controls from the Gene Expression Omnibus (GEO) to identify differentially expressed OS-related genes (OSRGs). These OSRGs were analyzed for cellular functions using the Gene Ontology (GO) database and used to construct a weighted gene co-expression network (WGCN) and protein-protein interaction (PPI) network. Receiver operating characteristic (ROC) curves were then constructed to identify network hub genes. A diagnostic model was established based on these hub genes using Least Absolute Shrinkage and Selection Operator (LASSO) and ROC analyses. Immune-related functions were examined by assessing correlations between hub gene expression and immune cell brain infiltration scores. Further, target drugs were predicted using the Drug-Gene Interaction database, while regulatory miRNAs and transcription factors were predicted using miRNet. In total, 156 candidate genes were identified among 11046 differentially expressed genes, 7098 genes in WGCN modules, and 446 OSRGs, and 5 hub genes (MAPK9, FOXO1, BCL2, ETS1, and SP1) were identified by ROC curve analyses. These hub genes were enriched in GO annotations "Alzheimer's disease pathway," "Parkinson's Disease," "Ribosome," and "Chronic myeloid leukemia." In addition, 78 drugs were predicted to target FOXO1, SP1, MAPK9, and BCL2, including fluorouracil, cyclophosphamide, and epirubicin. A hub gene-miRNA regulatory network with 43 miRNAs and hub gene-transcription factor (TF) network with 36 TFs were also generated. These hub genes may serve as biomarkers for AD diagnosis and provide clues to novel potential treatment targets.
Single-cell technologies make it possible to quantify the comprehensive states of individual cells, and have the power to shed light on cellular differentiation in particular. Although several ...methods have been developed to fully analyze the single-cell expression data, there is still room for improvement in the analysis of differentiation.
In this paper, we propose a novel method SCOUP to elucidate differentiation process. Unlike previous dimension reduction-based approaches, SCOUP describes the dynamics of gene expression throughout differentiation directly, including the degree of differentiation of a cell (in pseudo-time) and cell fate. SCOUP is superior to previous methods with respect to pseudo-time estimation, especially for single-cell RNA-seq. SCOUP also successfully estimates cell lineage more accurately than previous method, especially for cells at an early stage of bifurcation. In addition, SCOUP can be applied to various downstream analyses. As an example, we propose a novel correlation calculation method for elucidating regulatory relationships among genes. We apply this method to a single-cell RNA-seq data and detect a candidate of key regulator for differentiation and clusters in a correlation network which are not detected with conventional correlation analysis.
We develop a stochastic process-based method SCOUP to analyze single-cell expression data throughout differentiation. SCOUP can estimate pseudo-time and cell lineage more accurately than previous methods. We also propose a novel correlation calculation method based on SCOUP. SCOUP is a promising approach for further single-cell analysis and available at https://github.com/hmatsu1226/SCOUP.
Celotno besedilo
Dostopno za:
DOBA, IZUM, KILJ, NUK, PILJ, PNG, SAZU, SIK, UILJ, UKNU, UL, UM, UPUK
The identification of cancer subtypes can help researchers understand hidden genomic mechanisms, enhance diagnostic accuracy and improve clinical treatments. With the development of high-throughput ...techniques, researchers can access large amounts of data from multiple sources. Because of the high dimensionality and complexity of multiomics and clinical data, research into the integration of multiomics data is needed, and developing effective tools for such purposes remains a challenge for researchers. In this work, we proposed an entirely unsupervised clustering method without harnessing any prior knowledge (MODEC). We used manifold optimization and deep-learning techniques to integrate multiomics data for the identification of cancer subtypes and the analysis of significant clinical variables. Since there is nonlinearity in the gene-level datasets, we used manifold optimization methodology to extract essential information from the original omics data to obtain a low-dimensional latent subspace. Then, MODEC uses a deep learning-based clustering module to iteratively define cluster centroids and assign cluster labels to each sample by minimizing the Kullback-Leibler divergence loss. MODEC was applied to six public cancer datasets from The Cancer Genome Atlas database and outperformed eight competing methods in terms of the accuracy and reliability of the subtyping results. MODEC was extremely competitive in the identification of survival patterns and significant clinical features, which could help doctors monitor disease progression and provide more suitable treatment strategies.
Celotno besedilo
Dostopno za:
DOBA, IZUM, KILJ, NUK, PILJ, PNG, SAZU, SIK, UILJ, UKNU, UL, UM, UPUK
The analysis of RNA-Seq data from individual differentiating cells enables us to reconstruct the differentiation process and the degree of differentiation (in pseudo-time) of each cell. Such analyses ...can reveal detailed expression dynamics and functional relationships for differentiation. To further elucidate differentiation processes, more insight into gene regulatory networks is required. The pseudo-time can be regarded as time information and, therefore, single-cell RNA-Seq data are time-course data with high time resolution. Although time-course data are useful for inferring networks, conventional inference algorithms for such data suffer from high time complexity when the number of samples and genes is large. Therefore, a novel algorithm is necessary to infer networks from single-cell RNA-Seq during differentiation.
In this study, we developed the novel and efficient algorithm SCODE to infer regulatory networks, based on ordinary differential equations. We applied SCODE to three single-cell RNA-Seq datasets and confirmed that SCODE can reconstruct observed expression dynamics. We evaluated SCODE by comparing its inferred networks with use of a DNaseI-footprint based network. The performance of SCODE was best for two of the datasets and nearly best for the remaining dataset. We also compared the runtimes and showed that the runtimes for SCODE are significantly shorter than for alternatives. Thus, our algorithm provides a promising approach for further single-cell differentiation analyses.
The R source code of SCODE is available at https://github.com/hmatsu1226/SCODE.
hirotaka.matsumoto@riken.jp.
Supplementary data are available at Bioinformatics online.
RNA secondary structure around splice sites is known to assist normal splicing by promoting spliceosome recognition. However, analyzing the structural properties of entire intronic regions or ...pre-mRNA sequences has been difficult hitherto, owing to serious experimental and computational limitations, such as low read coverage and numerical problems.
Our novel software, "ParasoR", is designed to run on a computer cluster and enables the exact computation of various structural features of long RNA sequences under the constraint of maximal base-pairing distance. ParasoR divides dynamic programming (DP) matrices into smaller pieces, such that each piece can be computed by a separate computer node without losing the connectivity information between the pieces. ParasoR directly computes the ratios of DP variables to avoid the reduction of numerical precision caused by the cancellation of a large number of Boltzmann factors. The structural preferences of mRNAs computed by ParasoR shows a high concordance with those determined by high-throughput sequencing analyses. Using ParasoR, we investigated the global structural preferences of transcribed regions in the human genome. A genome-wide folding simulation indicated that transcribed regions are significantly more structural than intergenic regions after removing repeat sequences and k-mer frequency bias. In particular, we observed a highly significant preference for base pairing over entire intronic regions as compared to their antisense sequences, as well as to intergenic regions. A comparison between pre-mRNAs and mRNAs showed that coding regions become more accessible after splicing, indicating constraints for translational efficiency. Such changes are correlated with gene expression levels, as well as GC content, and are enriched among genes associated with cytoskeleton and kinase functions.
We have shown that ParasoR is very useful for analyzing the structural properties of long RNA sequences such as mRNAs, pre-mRNAs, and long non-coding RNAs whose lengths can be more than a million bases in the human genome. In our analyses, transcribed regions including introns are indicated to be subject to various types of structural constraints that cannot be explained from simple sequence composition biases. ParasoR is freely available at https://github.com/carushi/ParasoR .
Celotno besedilo
Dostopno za:
DOBA, IZUM, KILJ, NUK, PILJ, PNG, SAZU, SIK, UILJ, UKNU, UL, UM, UPUK
Abstract
Motivation
Evolve and resequence (E&R) experiments show promise in capturing real-time evolution at genome-wide scales, enabling the assessment of allele frequency changes SNPs in evolving ...populations and thus the estimation of population genetic parameters in the Wright–Fisher model (WF) that quantify the selection on SNPs. Currently, these analyses face two key difficulties: the numerous SNPs in E&R data and the frequent unreliability of estimates. Hence, a methodology for efficiently estimating WF parameters is needed to understand the evolutionary processes that shape genomes.
Results
We developed a novel method for estimating WF parameters (EMWER), by applying an expectation maximization algorithm to the Kolmogorov forward equation associated with the WF model diffusion approximation. EMWER was used to infer the effective population size, selection coefficients and dominance parameters from E&R data. Of the methods examined, EMWER was the most efficient method for selection strength estimation in multi-core computing environments, estimating both selection and dominance with accurate confidence intervals. We applied EMWER to E&R data from experimental Drosophila populations adapting to thermally fluctuating environments and found a common selection affecting allele frequency of many SNPs within the cosmopolitan In(3R)P inversion. Furthermore, this application indicated that many of beneficial alleles in this experiment are dominant.
Availability and implementation
Our C++ implementation of ‘EMWER’ is available at https://github.com/kojikoji/EMWER.
Supplementary information
Supplementary data are available at Bioinformatics online.
Measuring evolutionary conservation is a routine step in the identification of functional elements in genome sequences. Although a number of studies have proposed methods that use the continuous time ...Markov models (CTMMs) to find evolutionarily constrained elements, their probabilistic structures have been less frequently investigated.
In this article, we investigate a sufficient statistic for CTMMs. The statistic is composed of the fractional duration of nucleotide characters over evolutionary time, F(d), and the number of substitutions occurring in phylogenetic trees, N(s). We first derive basic properties of the sufficient statistic. Then, we derive an expectation maximization (EM) algorithm for estimating the parameters of a phylogenetic model, which iteratively computes the expectation values of the sufficient statistic. We show that the EM algorithm exhibits much faster convergence than other optimization methods that use numerical gradient descent algorithms. Finally, we investigate the genome-wide distribution of fractional duration time F(d) which, unlike the number of substitutions N(s), has rarely been investigated. We show that F(d) has evolutionary information that is distinct from that in N(s), which may be useful for detecting novel types of evolutionary constraints existing in the human genome.
The C++ source code of the 'Fdur' software is available at http://www.ncrna.org/software/fdur/
kiryu-h@k.u-tokyo.ac.jp
Supplementary data are available at Bioinformatics online.
Motivation: Recent studies have shown that the methods for predicting secondary structures of RNAs on the basis of posterior decoding of the base-pairing probabilities has an advantage with respect ...to prediction accuracy over the conventionally utilized minimum free energy methods. However, there is room for improvement in the objective functions presented in previous studies, which are maximized in the posterior decoding with respect to the accuracy measures for secondary structures. Results: We propose novel estimators which improve the accuracy of secondary structure prediction of RNAs. The proposed estimators maximize an objective function which is the weighted sum of the expected number of the true positives and that of the true negatives of the base pairs. The proposed estimators are also improved versions of the ones used in previous works, namely CONTRAfold for secondary structure prediction from a single RNA sequence and McCaskill-MEA for common secondary structure prediction from multiple alignments of RNA sequences. We clarify the relations between the proposed estimators and the estimators presented in previous works, and theoretically show that the previous estimators include additional unnecessary terms in the evaluation measures with respect to the accuracy. Furthermore, computational experiments confirm the theoretical analysis by indicating improvement in the empirical accuracy. The proposed estimators represent extensions of the centroid estimators proposed in Ding et al. and Carvalho and Lawrence, and are applicable to a wide variety of problems in bioinformatics. Availability: Supporting information and the CentroidFold software are available online at: http://www.ncrna.org/software/centroidfold/. Contact: hamada-michiaki@aist.go.jp Supplementary information: Supplementary data are available at Bioinformatics online.
Recently, next-generation sequencing techniques have been applied for the detection of RNA secondary structures, which is referred to as high-throughput RNA structural (HTS) analyses, and many ...different protocols have been used to detect comprehensive RNA structures at single-nucleotide resolution. However, the existing computational analyses heavily depend on the experimental methodology to generate data, which results in difficulties associated with statistically sound comparisons or combining the results obtained using different HTS methods.
Here, we introduced a statistical framework, reactIDR, which can be applied to the experimental data obtained using multiple HTS methodologies. Using this approach, nucleotides are classified into three structural categories, loop, stem/background, and unmapped. reactIDR uses the irreproducible discovery rate (IDR) with a hidden Markov model to discriminate between the true and spurious signals obtained in the replicated HTS experiments accurately, and it is able to incorporate an expectation-maximization algorithm and supervised learning for efficient parameter optimization. The results of our analyses of the real-life HTS data showed that reactIDR had the highest accuracy in the classification of ribosomal RNA stem/loop structures when using both individual and integrated HTS datasets, and its results corresponded the best to the three-dimensional structures.
We have developed a novel software, reactIDR, for the prediction of stem/loop regions from the HTS analysis datasets. For the rRNA structure analyses, reactIDR was shown to have robust accuracy across different datasets by using the reproducibility criterion, suggesting its potential for increasing the value of existing HTS datasets. reactIDR is publicly available at https://github.com/carushi/reactIDR .
Celotno besedilo
Dostopno za:
DOBA, IZUM, KILJ, NUK, PILJ, PNG, SAZU, SIK, UILJ, UKNU, UL, UM, UPUK
Aligning multiple RNA sequences is essential for analyzing non-coding RNAs. Although many alignment methods for non-coding RNAs, including Sankoff's algorithm for strict structural alignments, have ...been proposed, they are either inaccurate or computationally too expensive. Faster methods with reasonable accuracies are required for genome-scale analyses.
We propose a fast algorithm for multiple structural alignments of RNA sequences that is an extension of our pairwise structural alignment method (implemented in SCARNA). The accuracies of the implemented software, MXSCARNA, are at least as favorable as those of state-of-art algorithms that are computationally much more expensive in time and memory.
The proposed method for structural alignment of multiple RNA sequences is fast enough for large-scale analyses with accuracies at least comparable to those of existing algorithms. The source code of MXSCARNA and its web server are available at http://mxscarna.ncrna.org.
Celotno besedilo
Dostopno za:
DOBA, IZUM, KILJ, NUK, PILJ, PNG, SAZU, SIK, UILJ, UKNU, UL, UM, UPUK