Determining the toxicity of chemicals is necessary to identify their harmful effects on humans, animals, plants, or the environment. It is also one of the main steps in drug design. Animal models ...have been used for a long time for toxicity testing. However, in vivo animal tests are constrained by time, ethical considerations, and financial burden. Therefore, computational methods for estimating the toxicity of chemicals are considered useful. In silico toxicology is one type of toxicity assessment that uses computational methods to analyze, simulate, visualize, or predict the toxicity of chemicals. In silico toxicology aims to complement existing toxicity tests to predict toxicity, prioritize chemicals, guide toxicity tests, and minimize late‐stage failures in drugs design. There are various methods for generating models to predict toxicity endpoints. We provide a comprehensive overview, explain, and compare the strengths and weaknesses of the existing modeling methods and algorithms for toxicity prediction with a particular (but not exclusive) emphasis on computational tools that can implement these methods and refer to expert systems that deploy the prediction models. Finally, we briefly review a number of new research directions in in silico toxicology and provide recommendations for designing in silico models. WIREs Comput Mol Sci 2016, 6:147–172. doi: 10.1002/wcms.1240
This article is categorized under:
Computer and Information Science > Chemoinformatics
Computer and Information Science > Databases and Expert Systems
Computer and Information Science > Computer Algorithms and Programming
Long non-coding RNAs (lncRNAs) have been found to perform various functions in a wide variety of important biological processes. To make easier interpretation of lncRNA functionality and conduct deep ...mining on these transcribed sequences, it is convenient to classify lncRNAs into different groups. Here, we summarize classification methods of lncRNAs according to their four major features, namely, genomic location and context, effect exerted on DNA sequences, mechanism of functioning and their targeting mechanism. In combination with the presently available function annotations, we explore potential relationships between different classification categories, and generalize and compare biological features of different lncRNAs within each category. Finally, we present our view on potential further studies. We believe that the classifications of lncRNAs as indicated above are of fundamental importance for lncRNA studies, helpful for further investigation of specific lncRNAs, for formulation of new hypothesis based on different features of lncRNA and for exploration of the underlying lncRNA functional mechanisms.
Abstract
Long non-coding RNAs (lncRNAs) have significant functions in a wide range of important biological processes. Although the number of known human lncRNAs has dramatically increased, they are ...poorly annotated, posing great challenges for better understanding their functional significance and elucidating their complex functioning molecular mechanisms. Here, we present LncBook (http://bigd.big.ac.cn/lncbook), a curated knowledgebase of human lncRNAs that features a comprehensive collection of human lncRNAs and systematic curation of lncRNAs by multi-omics data integration, functional annotation and disease association. In the present version, LncBook houses a large number of 270 044 lncRNAs and includes 1867 featured lncRNAs with 3762 lncRNA–function associations. It also integrates an abundance of multi-omics data from expression, methylation, genome variation and lncRNA–miRNA interaction. Also, LncBook incorporates 3772 experimentally validated lncRNA-disease associations and further identifies a total of 97 998 lncRNAs that are putatively disease-associated. Collectively, LncBook is dedicated to the integration and curation of human lncRNAs as well as their associated data and thus bears great promise to serve as a valuable knowledgebase for worldwide research communities.
Abstract
Motivation
Finding computationally drug-target interactions (DTIs) is a convenient strategy to identify new DTIs at low cost with reasonable accuracy. However, the current DTI prediction ...methods suffer the high false positive prediction rate.
Results
We developed DDR, a novel method that improves the DTI prediction accuracy. DDR is based on the use of a heterogeneous graph that contains known DTIs with multiple similarities between drugs and multiple similarities between target proteins. DDR applies non-linear similarity fusion method to combine different similarities. Before fusion, DDR performs a pre-processing step where a subset of similarities is selected in a heuristic process to obtain an optimized combination of similarities. Then, DDR applies a random forest model using different graph-based features extracted from the DTI heterogeneous graph. Using 5-repeats of 10-fold cross-validation, three testing setups, and the weighted average of area under the precision-recall curve (AUPR) scores, we show that DDR significantly reduces the AUPR score error relative to the next best start-of-the-art method for predicting DTIs by 31% when the drugs are new, by 23% when targets are new and by 34% when the drugs and the targets are known but not all DTIs between them are not known. Using independent sources of evidence, we verify as correct 22 out of the top 25 DDR novel predictions. This suggests that DDR can be used as an efficient method to identify correct DTIs.
Availability and implementation
The data and code are provided at https://bitbucket.org/RSO24/ddr/.
Supplementary information
Supplementary data are available at Bioinformatics online.
Many scientific problems can be formulated as classification tasks. Data that harbor relevant information are usually described by a large number of features. Frequently, many of these features are ...irrelevant for the class prediction. The efficient implementation of classification models requires identification of suitable combinations of features. The smaller number of features reduces the problem's dimensionality and may result in higher classification performance. We developed DWFS, a web-based tool that allows for efficient selection of features for a variety of problems. DWFS follows the wrapper paradigm and applies a search strategy based on Genetic Algorithms (GAs). A parallel GA implementation examines and evaluates simultaneously large number of candidate collections of features. DWFS also integrates various filtering methods that may be applied as a pre-processing step in the feature selection process. Furthermore, weights and parameters in the fitness function of GA can be adjusted according to the application requirements. Experiments using heterogeneous datasets from different biomedical applications demonstrate that DWFS is fast and leads to a significant reduction of the number of features without sacrificing performance as compared to several widely used existing methods. DWFS can be accessed online at www.cbrc.kaust.edu.sa/dwfs.
More people die from cardiovascular diseases (CVD) than from any other cause. Cardiovascular complications are thought to arise from enhanced levels of free radicals causing impaired “redox ...homeostasis,” which represents the interplay between oxidative stress (OS) and reductive stress (RS). In this review, we compile several experimental research findings that show sustained shifts towards OS will alter the homeostatic redox mechanism to cause cardiovascular complications, as well as findings that show a prolonged antioxidant state or RS can similarly lead to such cardiovascular complications. This experimental evidence is specifically focused on the role of glutathione, the most abundant antioxidant in the heart, in a redox homeostatic mechanism that has been shifted towards OS or RS. This may lead to impairment of cellular signaling mechanisms and elevated pools of proteotoxicity associated with cardiac dysfunction.
Transcription regulation in multicellular eukaryotes is orchestrated by a number of DNA functional elements located at gene regulatory regions. Some regulatory regions (e.g. enhancers) are located ...far away from the gene they affect. Identification of distal regulatory elements is a challenge for the bioinformatics research. Although existing methodologies increased the number of computationally predicted enhancers, performance inconsistency of computational models across different cell-lines, class imbalance within the learning sets and ad hoc rules for selecting enhancer candidates for supervised learning, are some key questions that require further examination. In this study we developed DEEP, a novel ensemble prediction framework. DEEP integrates three components with diverse characteristics that streamline the analysis of enhancer's properties in a great variety of cellular conditions. In our method we train many individual classification models that we combine to classify DNA regions as enhancers or non-enhancers. DEEP uses features derived from histone modification marks or attributes coming from sequence characteristics. Experimental results indicate that DEEP performs better than four state-of-the-art methods on the ENCODE data. We report the first computational enhancer prediction results on FANTOM5 data where DEEP achieves 90.2% accuracy and 90% geometric mean (GM) of specificity and sensitivity across 36 different tissues. We further present results derived using in vivo-derived enhancer data from VISTA database. DEEP-VISTA, when tested on an independent test set, achieved GM of 80.1% and accuracy of 89.64%. DEEP framework is publicly available at http://cbrc.kaust.edu.sa/deep/.
Abstract
We present a major update of the HOCOMOCO collection that consists of patterns describing DNA binding specificities for human and mouse transcription factors. In this release, we profited ...from a nearly doubled volume of published in vivo experiments on transcription factor (TF) binding to expand the repertoire of binding models, replace low-quality models previously based on in vitro data only and cover more than a hundred TFs with previously unknown binding specificities. This was achieved by systematic motif discovery from more than five thousand ChIP-Seq experiments uniformly processed within the BioUML framework with several ChIP-Seq peak calling tools and aggregated in the GTRD database. HOCOMOCO v11 contains binding models for 453 mouse and 680 human transcription factors and includes 1302 mononucleotide and 576 dinucleotide position weight matrices, which describe primary binding preferences of each transcription factor and reliable alternative binding specificities. An interactive interface and bulk downloads are available on the web: http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco11. In this release, we complement HOCOMOCO by MoLoTool (Motif Location Toolbox, http://molotool.autosome.ru) that applies HOCOMOCO models for visualization of binding sites in short DNA sequences.
The inside cover image, by Vladimir B. Bajic and Arwa B. Raies, is based on the Advanced Review
In silico toxicology: computational methods for prediction of chemical toxicity
, DOI:
...10.1002/wcms.1240
.
image
Enhancers are cis-acting DNA elements that play critical roles in distal regulation of gene expression. Identifying enhancers is an important step for understanding distinct gene expression programs ...that may reflect normal and pathogenic cellular conditions. Experimental identification of enhancers is constrained by the set of conditions used in the experiment. This requires multiple experiments to identify enhancers, as they can be active under specific cellular conditions but not in different cell types/tissues or cellular states. This has opened prospects for computational prediction methods that can be used for high-throughput identification of putative enhancers to complement experimental approaches. Potential functions and properties of predicted enhancers have been catalogued and summarized in several enhancer-oriented databases. Because the current methods for the computational prediction of enhancers produce significantly different enhancer predictions, it will be beneficial for the research community to have an overview of the strategies and solutions developed in this field. In this review, we focus on the identification and analysis of enhancers by bioinformatics approaches. First, we describe a general framework for computational identification of enhancers, present relevant data types and discuss possible computational solutions. Next, we cover over 30 existing computational enhancer identification methods that were developed since 2000. Our review highlights advantages, limitations and potentials, while suggesting pragmatic guidelines for development of more efficient computational enhancer prediction methods. Finally, we discuss challenges and open problems of this topic, which require further consideration.