annotatr: genomic regions in context
Cavalcante, Raymond G; Sartor, Maureen A
Bioinformatics (Oxford, England), 08/2017, Volume 33, Issue 15
Journal Article
Peer reviewed
Open access
Analysis of next-generation sequencing data often results in a list of genomic regions. These may include differentially methylated CpGs/regions, transcription factor binding sites, interacting chromatin regions, or GWAS-associated SNPs, among others. A common analysis step is to annotate such genomic regions to genomic annotations (promoters, exons, enhancers, etc.). Existing tools are limited by a lack of annotation sources and flexible options, the time it takes to annotate regions, an artificial one-to-one region-to-annotation mapping, a lack of visualization options to easily summarize data, or some combination thereof.
We developed the annotatr Bioconductor package to flexibly and quickly summarize and plot annotations of genomic regions. The annotatr package reports all intersections of regions and annotations, giving a better understanding of the genomic context of the regions. A variety of graphics functions are implemented to easily plot numerical or categorical data associated with the regions across the annotations, and across annotation intersections, providing insight into how characteristics of the regions differ across the annotations. We demonstrate that annotatr is up to 27× faster than comparable R packages. Overall, annotatr enables a richer biological interpretation of experiments.
http://bioconductor.org/packages/annotatr/ and https://github.com/rcavalcante/annotatr.
rcavalca@umich.edu.
Supplementary data are available at Bioinformatics online.
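The many-to-many mapping described in the annotatr abstract above (every region reported against every annotation it intersects, rather than a single "best" annotation) can be illustrated with a minimal Python sketch. This is not annotatr's R API or its implementation: the `Interval` type, the function names `annotate_all` and `summarize`, the half-open coordinate convention, and the toy coordinates are all assumptions made for illustration.

```python
from collections import Counter
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical genomic interval: 0-based start (inclusive), end (exclusive)."""
    chrom: str
    start: int
    end: int
    label: str


def annotate_all(regions, annotations):
    """Return every (region, annotation) pair that overlaps by at least one base.

    A region that overlaps several annotations yields several pairs, so no
    one-to-one region-to-annotation mapping is imposed.
    """
    # Group annotations by chromosome so each region is only compared
    # against annotations on its own chromosome.
    by_chrom = {}
    for ann in annotations:
        by_chrom.setdefault(ann.chrom, []).append(ann)

    pairs = []
    for reg in regions:
        for ann in by_chrom.get(reg.chrom, []):
            # Two half-open intervals overlap when each starts before the other ends.
            if reg.start < ann.end and ann.start < reg.end:
                pairs.append((reg, ann))
    return pairs


def summarize(pairs):
    """Count how many region-annotation intersections fall in each annotation type."""
    return Counter(ann.label for _, ann in pairs)


# Toy data (invented for this example).
regions = [
    Interval("chr1", 100, 200, "dmr_1"),
    Interval("chr1", 950, 1050, "dmr_2"),
    Interval("chr2", 10, 20, "dmr_3"),
]
annotations = [
    Interval("chr1", 0, 150, "promoter"),
    Interval("chr1", 120, 400, "exon"),
    Interval("chr1", 1000, 2000, "enhancer"),
]

pairs = annotate_all(regions, annotations)
counts = summarize(pairs)
```

In this toy input, `dmr_1` is reported against both the promoter and the exon, while `dmr_3` has no annotation at all. The nested loop is chosen for clarity only; a real tool would use an interval index to reach the speeds the abstract reports.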
Abstract
The task of predicting the interactions between drugs and targets plays a key role in the process of drug discovery. There is a need to develop novel and efficient prediction approaches in order to avoid determining drug–target interactions (DTIs) through costly and laborious yet not-always-deterministic experiments alone. These approaches should be capable of identifying the potential DTIs in a timely manner. In this article, we describe the data required for the task of DTI prediction, followed by a comprehensive catalog of machine learning methods and databases that have been proposed and utilized to predict DTIs. The advantages and disadvantages of each set of methods are also briefly discussed. Lastly, the challenges one may face in DTI prediction using machine learning approaches are highlighted, and we conclude by shedding some light on important future research directions.
Appreciation of the role of the gut microbiome in regulating vertebrate metabolism has exploded recently. However, the effects of gut microbiota on skeletal growth and homeostasis have only recently begun to be explored. Here, we report that colonization of sexually mature germ-free (GF) mice with conventional specific pathogen-free (SPF) gut microbiota increases both bone formation and resorption, with the net effect of colonization varying with the duration of colonization. Although colonization of adult mice acutely reduces bone mass, in long-term colonized mice, an increase in bone formation and growth plate activity predominates, resulting in equalization of bone mass and increased longitudinal and radial bone growth. Serum levels of insulin-like growth factor 1 (IGF-1), a hormone with known actions on skeletal growth, are substantially increased in response to microbial colonization, with significant increases in liver and adipose tissue IGF-1 production. Antibiotic treatment of conventional mice, in contrast, decreases serum IGF-1 and inhibits bone formation. Supplementation of antibiotic-treated mice with short-chain fatty acids (SCFAs), products of microbial metabolism, restores IGF-1 and bone mass to levels seen in nonantibiotic-treated mice. Thus, SCFA production may be one mechanism by which microbiota increase serum IGF-1. Our study demonstrates that gut microbiota provide a net anabolic stimulus to the skeleton, which is likely mediated by IGF-1. Manipulation of the microbiome or its metabolites may afford opportunities to optimize bone health and growth.
Protein misfolding in the endoplasmic reticulum (ER) leads to cell death through PERK-mediated phosphorylation of eIF2α, although the mechanism is not understood. ChIP-seq and mRNA-seq of activating transcription factor 4 (ATF4) and C/EBP homologous protein (CHOP), key transcription factors downstream of p-eIF2α, demonstrated that they interact to directly induce genes encoding protein synthesis and the unfolded protein response, but not apoptosis. Forced expression of ATF4 and CHOP increased protein synthesis and caused ATP depletion, oxidative stress and cell death. The increased protein synthesis and oxidative stress were necessary signals for cell death. We show that eIF2α-phosphorylation-attenuated protein synthesis, and not Atf4 mRNA translation, promotes cell survival. These results show that transcriptional induction through ATF4 and CHOP increases protein synthesis leading to oxidative stress and cell death. The findings suggest that limiting protein synthesis will be therapeutic for diseases caused by protein misfolding in the ER.
Tests for differential gene expression with RNA-seq data have a tendency to identify certain types of transcripts as significant, e.g. longer and highly-expressed transcripts. This tendency has been shown to bias gene set enrichment (GSE) testing, which is used to find over- or under-represented biological functions in the data. Yet, there remains a surprising lack of tools for GSE testing specific for RNA-seq. We present a new GSE method for RNA-seq data, RNA-Enrich, that accounts for the above tendency empirically by adjusting for average read count per gene. RNA-Enrich is a quick, flexible method and web-based tool, with 16 available gene annotation databases. It does not require a P-value cut-off to define differential expression, and works well even with small sample-sized experiments. We show that adjusting for read counts per gene improves both the type I error rate and detection power of the test.
RNA-Enrich is available at http://lrpath.ncibi.org or from supplemental material as R code.
sartorma@umich.edu
Supplementary data are available at Bioinformatics online.
Mutations in a number of chromatin modifiers are associated with human neurological disorders. KDM5C, a histone H3 lysine 4 di- and tri-methyl (H3K4me2/3)-specific demethylase, is frequently mutated in X-linked intellectual disability (XLID) patients. Here, we report that disruption of the mouse Kdm5c gene recapitulates adaptive and cognitive abnormalities observed in XLID, including impaired social behavior, memory deficits, and aggression. Kdm5c-knockout brains exhibit abnormal dendritic arborization, spine anomalies, and altered transcriptomes. In neurons, Kdm5c is recruited to promoters that harbor CpG islands decorated with high levels of H3K4me3, where it fine-tunes H3K4me3 levels. Kdm5c predominantly represses these genes, which include members of key pathways that regulate the development and function of neuronal circuitries. In summary, our mouse behavioral data strongly suggest that KDM5C mutations are causal to XLID. Furthermore, our findings suggest that loss of KDM5C function may impact gene expression in multiple regulatory pathways relevant to the clinical phenotypes.
• Behavior of Kdm5c-knockout mice recapitulates KDM5C-linked intellectual disability
• Kdm5c is required for normal dendritic branching and spine morphology in vivo
• Kdm5c acts as a repressor through reducing H3K4me3 levels at CpG promoters
In this study, Iwase et al. characterize Kdm5c-knockout mice to model an important class of intellectual disability. Kdm5c-knockout mice show limited learning, heightened aggression, and dendritic spine defects. Kdm5c is a histone demethylase, and the authors identify altered transcriptional profiles in Kdm5c-knockout brains and investigate the molecular changes in neurons.
The second-generation antipsychotic olanzapine is effective in reducing psychotic symptoms but can cause extreme weight gain in human patients. We investigated the role of the gut microbiota in this adverse drug effect using a mouse model. First, we used germ-free C57BL/6J mice to demonstrate that gut bacteria are necessary and sufficient for weight gain caused by oral delivery of olanzapine. Second, we surveyed fecal microbiota before, during, and after treatment and found that olanzapine potentiated a shift towards an "obesogenic" bacterial profile. Finally, we demonstrated that olanzapine has antimicrobial activity in vitro against resident enteric bacterial strains. These results collectively provide strong evidence for a mechanism underlying olanzapine-induced weight gain in mouse and a hypothesis for clinical translation in human patients.
ChIP-Seq is the standard method to identify genome-wide DNA-binding sites for transcription factors (TFs) and histone modifications. There is a growing need to analyze experiments with biological replicates, especially for epigenomic experiments where variation among biological samples can be substantial. However, tools that can perform group comparisons are currently lacking.
We present a peak-calling prioritization pipeline (PePr) for identifying consistent or differential binding sites in ChIP-Seq experiments with biological replicates. PePr models read counts across the genome among biological samples with a negative binomial distribution and uses a local variance estimation method, ranking consistent or differential binding sites more favorably than sites with greater variability. We compared PePr with commonly used and recently proposed approaches on eight TF datasets and show that PePr uniquely identifies consistent regions with enriched read counts, high motif occurrence rate and known characteristics of TF binding based on visual inspection. For histone modification data with broadly enriched regions, PePr identified differential regions that are consistent within groups and outperformed other methods in scaling False Discovery Rate (FDR) analysis.
http://code.google.com/p/pepr-chip-seq/.
Motivation: Metabolomics is a rapidly evolving field that holds promise to provide insights into genotype-phenotype relationships in cancers, diabetes and other complex diseases. One of the major informatics challenges is providing tools that link metabolite data with other types of high-throughput molecular data (e.g. transcriptomics, proteomics), and incorporate prior knowledge of pathways and molecular interactions.
Results: We describe a new, substantially redesigned version of our tool Metscape that allows users to enter experimental data for metabolites, genes and pathways and display them in the context of relevant metabolic networks. Metscape 2 uses an internal relational database that integrates data from KEGG and EHMN databases. The new version of the tool allows users to identify enriched pathways from expression profiling data, build and analyze the networks of genes and metabolites, and visualize changes in the gene/metabolite data. We demonstrate the applications of Metscape to annotate molecular pathways for human and mouse metabolites implicated in the pathogenesis of sepsis-induced acute lung injury, for the analysis of gene expression and metabolite data from pancreatic ductal adenocarcinoma, and for identification of the candidate metabolites involved in cancer and inflammation.
Availability: Metscape is part of the National Institutes of Health-supported National Center for Integrative Biomedical Informatics (NCIBI) suite of tools, freely available at http://metscape.ncibi.org. It can be downloaded from http://cytoscape.org or installed via Cytoscape plugin manager.
Contact: metscape-help@umich.edu; akarnovs@umich.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
The small sample sizes often used for microarray experiments result in poor estimates of variance if each gene is considered independently. Yet accurately estimating variability of gene expression measurements in microarray experiments is essential for correctly identifying differentially expressed genes. Several recently developed methods for testing differential expression of genes utilize hierarchical Bayesian models to "pool" information from multiple genes. We have developed a statistical testing procedure that further improves upon current methods by incorporating the well-documented relationship between the absolute gene expression level and the variance of gene expression measurements into the general empirical Bayes framework.
We present a novel Bayesian moderated-T, which we show to perform favorably in simulations, with two real, dual-channel microarray experiments and in two controlled single-channel experiments. In simulations, the new method achieved greater power while correctly estimating the true proportion of false positives, and in the analysis of two publicly-available "spike-in" experiments, the new method performed favorably compared to all tested alternatives. We also applied our method to two experimental datasets and discuss the additional biological insights as revealed by our method in contrast to the others. The R-source code for implementing our algorithm is freely available at http://eh3.uc.edu/ibmt.
We use a Bayesian hierarchical normal model to define a novel Intensity-Based Moderated T-statistic (IBMT). The method is completely data-dependent using empirical Bayes philosophy to estimate hyperparameters, and thus does not require specification of any free parameters. IBMT has the strength of balancing two important factors in the analysis of microarray data: the degree of independence of variances relative to the degree of identity (i.e. t-tests vs. equal variance assumption), and the relationship between variance and signal intensity. When this variance-intensity relationship is weak or does not exist, IBMT reduces to a previously described moderated t-statistic. Furthermore, our method may be directly applied to any array platform and experimental design. Together, these properties show IBMT to be a valuable option in the analysis of virtually any microarray experiment.
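The balance described above, between gene-specific variances and a shared prior and between variance and signal intensity, can be sketched in the notation commonly used for moderated t-statistics. The symbols are assumptions made for illustration and are not taken from the abstract. Here s_g^2 is the sample variance of gene g with d_g degrees of freedom, s_0^2 and d_0 are the prior variance and prior degrees of freedom estimated by empirical Bayes, \hat{\beta}_g is the estimated log fold change, v_g is a design-dependent scaling factor, and A_g is the average intensity of gene g.

```latex
% Generic moderated t-statistic: the gene-wise variance is shrunk toward a
% single prior variance s_0^2 shared by all genes.
\tilde{s}_g^{2} = \frac{d_0\, s_0^{2} + d_g\, s_g^{2}}{d_0 + d_g},
\qquad
\tilde{t}_g = \frac{\hat{\beta}_g}{\tilde{s}_g \sqrt{v_g}}

% Intensity-based variant (sketch): the prior variance is allowed to depend
% on the average intensity A_g of the gene.
\tilde{s}_g^{2} = \frac{d_0\, s_0^{2}(A_g) + d_g\, s_g^{2}}{d_0 + d_g}
```

Under these assumptions, if s_0^2(A_g) does not vary with A_g, the second expression collapses to the first. This is consistent with the statement that IBMT reduces to a previously described moderated t-statistic when the variance-intensity relationship is weak or absent. Letting d_0 tend to 0 recovers the ordinary t-test, and letting d_0 grow without bound corresponds to the equal-variance assumption.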