Genomic datasets are often interpreted in the context of large-scale reference databases. One approach is to identify significantly overlapping gene sets, which works well for gene-centric data. However, many types of high-throughput data are based on genomic regions. Locus Overlap Analysis (LOLA) provides easy and automatable enrichment analysis for genomic region sets, thus facilitating the interpretation of functional genomics and epigenomics data.
R package available in Bioconductor and on the following website: http://lola.computational-epigenetics.org.
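To make the idea of region-set enrichment concrete, the following is a minimal sketch in Python. It assumes that enrichment is scored with a one-sided Fisher's exact test over a user-defined universe of regions; the abstract above does not state which test LOLA uses, so this is an illustration of the general technique and not the package's implementation. All function names are hypothetical, and intervals are assumed to be half-open `(chrom, start, end)` tuples.

```python
from math import comb


def overlaps(a, b):
    """True if two half-open (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_hits(regions, reference):
    """Count the regions that overlap at least one reference region (naive O(n*m) scan)."""
    return sum(any(overlaps(r, ref) for ref in reference) for r in regions)


def fisher_right_tail(k, n, K, N):
    """One-sided Fisher's exact test: P(X >= k) under the hypergeometric distribution.

    N: regions in the universe, K: universe regions that hit the reference,
    n: regions in the query set, k: query regions that hit the reference.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def enrichment(query, universe, reference):
    """Return (k, K, p) for a query set drawn from the universe (assumed subset)."""
    k = count_hits(query, reference)
    K = count_hits(universe, reference)
    return k, K, fisher_right_tail(k, len(query), K, len(universe))
```

The universe matters here: the same query can look enriched or not depending on which background regions it is compared against, so the sketch takes it as an explicit argument.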
Methods for single-cell genome and transcriptome sequencing have contributed to our understanding of cellular heterogeneity, whereas methods for single-cell epigenomics are much less established. Here, we describe a whole-genome bisulfite sequencing (WGBS) assay that enables DNA methylation mapping in very small cell populations (μWGBS) and single cells (scWGBS). Our assay is optimized for profiling many samples at low coverage, and we describe a bioinformatic method that analyzes collections of single-cell methylomes to infer cell-state dynamics. Using these technological advances, we studied epigenomic cell-state dynamics in three in vitro models of cellular differentiation and pluripotency, where we observed characteristic patterns of epigenome remodeling and cell-to-cell heterogeneity. The described method enables single-cell analysis of DNA methylation in a broad range of biological systems, including embryonic development, stem cell differentiation, and cancer. It can also be used to establish composite methylomes that account for cell-to-cell heterogeneity in complex tissue samples.
• High-throughput bisulfite sequencing assay for low-input and single-cell samples
• Single-cell methylomes for in vitro models (K562 AzaC, HL60 VitD, and mES 2i/ATRA/EB)
• Bioinformatic method for inferring cell-state dynamics from sparse methylome data
• Identification of genomic region types with consistent changes among single cells
Farlik et al. describe a method for DNA methylation sequencing in very small cell populations (μWGBS) and single cells (scWGBS). Furthermore, they present a bioinformatic method for analyzing low-coverage methylome data and apply this technique to inferring epigenomic cell-state dynamics in pluripotent and differentiating cells.
Most genome-wide assays provide averages across large numbers of cells, but recent technological advances promise to overcome this limitation. Pioneering single-cell assays are now available for genome, epigenome, transcriptome, proteome, and metabolome profiling. Here, we describe how these different dimensions can be combined into multi-omics assays that provide comprehensive profiles of the same cell.
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is widely used to map histone marks and transcription factor binding throughout the genome. Here we present ChIPmentation, a method that combines chromatin immunoprecipitation with sequencing library preparation by Tn5 transposase ('tagmentation'). ChIPmentation introduces sequencing-compatible adaptors in a single-step reaction directly on bead-bound chromatin, which reduces time, cost and input requirements, thus providing a convenient and broadly useful alternative to existing ChIP-seq protocols.
Transcription factor fusion proteins can transform cells by inducing global changes of the transcriptome, often creating a state of oncogene addiction. Here, we investigate the role of epigenetic mechanisms in this process, focusing on Ewing sarcoma cells that are dependent on the EWS-FLI1 fusion protein. We established reference epigenome maps comprising DNA methylation, seven histone marks, open chromatin states, and RNA levels, and we analyzed the epigenome dynamics upon downregulation of the driving oncogene. Reduced EWS-FLI1 expression led to widespread epigenetic changes in promoters, enhancers, and super-enhancers, and we identified histone H3K27 acetylation as the most strongly affected mark. Clustering of epigenetic promoter signatures defined classes of EWS-FLI1-regulated genes that responded differently to low-dose treatment with histone deacetylase inhibitors. Furthermore, we observed strong and opposing enrichment patterns for E2F and AP-1 among EWS-FLI1-correlated and anticorrelated genes. Our data describe extensive genome-wide rewiring of epigenetic cell states driven by an oncogenic fusion protein.
• Reference epigenome maps identify widespread epigenetic change in Ewing sarcoma cells
• EWS-FLI1-regulated genes fall into clusters with characteristic chromatin signatures
• Transcriptome response to HDAC inhibitors depends on promoter-specific histone marks
• EWS-FLI1 induces global changes in H3K27ac and genome-wide enhancer reprogramming
EWS-FLI1 is an oncogenic fusion protein and the main driver of Ewing sarcoma. Tomazou et al. establish comprehensive epigenome maps for an EWS-FLI1-dependent cell line. Based on these data, they identify clusters of epigenetically regulated genes and a unique enhancer signature that is associated with EWS-FLI1 oncogene addiction.
Abstract
Motivation
The Gene Expression Omnibus (GEO) has become an important source of biological data for secondary analysis. However, there is no simple, programmatic way to download data and metadata from GEO in a standardized annotation format.
Results
To address this, we present GEOfetch, a command-line tool that downloads and organizes data and metadata from GEO and the Sequence Read Archive (SRA). GEOfetch formats the downloaded metadata as a Portable Encapsulated Project, providing a universal format for the reanalysis of public data.
Availability and implementation
GEOfetch is available on Bioconda and the Python Package Index (PyPI).
Functional genomics experiments, like ChIP-Seq or ATAC-Seq, produce results that are summarized as a region set. There is no way to objectively evaluate the effectiveness of region set similarity metrics. We present Bedshift, a tool for perturbing BED files by randomly shifting, adding, and dropping regions from a reference file. The perturbed files can be used to benchmark similarity metrics, as well as for other applications. We highlight differences in behavior between metrics, such as that the Jaccard score is most sensitive to added or dropped regions, whereas the coverage score is most sensitive to shifted regions.
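The perturb-then-score idea described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the Bedshift implementation or its API: the function names, the rate parameters, the fixed width of added regions, and the single `chrom_len` bound are all hypothetical, and the Jaccard score is assumed here to be the base-pair-level definition (covered bases in the intersection divided by covered bases in the union), which may differ from the definition used in the paper.

```python
import random


def perturb(regions, shift_rate=0.0, drop_rate=0.0, add_rate=0.0,
            max_shift=100, add_width=200, chrom_len=1_000_000, seed=0):
    """Randomly drop, shift, and add half-open (chrom, start, end) regions."""
    rng = random.Random(seed)  # seeded so that a perturbation is reproducible
    out = []
    for chrom, start, end in regions:
        if rng.random() < drop_rate:
            continue  # drop this region
        if rng.random() < shift_rate:
            delta = rng.randint(-max_shift, max_shift)
            delta = max(delta, -start)            # keep start >= 0
            delta = min(delta, chrom_len - end)   # keep end <= chrom_len
            start, end = start + delta, end + delta
        out.append((chrom, start, end))
    chroms = sorted({c for c, _, _ in regions})
    for _ in range(round(add_rate * len(regions))):
        start = rng.randrange(0, chrom_len - add_width + 1)
        out.append((rng.choice(chroms), start, start + add_width))
    return out


def covered_bp(regions):
    """Number of bases covered by a region set, counting overlapping bases once."""
    total, last = 0, None  # last = [chrom, start, end] of the interval being merged
    for chrom, start, end in sorted(regions):
        if last and last[0] == chrom and start <= last[2]:
            last[2] = max(last[2], end)
        else:
            if last:
                total += last[2] - last[1]
            last = [chrom, start, end]
    return total + (last[2] - last[1] if last else 0)


def jaccard(a, b):
    """Base-pair Jaccard: |A ∩ B| / |A ∪ B|, via inclusion-exclusion."""
    union = covered_bp(list(a) + list(b))
    inter = covered_bp(a) + covered_bp(b) - union
    return inter / union if union else 1.0
```

A benchmark in this spirit would call `perturb` on a reference file at increasing rates and record how quickly each similarity metric falls away from its unperturbed value.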
Complex patterns of cell-type-specific gene expression are thought to be achieved by combinatorial binding of transcription factors (TFs) to sequence elements in regulatory regions. Predicting cell-type-specific expression in mammals has been hindered by the oftentimes unknown location of distal regulatory regions. To alleviate this bottleneck, we used DNase-seq data from 19 diverse human cell types to identify proximal and distal regulatory elements at genome-wide scale. Matched expression data allowed us to separate genes into classes of cell-type-specific up-regulated, down-regulated, and constitutively expressed genes. CG dinucleotide content and DNA accessibility in the promoters of these three classes of genes displayed substantial differences, highlighting the importance of including these aspects in modeling gene expression. We associated DNase I hypersensitive sites (DHSs) with genes, and trained classifiers for different expression patterns. TF sequence motif matches in DHSs provided a strong performance improvement in predicting gene expression over the typical baseline approach of using proximal promoter sequences. In particular, we achieved competitive performance when discriminating up-regulated genes from different cell types or genes up- and down-regulated under the same conditions. We identified previously known and new candidate cell-type-specific regulators. The models generated testable predictions of activating or repressive functions of regulators. DNase I footprints for these regulators were indicative of their direct binding to DNA. In summary, we successfully used information on open chromatin obtained by a single assay, DNase-seq, to address the problem of predicting cell-type-specific gene expression in mammalian organisms directly from regulatory sequence.
Abstract
Summary
Databases of large-scale genome projects now contain thousands of genomic interval datasets. These data are a critical resource for understanding the function of DNA. However, our ability to examine and integrate interval data of this scale is limited. Here, we introduce the integrated genome database (IGD), a method and tool for searching genome interval datasets more than three orders of magnitude faster than existing approaches, while using only one hundredth of the memory. IGD uses a novel linear binning method that allows us to scale analysis to billions of genomic regions.
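As a rough illustration of how binning can speed up interval search, the sketch below builds a fixed-width bin index in Python. It is a generic in-memory example and makes no claim about IGD's actual data layout or on-disk format: the class name `BinIndex`, the default bin size, and the choice to store an interval in every bin it touches are assumptions made for illustration only.

```python
from collections import defaultdict


class BinIndex:
    """Toy fixed-width bin index over half-open (start, end) intervals."""

    def __init__(self, bin_size=16384):
        # bin_size is an arbitrary illustrative default
        self.bin_size = bin_size
        self.bins = defaultdict(list)  # (chrom, bin number) -> [(start, end, label)]

    def _bin_range(self, start, end):
        """Bin numbers touched by the half-open interval [start, end)."""
        return range(start // self.bin_size, (end - 1) // self.bin_size + 1)

    def add(self, chrom, start, end, label):
        """Store the interval in every bin it touches."""
        for b in self._bin_range(start, end):
            self.bins[(chrom, b)].append((start, end, label))

    def query(self, chrom, start, end):
        """Return the stored intervals that overlap [start, end), each reported once."""
        hits = set()  # a set removes duplicates of intervals that span several bins
        for b in self._bin_range(start, end):
            for s, e, label in self.bins.get((chrom, b), ()):
                if s < end and start < e:
                    hits.add((s, e, label))
        return sorted(hits)
```

A query only inspects the bins that the query interval touches, so its cost depends on the local density of intervals and not on the total size of the database, which is the property that makes binning attractive at scale.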
Availability and implementation
https://github.com/databio/IGD.
Supplementary information
Supplementary data are available at Bioinformatics online.