Inferring a Gene Regulatory Network (GRN) from gene expression data is a computationally expensive task, exacerbated by increasing data sizes due to advances in high-throughput gene profiling technology, such as single-cell RNA-seq. To equip researchers with a toolset to infer GRNs from large expression datasets, we propose GRNBoost2 and the Arboreto framework. GRNBoost2 is an efficient algorithm for regulatory network inference using gradient boosting, based on the GENIE3 architecture. Arboreto is a computational framework that scales up GRN inference algorithms complying with this architecture. Arboreto includes both GRNBoost2 and an improved implementation of GENIE3, as a user-friendly open source Python package.
Arboreto is available under the 3-Clause BSD license at http://arboreto.readthedocs.io.
Supplementary data are available at Bioinformatics online.
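The per-target regression idea behind the GENIE3 architecture can be illustrated with a minimal sketch. It is not the Arboreto implementation: it assumes single-split regression stumps, squared-error loss, and no subsampling or early stopping, and the names `fit_stump`, `boosted_importances`, and `infer_network` are hypothetical. For each target gene, a boosted model is fitted on the candidate regulators, and the error reduction credited to each regulator is used as the weight of the regulator-to-target link.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split of one regulator against the residuals.

    Returns (gain, threshold, left_mean, right_mean), where gain is the
    reduction in squared error achieved by the split.
    """
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    total = sum(residuals)
    base = total * total / n
    best = (0.0, None, 0.0, 0.0)
    left_sum = 0.0
    for k in range(1, n):
        i = order[k - 1]
        left_sum += residuals[i]
        if xs[order[k]] == xs[i]:
            continue  # no threshold separates equal values
        right_sum = total - left_sum
        gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - base
        if gain > best[0]:
            threshold = (xs[i] + xs[order[k]]) / 2
            best = (gain, threshold, left_sum / k, right_sum / (n - k))
    return best


def boosted_importances(expr, target, regulators, n_rounds=50, learning_rate=0.1):
    """Boost stumps on one target gene; sum the split gains per regulator."""
    y = expr[target]
    n = len(y)
    pred = [sum(y) / n] * n
    importance = {r: 0.0 for r in regulators}
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        best_reg, best_fit = None, (0.0, None, 0.0, 0.0)
        for r in regulators:
            fit = fit_stump(expr[r], residuals)
            if fit[0] > best_fit[0]:
                best_reg, best_fit = r, fit
        if best_reg is None:
            break  # no split improves the fit any further
        gain, threshold, left, right = best_fit
        importance[best_reg] += gain
        pred = [p + learning_rate * (left if x <= threshold else right)
                for p, x in zip(pred, expr[best_reg])]
    return importance


def infer_network(expr, tf_names, **boost_args):
    """One regression per target gene; returns (regulator, target, weight) links."""
    links = []
    for target in expr:
        regulators = [tf for tf in tf_names if tf != target]
        scores = boosted_importances(expr, target, regulators, **boost_args)
        links.extend((tf, target, w) for tf, w in scores.items() if w > 0)
    return sorted(links, key=lambda link: -link[2])
```

Here `expr` is assumed to map each gene name to a list of expression values, one per cell. Because the regressions for different targets are independent of each other, they can be distributed across workers, which is the property a framework for this architecture can exploit.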
This protocol explains how to perform a fast SCENIC analysis alongside standard best-practice steps on single-cell RNA-sequencing data using software containers and Nextflow pipelines. SCENIC reconstructs regulons (i.e., transcription factors and their target genes), assesses the activity of these discovered regulons in individual cells, and uses these cellular activity patterns to find meaningful clusters of cells. Here we present an improved version of SCENIC with several advances. SCENIC has been refactored and reimplemented in Python (pySCENIC), resulting in a tenfold increase in speed, and has been packaged into containers for ease of use. It is now also possible to use epigenomic track databases, as well as motifs, to refine regulons. In this protocol, we explain the different steps of SCENIC: the workflow starts from the count matrix depicting the gene abundances for all cells and consists of three stages. First, coexpression modules are inferred using a per-target regression approach (GRNBoost2). Next, the indirect targets are pruned from these modules using cis-regulatory motif discovery (cisTarget). Lastly, the activity of these regulons is quantified via an enrichment score for the regulon's target genes (AUCell). Nonlinear projection methods can be used to display visual groupings of cells based on the cellular activity patterns of these regulons. The results can be exported as a loom file and visualized in the SCope web application. This protocol is illustrated on two use cases: a peripheral blood mononuclear cell data set and a panel of single-cell RNA-sequencing cancer experiments. For a data set of 10,000 genes and 50,000 cells, the pipeline runs in <2 h.
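The third stage, scoring regulon activity per cell, can be sketched as a rank-based recovery score. The following is a simplified illustration in the spirit of AUCell, not the pySCENIC code: the function name `recovery_auc`, the `top_fraction` default, and the tie handling (ties keep dictionary order) are assumptions made for this example. Genes in one cell are ranked by expression, and the score is the area under the curve that counts how many regulon targets have been recovered within the top of the ranking, normalized by the best achievable area.

```python
def recovery_auc(expression, gene_set, top_fraction=0.05):
    """Rank-based enrichment of a gene set in a single cell.

    expression: dict mapping gene name -> expression value in this cell.
    gene_set:   set of target genes belonging to one regulon.
    Returns a value in [0, 1]; 1 means all recoverable targets are top-ranked.
    """
    # Rank genes from highest to lowest expression (ties keep dict order).
    ranked = sorted(expression, key=expression.get, reverse=True)
    cutoff = max(1, int(len(ranked) * top_fraction))

    hits = 0
    area = 0
    for gene in ranked[:cutoff]:
        if gene in gene_set:
            hits += 1
        area += hits  # height of the recovery curve at this rank

    # Best case: every target that fits within the cutoff is ranked first.
    n_max = min(len(gene_set), cutoff)
    max_area = sum(min(rank, n_max) for rank in range(1, cutoff + 1))
    return area / max_area if max_area else 0.0


def score_cells(cells, regulons, top_fraction=0.05):
    """Cell-by-regulon activity table as nested dicts."""
    return {
        cell_id: {
            name: recovery_auc(expression, targets, top_fraction)
            for name, targets in regulons.items()
        }
        for cell_id, expression in cells.items()
    }
```

Because the score depends only on the within-cell ranking of genes, it does not change under any per-cell rescaling of the expression values that preserves their order, which is one reason a rank-based score suits sparse single-cell data.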
We present cisTopic, a probabilistic framework used to simultaneously discover coaccessible enhancers and stable cell states from sparse single-cell epigenomics data (http://github.com/aertslab/cistopic). Using a compendium of single-cell ATAC-seq datasets from differentiating hematopoietic cells, brain and transcription factor perturbations, we demonstrate that topic modeling can be exploited for robust identification of cell types, enhancers and relevant transcription factors. cisTopic provides insight into the mechanisms underlying regulatory heterogeneity in cell populations.
The diversity of cell types and regulatory states in the brain, and how these change during aging, remains largely unknown. We present a single-cell transcriptome atlas of the entire adult Drosophila melanogaster brain sampled across its lifespan. Cell clustering identified 87 initial cell clusters that are further subclustered and validated by targeted cell-sorting. Our data show high granularity and identify a wide range of cell types. Gene network analyses using SCENIC revealed regulatory heterogeneity linked to energy consumption. During aging, RNA content declines exponentially without affecting neuronal identity in old brains. This single-cell brain atlas covers nearly all cells in the normal brain and provides the tools to study cellular diversity alongside other Drosophila and mammalian single-cell datasets in our unique single-cell analysis platform: SCope (http://scope.aertslab.org). These results, together with SCope, allow comprehensive exploration of all transcriptional states of an entire aging brain.
• A single-cell atlas of the adult fly brain during aging
• Network inference reveals regulatory states related to oxidative phosphorylation
• Cell identity is retained during aging despite exponential decline of gene expression
• SCope: An online tool to explore and compare single-cell datasets across species
A single-cell atlas of adult fly brains identifies the ensemble of neuronal and glial cell types and their dynamic changes during aging.
Many patients with advanced cancers achieve dramatic responses to a panoply of therapeutics yet retain minimal residual disease (MRD), which ultimately results in relapse. To gain insights into the biology of MRD, we applied single-cell RNA sequencing to malignant cells isolated from BRAF mutant patient-derived xenograft melanoma cohorts exposed to concurrent RAF/MEK-inhibition. We identified distinct drug-tolerant transcriptional states, varying combinations of which co-occurred within MRDs from PDXs and biopsies of patients on treatment. One of these exhibited a neural crest stem cell (NCSC) transcriptional program largely driven by the nuclear receptor RXRG. An RXR antagonist mitigated accumulation of NCSCs in MRD and delayed the development of resistance. These data identify NCSCs as key drivers of resistance and illustrate the therapeutic potential of MRD-directed therapy. They also highlight how gene regulatory network architecture reprogramming may be therapeutically exploited to limit cellular heterogeneity, a key driver of disease progression and therapy resistance.
• Minimal residual diseases in melanoma exhibit cellular and spatial heterogeneity
• Cell-state transition contributes to co-emergence of distinct drug-tolerant states
• RXR signaling drives emergence of a cell population conferring treatment resistance
• Targeting RXR signaling is promising for delaying or obviating relapse in melanoma
Drug-tolerant cells that persist through treatment of melanoma exhibit multiple transcriptional states, one of which is a key driver that can be targeted therapeutically.
Single-cell techniques are advancing rapidly and are yielding unprecedented insight into cellular heterogeneity. Mapping the gene regulatory networks (GRNs) underlying cell states provides attractive opportunities to mechanistically understand this heterogeneity. In this review, we discuss recently emerging methods to map GRNs from single-cell transcriptomics data, tackling the challenge of increased noise levels and data sparsity compared with bulk data, alongside increasing data volumes. Next, we discuss how new techniques for single-cell epigenomics, such as single-cell ATAC-seq and single-cell DNA methylation profiling, can be used to decipher gene regulatory programmes. We finally look forward to the application of single-cell multi-omics and perturbation techniques that will likely play important roles for GRN inference in the future.
Single‐cell technologies allow measuring chromatin accessibility and gene expression in each cell, but jointly utilizing both layers to map bona fide gene regulatory networks and enhancers remains challenging. Here, we generate independent single‐cell RNA‐seq and single‐cell ATAC‐seq atlases of the Drosophila eye‐antennal disc and spatially integrate the data into a virtual latent space that mimics the organization of the 2D tissue using ScoMAP (Single‐Cell Omics Mapping into spatial Axes using Pseudotime ordering). To validate spatially predicted enhancers, we use a large collection of enhancer–reporter lines and identify ~85% of enhancers in which chromatin accessibility and enhancer activity are coupled. Next, we infer enhancer‐to‐gene relationships in the virtual space, finding that genes are mostly regulated by multiple, often redundant, enhancers. Exploiting cell type‐specific enhancers, we deconvolute cell type‐specific effects of bulk‐derived chromatin accessibility QTLs. Finally, we discover that Prospero drives neuronal differentiation through the binding of a GGG motif. In summary, we provide a comprehensive spatial characterization of gene regulation in a 2D tissue.
Synopsis
In this study, scRNA‐seq and scATAC‐seq atlases of the Drosophila eye‐antennal disc are spatially integrated. A combination of enhancer‐reporter assays, machine learning, caQTL analysis and genetic perturbations identifies core regulatory mechanisms.
A virtual map is created to spatially integrate independent single‐cell RNA‐seq and single‐cell ATAC‐seq data from the Drosophila eye‐antennal disc.
Spatial comparison of chromatin accessibility and enhancer activity reveals that accessibility and activity are coupled for ~85% of the accessible regions.
Enhancer‐to‐gene links inferred from the spatial map suggest that genes are mostly regulated by multiple, often redundant, enhancers.
Single‐cell omics, cell‐type specific caQTL analysis and perturbation experiments show that an enriched GGG motif in photoreceptor enhancers is bound by the transcription factor Prospero.
Identification and functional validation of oncogenic drivers are essential steps toward advancing cancer precision medicine. Here, we present a comprehensive analysis of the somatic genomic landscape of the widely used BRAFV600E- and NRASQ61K-driven mouse models of melanoma. By integrating the data with publicly available genomic, epigenomic, and transcriptomic information from human clinical samples, we confirmed the importance of several genes and pathways previously implicated in human melanoma, including the tumor-suppressor genes phosphatase and tensin homolog (PTEN), cyclin dependent kinase inhibitor 2A (CDKN2A), LKB1, and others. Importantly, this approach also identified additional putative melanoma drivers with prognostic and therapeutic relevance. Surprisingly, one of these genes encodes the tyrosine kinase FES. Whereas FES is highly expressed in normal human melanocytes, FES expression is strongly decreased in over 30% of human melanomas. This downregulation correlates with poor overall survival. Correspondingly, engineered deletion of Fes accelerated tumor progression in a BRAFV600E-driven mouse model of melanoma. Together, these data implicate FES as a driver of melanoma progression and demonstrate the potential of cross-species oncogenomic approaches combined with mouse modeling to uncover impactful mutations and oncogenic driver alleles with clinical importance in the treatment of human cancer.
In the study of complex diseases using genome-wide expression data from clinical samples, a difficult case is the identification and mapping of the gene signatures associated with the stages that occur in the progression of a disease. The stages usually correspond to different subtypes or classes of the disease, and the difficulty in identifying them often comes from patient heterogeneity and sample variability that can hide the biomedically relevant changes that characterize each stage, making standard differential analysis inadequate or inefficient.
We propose a methodology to study diseases or disease stages ordered in a sequential manner (e.g. from early stages with good prognosis to more acute or serious stages associated with poor prognosis). The methodology is applied to diseases that have been studied by genome-wide expression profiling of cohorts of patients at different stages. The approach allows searching for consistent expression patterns along the progression of the disease through two major steps: (i) identifying genes with increasing or decreasing trends in the progression of the disease; (ii) clustering the increasing/decreasing gene expression patterns using an unsupervised approach to reveal whether there are consistent patterns and find genes altered at specific disease stages. The first step is carried out using Gamma rank correlation to identify genes whose expression correlates with a categorical variable that represents the stages of the disease. The second step is done using a Self-Organizing Map (SOM) to cluster the genes according to their progressive profiles and identify specific patterns. Both steps are done after normalization of the genomic data to allow the integration of multiple independent datasets. In order to validate the results and evaluate their consistency and biological relevance, the methodology is applied to datasets of three different diseases: myelodysplastic syndrome, colorectal cancer and Alzheimer's disease. A software script written in R, named genediseasePatterns, is provided to allow the use and application of the methodology.
The method presented allows the analysis of the progression of complex and heterogeneous diseases that can be divided into pathological stages. It identifies gene groups whose expression patterns change along the advance of the disease, and it can be applied to different types of genomic data studying cohorts of patients in different states.
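The first step of the methodology, finding genes whose expression trends with disease stage, can be illustrated with a short sketch. It assumes that "Gamma rank correlation" refers to the Goodman-Kruskal gamma statistic, it is written in Python rather than taken from the genediseasePatterns R script, and the function names and the `min_abs_gamma` cutoff are hypothetical. Gamma compares every pair of samples: a pair is concordant when the sample at the later stage also has the higher expression, discordant when it has the lower expression, and ignored when either value is tied.

```python
def goodman_kruskal_gamma(stages, values):
    """Goodman-Kruskal gamma between ordinal stages and expression values.

    gamma = (C - D) / (C + D), where C and D count concordant and
    discordant sample pairs; pairs tied in either variable are skipped.
    """
    concordant = discordant = 0
    n = len(stages)
    for i in range(n):
        for j in range(i + 1, n):
            product = (stages[i] - stages[j]) * (values[i] - values[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    informative = concordant + discordant
    return (concordant - discordant) / informative if informative else 0.0


def trend_genes(stages, expression, min_abs_gamma=0.7):
    """Split genes into increasing and decreasing trends along the stages.

    expression: dict mapping gene name -> list of values, one per sample,
    in the same sample order as `stages`.
    min_abs_gamma is an illustrative cutoff, not a value from the source.
    """
    increasing, decreasing = {}, {}
    for gene, values in expression.items():
        gamma = goodman_kruskal_gamma(stages, values)
        if gamma >= min_abs_gamma:
            increasing[gene] = gamma
        elif gamma <= -min_abs_gamma:
            decreasing[gene] = gamma
    return increasing, decreasing
```

Because tied pairs are skipped, gamma tolerates the many ties that arise when a few ordered stages are shared by many patients. The genes selected here would then be passed to the clustering step.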