Abstract
Motivation
Single-cell RNA-seq makes possible the investigation of variability in gene expression among cells, and dependence of variation on cell type. Statistical inference methods for such analyses must be scalable, and ideally interpretable.
Results
We present an approach based on a modification of a recently published highly scalable variational autoencoder framework that provides interpretability without sacrificing much accuracy. We demonstrate that our approach enables identification of gene programs in massive datasets. Our strategy, namely the learning of factor models with the auto-encoding variational Bayes framework, is not domain specific and may be useful for other applications.
Availability and implementation
The factor model is available in the scVI package hosted at https://github.com/YosefLab/scVI/.
Contact
v@nxn.se
Supplementary information
Supplementary data are available at Bioinformatics online.
The simultaneous quantification of protein and RNA makes possible the inference of past, present, and future cell states from single experimental snapshots. To enable such temporal analysis from multimodal single-cell experiments, we introduce an extension of the RNA velocity method that leverages estimates of unprocessed transcript and protein abundances to extrapolate cell states. We apply the model to six datasets and demonstrate consistency among cell landscapes and phase portraits. The analysis software is available as the protaccel Python package.
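The extrapolation idea in the abstract above can be illustrated with a generic first-order kinetic sketch. This is not the protaccel implementation or its API: the function name `extrapolate_state`, the rate parameters (`beta`, `gamma_s`, `k_p`, `gamma_p`), and the assumed chain of unprocessed transcript to processed transcript to protein are all hypothetical choices made for illustration.

```python
import numpy as np


def extrapolate_state(u, s, p, beta, gamma_s, k_p, gamma_p, dt):
    """Take one Euler step of an assumed first-order kinetic model.

    Assumed (illustrative) dynamics, not taken from the paper:
        ds/dt = beta * u - gamma_s * s   (processing minus mRNA degradation)
        dp/dt = k_p * s - gamma_p * p    (translation minus protein degradation)

    u, s, p are arrays of unprocessed transcript, processed transcript,
    and protein abundances for one cell (one entry per gene).
    """
    u, s, p = (np.asarray(x, dtype=float) for x in (u, s, p))
    ds_dt = beta * u - gamma_s * s  # RNA velocity
    dp_dt = k_p * s - gamma_p * p   # protein velocity
    s_next = s + dt * ds_dt
    p_next = p + dt * dp_dt
    return s_next, p_next


# Toy values for a single gene in a single cell (made up for the example).
s_next, p_next = extrapolate_state(
    u=[2.0], s=[1.0], p=[4.0],
    beta=1.0, gamma_s=0.5, k_p=2.0, gamma_p=0.25, dt=0.1,
)
```

Under these assumptions, a positive `ds_dt` or `dp_dt` means the corresponding abundance is predicted to rise in the near future, which is the sense in which a single snapshot is extrapolated to a future state.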
Abstract
The more than 1000 single-cell transcriptomics studies that have been published to date constitute a valuable and vast resource for biological discovery. While various ‘atlas’ projects have collated some of the associated datasets, most questions related to specific tissue types, species or other attributes of studies require identifying papers through challenging manual literature searches. To facilitate discovery with published single-cell transcriptomics data, we have assembled a near-exhaustive, manually curated database of single-cell transcriptomics studies with key information: descriptions of the type of data and technologies used, along with descriptors of the biological systems studied. Additionally, the database contains summarized information about analysis in the papers, allowing for analysis of trends in the field. As an example, we show that the number of cell types identified in scRNA-seq studies is proportional to the number of cells analysed.
Database URL: www.nxn.se/single-cell-studies/gui
The differences between individual cells can have profound functional consequences, in both unicellular and multicellular organisms. Recently developed single-cell mRNA-sequencing methods enable unbiased, high-throughput, and high-resolution transcriptomic analysis of individual cells. This provides an additional dimension to transcriptomic information relative to traditional methods that profile bulk populations of cells. Already, single-cell RNA-sequencing methods have revealed new biology in terms of the composition of tissues, the dynamics of transcription, and the regulatory relationships between genes. Rapid technological developments at the level of cell capture, phenotyping, molecular biology, and bioinformatics promise an exciting future with numerous biological and medical applications.
Kolodziejczyk et al. review the technical steps required for a successful single-cell RNA sequencing experiment, from cell isolation through sequencing and analysis.
Single-cell RNA sequencing (scRNA-seq) has become an established and powerful method to investigate transcriptomic cell-to-cell variation, thereby revealing new cell types and providing insights into developmental processes and transcriptional stochasticity. A key question is how the variety of available protocols compare in terms of their ability to detect and accurately quantify gene expression. Here, we assessed the protocol sensitivity and accuracy of many published data sets, on the basis of spike-in standards and uniform data processing. For our workflow, we developed a flexible tool for counting the number of unique molecular identifiers (https://github.com/vals/umis/). We compared 15 protocols computationally and 4 protocols experimentally for batch-matched cell populations, in addition to investigating the effects of spike-in molecular degradation. Our analysis provides an integrated framework for comparing scRNA-seq protocols.
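As a rough illustration of what counting unique molecular identifiers (UMIs) involves, the sketch below collapses reads that share a cell barcode, gene, and UMI into a single molecule. It is a minimal sketch, not the tool linked above: the function name `count_umis` and the input layout of `(cell_barcode, gene, umi)` tuples are assumptions made for this example, and it ignores sequencing errors in the UMI.

```python
from collections import defaultdict


def count_umis(records):
    """Count distinct UMIs per (cell barcode, gene) pair.

    `records` is an iterable of (cell_barcode, gene, umi) tuples, one per
    aligned read. This layout is hypothetical, chosen for illustration.
    """
    seen = defaultdict(set)
    for cell_barcode, gene, umi in records:
        # Reads with the same barcode, gene, and UMI are treated as
        # amplification duplicates of one original molecule.
        seen[(cell_barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Toy reads (made up for the example).
reads = [
    ("CELL1", "GeneA", "AACG"),
    ("CELL1", "GeneA", "AACG"),  # duplicate of the read above
    ("CELL1", "GeneA", "TTGC"),
    ("CELL2", "GeneA", "AACG"),
    ("CELL1", "GeneB", "GGAT"),
]
counts = count_umis(reads)
```

Counting molecules rather than reads in this way is what makes the resulting expression values less sensitive to amplification bias.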
The transcriptional programs that govern hematopoiesis have been investigated primarily by population-level analysis of hematopoietic stem and progenitor cells, which cannot reveal the continuous nature of the differentiation process. Here we applied single-cell RNA-sequencing to a population of hematopoietic cells in zebrafish as they undergo thrombocyte lineage commitment. By reconstructing their developmental chronology computationally, we were able to place each cell along a continuum from stem cell to mature cell, refining the traditional lineage tree. The progression of cells along this continuum is characterized by a highly coordinated transcriptional program, displaying simultaneous suppression of genes involved in cell proliferation and ribosomal biogenesis as the expression of lineage-specific genes increases. Within this program, there is substantial heterogeneity in the expression of the key lineage regulators. Overall, the total number of genes expressed, as well as the total mRNA content of the cell, decreases as the cells undergo lineage commitment.
• Single-cell RNA-sequencing reveals the continuous nature of thrombocyte development
• Coordinated transcriptional programs govern progression of differentiation
• Number of genes expressed and mRNA content per cell decrease during differentiation
• Zebrafish thrombocytes remain transcriptionally active in circulation
Computational reconstruction of the thrombocyte’s developmental chronology from scRNA-seq data reveals the continuous nature of the differentiation process. Macaulay et al. show that a highly coordinated transcriptional program characterizes the progression of cells along this continuum. Within this program, there is substantial heterogeneity in the expression of key lineage regulators.
The endothelial to haematopoietic transition (EHT) is the process whereby haemogenic endothelium differentiates into haematopoietic stem and progenitor cells (HSPCs). The intermediary steps of this process are unclear; in particular, the identity of endothelial cells that give rise to HSPCs is unknown. Using single-cell transcriptome analysis and antibody screening, we identify CD44 as a marker of EHT enabling us to isolate robustly the different stages of EHT in the aorta-gonad-mesonephros (AGM) region. This allows us to provide a detailed phenotypical and transcriptional profile of CD44-positive arterial endothelial cells from which HSPCs emerge. They are characterized by high expression of genes related to Notch signalling and TGFbeta/BMP antagonists, a downregulation of genes related to glycolysis and the TCA cycle, and a lower rate of cell cycle. Moreover, we demonstrate that by inhibiting the interaction between CD44 and its ligand hyaluronan, we can block EHT, identifying an additional regulator of HSPC development.
Mouse embryonic stem cells are dynamic and heterogeneous. For example, rare cells cycle through a state characterized by decondensed chromatin and expression of transcripts, including the Zscan4 cluster and MERVL endogenous retrovirus, which are usually restricted to preimplantation embryos. Here, we further characterize the dynamics and consequences of this transient cell state. Single-cell transcriptomics identified the earliest upregulated transcripts as cells enter the MERVL/Zscan4 state. The MERVL/Zscan4 transcriptional network was also upregulated during induced pluripotent stem cell reprogramming. Genome-wide DNA methylation and chromatin analyses revealed global DNA hypomethylation accompanying increased chromatin accessibility. This transient DNA demethylation was driven by a loss of DNA methyltransferase proteins in the cells and occurred genome-wide. While methylation levels were restored once cells exit this state, genomic imprints remained hypomethylated, demonstrating a potential global and enduring influence of endogenous retroviral activation on the epigenome.
• Single-cell transcriptomics reveals dynamics of MERVL/Zscan4 network activation
• MERVL-LTR transcriptional network is expressed in iPSC reprogramming events
• Translation block depletes Dnmt proteins, inducing transient global demethylation
• Passage through the MERVL/Zscan4 state may cause irreversible imprint erasure
Mouse embryonic stem cells sporadically express preimplantation transcripts, including the MERVL endogenous retrovirus and Zscan4 cluster. Eckersley-Maslin et al. investigate the transcriptional dynamics in these cells and reveal transient genome-wide DNA demethylation accompanying chromatin decompaction. Following state exit, methylation levels are restored, except for genomic imprints, which remain lost.