Single-cell technologies have recently gained popularity in cellular differentiation studies owing to their ability to resolve potential heterogeneities in cell populations. Analyzing such high-dimensional single-cell data poses its own statistical and computational challenges. Popular multivariate approaches are based on data normalization, followed by dimension reduction and clustering to identify subgroups. However, in the case of cellular differentiation, we would not expect clear clusters to be present but instead expect the cells to follow continuous branching lineages.
Here, we propose the use of diffusion maps to deal with the problem of defining differentiation trajectories. We adapt this method to single-cell data by an adequate choice of kernel width and by including uncertainties or missing measurement values, which enables the establishment of a pseudotemporal ordering of single cells in a high-dimensional gene expression space. We expect this output to reflect cell differentiation trajectories, where the data originate from intrinsic diffusion-like dynamics. Starting from a pluripotent stage, cells move smoothly within the transcriptional landscape towards more differentiated states with some stochasticity along their path. We demonstrate the robustness of our method with respect to extrinsic noise (e.g. measurement noise) and sampling density heterogeneities on simulated toy data as well as on two single-cell quantitative polymerase chain reaction datasets (mouse haematopoietic stem cells and mouse embryonic stem cells) and an RNA-Seq dataset of human pre-implantation embryos. We show that diffusion maps perform considerably better than Principal Component Analysis and are advantageous over other techniques for non-linear dimension reduction, such as t-distributed Stochastic Neighbour Embedding, for preserving the global structures and pseudotemporal ordering of cells.
The Matlab implementation of diffusion maps for single-cell data is available at https://www.helmholtz-muenchen.de/icb/single-cell-diffusion-map.
fbuettner.phys@gmail.com, fabian.theis@helmholtz-muenchen.de
Supplementary data are available at Bioinformatics online.
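The diffusion map construction summarised in the abstract above can be illustrated with a minimal sketch. This is a generic textbook version, not the authors' Matlab implementation: it assumes a fixed Gaussian kernel width `sigma` chosen by hand and omits the handling of uncertainties and missing values that the abstract describes. The function name `diffusion_map` and the toy trajectory are hypothetical.

```python
import numpy as np

def diffusion_map(X, sigma, n_components=2):
    """Embed the rows of X (cells x genes) with a basic diffusion map."""
    # Pairwise squared Euclidean distances between cells.
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off

    # Gaussian kernel; sigma is the kernel width (assumed given here).
    K = np.exp(-d2 / (2.0 * sigma ** 2))

    # Density normalisation, so that uneven sampling matters less.
    q = K.sum(axis=1)
    K = K / np.outer(q, q)

    # Symmetric matrix with the same eigenvalues as the row-stochastic
    # transition matrix T = D^-1 K.
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))

    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Convert to right eigenvectors of T and drop the first, trivial one
    # (eigenvalue 1, constant across cells).
    psi = eigvecs / np.sqrt(d)[:, None]
    return eigvals[1:n_components + 1], psi[:, 1:n_components + 1]

# Toy data: 200 cells along a noisy one-dimensional trajectory in 2-D.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
X = np.column_stack([t, t ** 2]) + rng.normal(0.0, 0.01, size=(200, 2))
eigenvalues, coords = diffusion_map(X, sigma=0.1)
```

In this toy setting the first diffusion component, `coords[:, 0]`, varies monotonically along the simulated trajectory (its sign is arbitrary), which is the sense in which such an embedding can serve as a pseudotemporal ordering.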
Large-scale single-cell RNA sequencing (scRNA-seq) data sets that are produced in different laboratories and at different times contain batch effects that may compromise the integration and interpretation of the data. Existing scRNA-seq analysis methods incorrectly assume that the composition of cell populations is either known or identical across batches. We present a strategy for batch correction based on the detection of mutual nearest neighbors (MNNs) in the high-dimensional expression space. Our approach does not rely on predefined or equal population compositions across batches; instead, it requires only that a subset of the population be shared between batches. We demonstrate the superiority of our approach compared with existing methods by using both simulated and real scRNA-seq data sets. Using multiple droplet-based scRNA-seq data sets, we demonstrate that our MNN batch-effect-correction method can be scaled to large numbers of cells.
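The idea of mutual nearest neighbors can be sketched in a few lines. The example below is a simplified illustration and not the published method: it applies one global shift averaged over all MNN pairs, whereas a practical correction would typically be computed locally per cell. The function name, the toy batches, and the choice `k=3` are assumptions, and the simulated batch effect is placed in a dimension separate from the biological variation.

```python
import numpy as np

def mutual_nearest_neighbors(A, B, k=3):
    """Return (i, j) pairs where A[i] and B[j] are in each other's k-NN lists."""
    # Squared distances between every cell in A and every cell in B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    nn_of_a = np.argsort(d2, axis=1)[:, :k]    # per A cell: k nearest B cells
    nn_of_b = np.argsort(d2, axis=0)[:k, :].T  # per B cell: k nearest A cells
    return [(i, int(j)) for i in range(A.shape[0])
            for j in nn_of_a[i] if i in nn_of_b[j]]

rng = np.random.default_rng(0)

def make_cells(centre, n, batch_offset):
    """Two 'biological' dimensions plus one dimension carrying the batch effect."""
    bio = rng.normal(centre, 1.0, size=(n, 2))
    tech = rng.normal(batch_offset, 0.05, size=(n, 1))
    return np.hstack([bio, tech])

# Batch A contains two populations; batch B shares only the first one
# and carries a batch effect of +2.0 in the third dimension.
batch_a = np.vstack([make_cells(0.0, 60, 0.0), make_cells(6.0, 40, 0.0)])
batch_b = make_cells(0.0, 40, 2.0)

pairs = mutual_nearest_neighbors(batch_a, batch_b, k=3)

# Simplification: one global correction vector averaged over all MNN pairs.
correction = np.mean([batch_a[i] - batch_b[j] for i, j in pairs], axis=0)
batch_b_corrected = batch_b + correction
```

Because pairs are only formed between cells that are close in both directions, the population present only in batch A contributes few or no pairs, so the estimate does not require equal population compositions across batches.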
Diffusion maps are a spectral method for non-linear dimension reduction and have recently been adapted for the visualization of single-cell expression data. Here we present destiny, an efficient R implementation of the diffusion map algorithm. Our package includes a single-cell specific noise model allowing for missing and censored values. In contrast to previous implementations, we further present an efficient nearest-neighbour approximation that allows for the processing of hundreds of thousands of cells, as well as functionality for projecting new data onto existing diffusion maps. As an example, we apply destiny to a recent time-resolved mass cytometry dataset of cellular reprogramming.
destiny is an open-source R/Bioconductor package (bioconductor.org/packages/destiny), also available at www.helmholtz-muenchen.de/icb/destiny. A detailed vignette describing functions and workflows is provided with the package.
carsten.marr@helmholtz-muenchen.de or f.buettner@helmholtz-muenchen.de
Supplementary data are available at Bioinformatics online.
High-dimensional single-cell snapshot data are becoming widespread in the systems biology community as a means to understand biological processes at the cellular level. However, as temporal information is lost with such data, mathematical models have been limited to capturing only static features of the underlying cellular mechanisms.
Here, we present a modular framework that makes it possible to recover the temporal behaviour from single-cell snapshot data and reverse engineer the dynamics of gene expression. The framework combines a dimensionality reduction method with a cell time-ordering algorithm to generate pseudo time-series observations. These are in turn used to learn transcriptional ODE models and perform model selection on structural network features. We apply it to synthetic data and then to real hematopoietic stem cell data, to reconstruct gene expression dynamics along differentiation pathways and infer the structure of a key gene regulatory network.
C++ and Matlab code available at https://www.helmholtz-muenchen.de/fileadmin/ICB/software/inferenceSnapshot.zip.
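The step from pseudo time-series observations to an ODE model can be illustrated with a deliberately small sketch. It is not the framework's C++/Matlab code: it assumes that a pseudotime value per cell is already available, and it fits a hypothetical single-gene model dx/dt = a - b*x (constant production, linear degradation) by least squares on finite-difference derivatives. All names and parameter values are illustrative.

```python
import numpy as np

def fit_linear_ode(pseudotime, expression):
    """Fit dx/dt = a - b*x to one gene along a pseudotime ordering."""
    # Sorting by pseudotime turns the snapshot into a pseudo time series.
    order = np.argsort(pseudotime)
    t, x = pseudotime[order], expression[order]
    # Finite-difference estimate of the time derivative.
    dxdt = np.gradient(x, t)
    # Linear least squares for dxdt = a * 1 + b * (-x).
    design = np.column_stack([np.ones_like(x), -x])
    (a, b), *_ = np.linalg.lstsq(design, dxdt, rcond=None)
    return a, b

# Toy snapshot: cells observed in arbitrary order (hypothetical values).
rng = np.random.default_rng(2)
true_a, true_b = 2.0, 0.8
pseudotime = rng.permutation(np.linspace(0.0, 5.0, 200))
expression = true_a / true_b * (1.0 - np.exp(-true_b * pseudotime))

a_hat, b_hat = fit_linear_ode(pseudotime, expression)
```

With noisy data the finite-difference derivative becomes unreliable, so a practical approach would smooth the series first or fit the ODE solution directly; the sketch only shows how an ordering makes such fitting possible at all.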
Towards reliable quantification of cell state velocities Marot-Lassauzaie, Valérie; Bouman, Brigitte Joanne; Donaghy, Fearghal Declan ...
PLOS Computational Biology, 09/2022, Volume 18, Issue 9
Journal Article
Peer reviewed
Open access
A few years ago, it was proposed to use the simultaneous quantification of unspliced and spliced messenger RNA (mRNA) to add a temporal dimension to high-throughput snapshots of single-cell RNA sequencing data. This concept can yield additional insight into the transcriptional dynamics of the biological systems under study. However, current methods for inferring cell state velocities from such data (known as RNA velocities) are afflicted by several theoretical and computational problems, hindering realistic and reliable velocity estimation. We discuss these issues and propose new solutions for addressing some of the current challenges in consistency of data processing, velocity inference and visualisation. We translate our computational conclusions into two velocity analysis tools: one detailed method, κ-velo, and one heuristic method, eco-velo, each of which uses a different set of assumptions about the data.
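For orientation, the basic idea of deriving a velocity from unspliced and spliced counts can be sketched with the commonly used steady-state model, in which spliced mRNA s follows ds/dt = u - γs (with the splicing rate scaled to 1) and γ is estimated as the slope of u against s. This is neither κ-velo nor eco-velo; the function name, the regression through the origin over all cells, and the toy counts are assumptions made for illustration.

```python
import numpy as np

def steady_state_velocity(unspliced, spliced):
    """Per-cell velocity for one gene under the steady-state assumption."""
    # gamma: slope of a regression through the origin of unspliced on spliced.
    gamma = np.dot(unspliced, spliced) / np.dot(spliced, spliced)
    # Velocity is the deviation from the steady-state line u = gamma * s.
    return gamma, unspliced - gamma * spliced

# Toy gene: 300 cells close to steady state (hypothetical values).
rng = np.random.default_rng(1)
true_gamma = 0.5
spliced = rng.uniform(1.0, 10.0, size=300)
unspliced = true_gamma * spliced + rng.normal(0.0, 0.05, size=300)

gamma_hat, velocity = steady_state_velocity(unspliced, spliced)

# A cell with more unspliced mRNA than the steady-state line predicts
# (u = 4 at s = 4) is interpreted as up-regulating the gene.
induced_velocity = 4.0 - gamma_hat * 4.0
```

The sketch also shows why such estimates can be fragile: the fitted γ depends on which cells are assumed to be at steady state and on how the two count types are scaled relative to each other.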
Reconstruction of the molecular pathways controlling organ development has been hampered by a lack of methods to resolve embryonic progenitor cells. Here we describe a strategy to address this problem that combines gene expression profiling of large numbers of single cells with data analysis based on diffusion maps for dimensionality reduction and network synthesis from state transition graphs. Applying the approach to hematopoietic development in the mouse embryo, we map the progression of mesoderm toward blood using single-cell gene expression analysis of 3,934 cells with blood-forming potential captured at four time points between E7.0 and E8.5. Transitions between individual cellular states are then used as input to develop a single-cell network synthesis toolkit to generate a computationally executable transcriptional regulatory network model of blood development. Several model predictions concerning the roles of Sox and Hox factors are validated experimentally. Our results demonstrate that single-cell analysis of a developing organ coupled with computational approaches can reveal the transcriptional programs that underpin organogenesis.
The concept of cell fate relates to the future identity of a cell and its daughters, which is obtained via cell differentiation and division. Understanding, predicting, and manipulating cell fate has been a long-sought goal of developmental and regenerative biology. Recent insights obtained from single-cell genomic and integrative lineage-tracing approaches have further helped to identify molecular features predictive of cell fate. In this perspective, we discuss these approaches with a focus on theoretical concepts and future directions of the field to dissect molecular mechanisms underlying cell fate.
In this perspective, Laleh Haghverdi and Leif Ludwig discuss recent advances in cell fate research with a focus on the roles of single-cell multi-omics and lineage tracing. They further discuss mathematical and theoretical concepts and future perspectives of this field to dissect cellular mechanisms underlying cell fate.
The transfer of cell type labels from pre-annotated (reference) to newly collected data is an important task in single-cell data analysis. As both the number of publicly available annotated datasets that can serve as references and the number of computational methods for cell type label transfer are constantly growing, rationales are needed for deciding which reference design and which method to use for a particular query dataset. Using detailed data visualisations and interpretable statistical assessments, we benchmark a set of popular cell type annotation methods, test their performance on different cell types and study the effects of the design of reference data (e.g., cell sampling criteria, inclusion of multiple datasets in one reference, gene set selection) on the reliability of predictions. Our results highlight the need for further improvements in label transfer methods, as well as for the preparation of high-quality pre-annotated reference data with adequate sampling of all cell types of interest, for more reliable annotation of new datasets.
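As a point of reference for what label transfer means in practice, the following sketch assigns each query cell the majority label of its k nearest reference cells. It is a generic baseline, not necessarily one of the methods benchmarked in the study; the function name, the labels `type_A` and `type_B`, and the toy coordinates are hypothetical.

```python
from collections import Counter

import numpy as np

def knn_label_transfer(ref_X, ref_labels, query_X, k=5):
    """Give each query cell the majority label of its k nearest reference cells."""
    # Squared distances between every query cell and every reference cell.
    d2 = ((query_X[:, None, :] - ref_X[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    predicted, confidence = [], []
    for row in nearest:
        votes = Counter(ref_labels[i] for i in row)
        label, count = votes.most_common(1)[0]
        predicted.append(label)
        confidence.append(count / k)  # fraction of neighbours that agree
    return predicted, confidence

# Toy reference with two well-separated cell types (hypothetical data).
rng = np.random.default_rng(3)
ref_X = np.vstack([rng.normal(0.0, 0.5, size=(30, 2)),
                   rng.normal(5.0, 0.5, size=(30, 2))])
ref_labels = ["type_A"] * 30 + ["type_B"] * 30
query_X = np.array([[0.2, -0.1], [4.8, 5.1]])

predicted, confidence = knn_label_transfer(ref_X, ref_labels, query_X, k=5)
```

Even this baseline makes the dependence on reference design visible: a cell type with very few reference cells can be outvoted by a more densely sampled neighbouring type, regardless of how the query cell is profiled.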
Background: Acute kidney injury (AKI) occurs frequently in critically ill patients and is associated with adverse outcomes. Cellular mechanisms underlying AKI and kidney cell responses to injury remain incompletely understood.
Methods: We performed single-nuclei transcriptomics, bulk transcriptomics, molecular imaging studies, and conventional histology on kidney tissues from 8 individuals with severe AKI (stage 2 or 3 according to Kidney Disease: Improving Global Outcomes (KDIGO) criteria). Specimens were obtained within 1-2 h after individuals had succumbed to critical illness associated with respiratory infections, with 4 of 8 individuals diagnosed with COVID-19. Control kidney tissues were obtained post-mortem or after nephrectomy from individuals without AKI.
Results: High-depth single cell-resolved gene expression data of human kidneys affected by AKI revealed enrichment of novel injury-associated cell states within the major cell types of the tubular epithelium, in particular in proximal tubules, thick ascending limbs, and distal convoluted tubules. Four distinct, hierarchically interconnected injured cell states were distinguishable and characterized by transcriptome patterns associated with oxidative stress, hypoxia, interferon response, and epithelial-to-mesenchymal transition, respectively. Transcriptome differences between individuals with AKI were driven primarily by the cell type-specific abundance of these four injury subtypes rather than by private molecular responses. AKI-associated changes in gene expression between individuals with and without COVID-19 were similar.
Conclusions: The study provides an extensive resource of the cell type-specific transcriptomic responses associated with critical illness-associated AKI in humans, highlighting recurrent disease-associated signatures and inter-individual heterogeneity. Personalized molecular disease assessment in human AKI may foster the development of tailored therapies.
Keywords: Acute kidney injury, Critical illness, Single-cell sequencing
Summary: One of the first steps in single-cell omics data analysis is visualization, which allows researchers to see how well-separated cell-types are from each other. When visualizing multiple datasets at once, data integration/batch correction methods are used to merge the datasets. While needed for downstream analyses, these methods modify the feature space (e.g. gene expression) or PCA space in order to mix cell-types between batches as well as possible. This obscures sample-specific features and breaks down local embedding structures that can be seen when a sample is embedded alone. Therefore, to improve visual comparisons between large numbers of samples (e.g. multiple patients, omic modalities, different time points), we introduce Compound-SNE, which performs what we term a soft alignment of samples in embedding space. We show that Compound-SNE is able to align cell-types in embedding space across samples, while preserving local embedding structures from when samples are embedded independently.
Availability and implementation: Python code for Compound-SNE is available for download at https://github.com/HaghverdiLab/Compound-SNE.