A key task in single-cell RNA-seq (scRNA-seq) data analysis is to accurately detect the number of cell types in the sample, which can be critical for downstream analyses such as cell type identification. Various scRNA-seq data clustering algorithms have been specifically designed to automatically estimate the number of cell types by optimising the number of clusters in a dataset. The lack of benchmark studies, however, complicates the choice of method.
We systematically benchmark a range of popular clustering algorithms on estimating the number of cell types in a variety of settings by sampling from the Tabula Muris data to create scRNA-seq datasets with a varying number of cell types, varying number of cells in each cell type, and different cell type proportions. The large number of datasets enables us to assess the performance of the algorithms, which cover four broad categories of approaches, from various aspects using a panel of criteria. We further cross-compare the performance on datasets with high cell numbers using Tabula Muris and Tabula Sapiens data.
We identify the strengths and weaknesses of each method on multiple criteria, including the deviation of the estimate from the true number of cell types, variability of estimation, clustering concordance of cells with their predefined cell types, running time, and peak memory usage. We then summarise these results into a multi-aspect recommendation for users. The proposed stability-based approach for estimating the number of cell types is implemented in an R package and is freely available from https://github.com/PYangLab/scCCESS.
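Two of the criteria named above, deviation from the true number of cell types and variability of estimation, can be illustrated with a minimal sketch. The function name, the example estimates, and the specific formulas (signed mean deviation, mean absolute deviation, population standard deviation) are assumptions made for illustration; the abstract does not state how the benchmark defines them.

```python
from statistics import mean, pstdev

def estimation_summary(estimates, true_k):
    """Summarise repeated estimates of the number of cell types.

    `estimates` holds the number of clusters one method returned on
    repeated datasets; `true_k` is the known number of cell types.
    The formulas are illustrative assumptions, not the benchmark's own.
    """
    deviations = [k - true_k for k in estimates]
    return {
        # Signed bias: positive means over-estimation on average.
        "mean_deviation": mean(deviations),
        # Magnitude of the error regardless of direction.
        "mean_abs_deviation": mean(abs(d) for d in deviations),
        # Spread of the estimates across repeated datasets.
        "variability": pstdev(estimates),
    }

# Hypothetical estimates from five sampled datasets with 10 true cell types.
summary = estimation_summary([8, 10, 9, 11, 12], true_k=10)
print(summary)
```

Reporting the signed and absolute deviations separately keeps a method that over- and under-estimates in equal measure from appearing unbiased and accurate at the same time.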
A novel approach to probabilistically align adjacent multiple tissue slices from spatially resolved transcriptomics data provides unprecedented depth for the investigation of tissue architecture and paves the way for new developments in 3D spatial analytics.
Single-cell multiomics data continues to grow at an unprecedented pace. Although several methods have demonstrated promising results in integrating several data modalities from the same tissue, the complexity and scale of data compositions present in cell atlases still pose a challenge. Here, we present scJoint, a transfer learning method to integrate atlas-scale, heterogeneous collections of scRNA-seq and scATAC-seq data. scJoint leverages information from annotated scRNA-seq data in a semisupervised framework and uses a neural network to simultaneously train labeled and unlabeled data, allowing label transfer and joint visualization in an integrative framework. Using atlas data as well as multimodal datasets generated with ASAP-seq and CITE-seq, we demonstrate that scJoint is computationally efficient and consistently achieves substantially higher cell-type label accuracy than existing methods while providing meaningful joint visualizations. Thus, scJoint overcomes the heterogeneity of different data modalities to enable a more comprehensive understanding of cellular phenotypes.
A major challenge of the post-genomics era is to define the connectivity of protein phosphorylation networks. Here, we quantitatively delineate the insulin signaling network in adipocytes by high-resolution mass spectrometry-based proteomics. These data reveal the complexity of intracellular protein phosphorylation. We identified 37,248 phosphorylation sites on 5,705 proteins in this single cell type, with approximately 15% responding to insulin. We integrated these large-scale phosphoproteomics data using a machine learning approach to predict physiological substrates of several diverse insulin-regulated kinases. This led to the identification of an Akt substrate, SIN1, a core component of the mTORC2 complex. The phosphorylation of SIN1 by Akt was found to regulate mTORC2 activity in response to growth factors, revealing topological insights into the Akt/mTOR signaling network. The dynamic phosphoproteome described here contains numerous phosphorylation sites on proteins involved in diverse molecular functions and should serve as a useful functional resource for cell biologists.
• MS/MS identified >37,000 phosphorylation sites in adipocytes
• Insulin regulates the phosphoproteome over a wide temporal timescale
• Akt phosphorylates SIN1 on T86 in response to insulin
• SIN1 phosphorylation activates a positive feedback loop between Akt and mTORC2
The recent emergence of multi-sample multi-condition single-cell multi-cohort studies allows researchers to investigate different cell states. The effective integration of multiple large-cohort studies promises biological insights into cells under different conditions that individual studies cannot provide. Here, we present scMerge2, a scalable algorithm that allows data integration of atlas-scale multi-sample multi-condition single-cell studies. We have generalized scMerge2 to enable the merging of millions of cells from single-cell studies generated by various single-cell technologies. Using a large COVID-19 data collection with over five million cells from 1000+ individuals, we demonstrate that scMerge2 enables multi-sample multi-condition scRNA-seq data integration from multiple cohorts and reveals signatures derived from cell-type expression that are more accurate in discriminating disease progression. Further, we demonstrate that scMerge2 can remove dataset variability in CyTOF, imaging mass cytometry and CITE-seq experiments, demonstrating its applicability to a broad spectrum of single-cell profiling technologies.
Single-cell RNA-sequencing (scRNA-seq) is a transformative technology, allowing global transcriptomes of individual cells to be profiled with high accuracy. An essential task in scRNA-seq data analysis is the identification of cell types from complex samples or tissues profiled in an experiment. To this end, clustering has become a key computational technique for grouping cells based on their transcriptome profiles, enabling subsequent cell type identification from each cluster of cells. Due to the high feature-dimensionality of the transcriptome (i.e. the large number of measured genes in each cell) and because only a small fraction of genes are cell type-specific and therefore informative for generating cell type-specific clusters, clustering directly on the original feature/gene dimension may lead to uninformative clusters and hinder correct cell type identification.
Here, we propose an autoencoder-based cluster ensemble framework in which we first take random subspace projections from the data, then compress each random projection to a low-dimensional space using an autoencoder artificial neural network, and finally apply ensemble clustering across all encoded datasets to generate clusters of cells. We employ four evaluation metrics to benchmark clustering performance and our experiments demonstrate that the proposed autoencoder-based cluster ensemble can lead to substantially improved cell type-specific clusters when applied with both the standard k-means clustering algorithm and a state-of-the-art kernel-based clustering algorithm (SIMLR) designed specifically for scRNA-seq data. Compared to directly using these clustering algorithms on the original datasets, the performance improvement in some cases is up to 100%, depending on the evaluation metric used.
Our results suggest that the proposed framework can facilitate more accurate cell type identification as well as other downstream analyses. The code for creating the proposed autoencoder-based cluster ensemble framework is freely available from https://github.com/gedcom/scCCESS.
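The final ensemble step of such a framework can be sketched as follows. This is not the scCCESS implementation: it omits the random subspace projections and the autoencoder entirely, and it assumes a co-association (evidence accumulation) consensus, which the abstract does not specify. The function names, the threshold, and the toy labelings are hypothetical.

```python
from itertools import combinations

def co_association(labelings):
    """Fraction of base clusterings in which each pair of cells shares a cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(labelings)
    return [[c / m for c in row] for row in counts]

def consensus_clusters(labelings, threshold=0.5):
    """Link cells co-clustered in more than `threshold` of the base
    clusterings and return connected-component labels (assumed rule)."""
    co = co_association(labelings)
    n = len(co)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel components 0, 1, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

# Toy example: three base clusterings of six cells; label values are
# arbitrary per run, and the second run misplaces the third cell.
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [2, 2, 2, 5, 5, 5],
]
co = co_association(base)
consensus = consensus_clusters(base)
print(consensus)
```

Because the co-association matrix depends only on whether two cells share a label within a run, the base clusterings do not need matching label names, which is what allows results from independently encoded datasets to be combined.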
COVID-19 patients display a wide range of disease severity, ranging from asymptomatic to critical symptoms with high mortality risk. Our ability to understand the interaction of SARS-CoV-2 infected cells within the lung, and of protective or dysfunctional immune responses to the virus, is critical to effectively treat these patients. Currently, our understanding of cell-cell interactions across different disease states, and how such interactions may drive pathogenic outcomes, is incomplete. Here, we developed a generalizable and scalable workflow for identifying cells that are differentially interacting across COVID-19 patients with distinct disease outcomes and use this to examine eight public single-cell RNA-seq datasets (six from peripheral blood mononuclear cells, one from bronchoalveolar lavage and one from nasopharyngeal samples), with a total of 211 individual samples. By characterizing the cell-cell interaction patterns across epithelial and immune cells in lung tissues for patients with varying disease severity, we illustrate diverse communication patterns across individuals, and discover heterogeneous communication patterns among moderate and severe patients. We further illustrate that patterns derived from cell-cell interactions are potential signatures for discriminating between moderate and severe patients. Overall, this workflow can be generalized and scaled to combine multiple scRNA-seq datasets to uncover cell-cell interactions.
Recent advances in subcellular imaging transcriptomics platforms have enabled high-resolution spatial mapping of gene expression, while also introducing significant analytical challenges in accurately identifying cells and assigning transcripts. Existing methods grapple with cell segmentation, frequently leading to fragmented cells or oversized cells that capture contaminated expression. To this end, we present BIDCell, a self-supervised deep learning-based framework with biologically-informed loss functions that learn relationships between spatially resolved gene expression and cell morphology. BIDCell incorporates cell-type data, including single-cell transcriptomics data from public repositories, with cell morphology information. Using a comprehensive evaluation framework consisting of metrics in five complementary categories for cell segmentation performance, we demonstrate that BIDCell outperforms other state-of-the-art methods according to many metrics across a variety of tissue types and technology platforms. Our findings underscore the potential of BIDCell to significantly enhance single-cell spatial expression analyses and thereby support biological discovery.
Background
Microbiome feedbacks are proposed to influence Parkinson’s disease (PD) pathophysiology. A number of studies have evaluated the impact of oral medication on the gut microbiome (GM) in PD. However, the influence of PD device-assisted therapies (DATs) on the GM remains to be investigated.
Objectives
To profile acute gut microbial community alterations in response to PD DAT initiation.
Methods
Clinical data and stool samples were collected from 21 PD patients initiating either deep brain stimulation (DBS) or levodopa–carbidopa intestinal gel (LCIG) and ten spousal healthy control (HC) subjects. 16S amplicon sequencing of stool DNA enabled comparison of temporal GM stability between groups and with clinical measures, including disease alterations relative to therapy initiation.
Results
We assessed GM response to therapy in the PD group by comparing pre-therapy (−2 and 0 weeks) with post-therapy initiation timepoints (+2 and +4 weeks) and HCs at baseline (0 weeks). Altered GM compositions were noted between the PD and HC groups at various taxonomic levels, including specific differences for DBS (overrepresentation of Clostridium_XlVa, Bilophila, Parabacteroides, Pseudoflavonifractor and underrepresentation of Dorea) and LCIG therapy (overrepresentation of Pseudoflavonifractor, Escherichia/Shigella, and underrepresentation of Gemmiger). Beta diversity changes were also found over the 4 week post-treatment initiation period.
Conclusions
We report on initial short-term GM changes in response to the initiation of PD DATs. Prior to the introduction of the DAT, a PD-associated GM was observed. Following initiation of DAT, several DAT-specific changes in GM composition were identified, suggesting DATs can influence the GM in PD.
The differentiation and maturation trajectories of fetal liver stem/progenitor cells (LSPCs) are not fully understood at single-cell resolution, and a priori knowledge of limited biomarkers could restrict trajectory tracking.
We employed marker-free single-cell RNA-Seq to characterize comprehensive transcriptional profiles of 507 cells randomly selected from seven stages between embryonic day 11.5 and postnatal day 2.5 during mouse liver development, and also 52 Epcam-positive cholangiocytes from postnatal day 3.25 mouse livers. LSPCs in developing mouse livers were identified via marker-free transcriptomic profiling. Single-cell resolution dynamic developmental trajectories of LSPCs exhibited contiguous but discrete genetic control through transcription factors and signaling pathways. The gene expression profiles of cholangiocytes were closer to those of embryonic day 11.5 LSPCs than to those of later-stage LSPCs, cuing the fate decision stage of LSPCs. Our marker-free approach also allows systematic assessment and prediction of isolation biomarkers for LSPCs.
Our data provide not only a valuable resource but also novel insights into the fate decision and transcriptional control of self-renewal, differentiation and maturation of LSPCs.