Single-cell mass cytometry significantly increases the dimensionality of cytometry analysis as compared to fluorescence flow cytometry, providing unprecedented resolution of cellular diversity in ...tissues. However, analysis and interpretation of these high-dimensional data pose a significant technical challenge. Here, we present cytofkit, a new Bioconductor package, which integrates both state-of-the-art bioinformatics methods and in-house novel algorithms to offer a comprehensive toolset for mass cytometry data analysis. Cytofkit provides functions for data pre-processing, data visualization through linear or non-linear dimensionality reduction, automatic identification of cell subsets, and inference of the relatedness between cell subsets. This pipeline also provides a graphical user interface (GUI) for ease of use, as well as a shiny application (APP) for interactive visualization of cell subpopulations and progression profiles of key markers. Applied to a CD14-CD19- PBMC dataset, cytofkit accurately identified different subsets of lymphocytes; applied to a human CD4+ T cell dataset, cytofkit uncovered multiple subtypes of TFH cells spanning blood and tonsils. Cytofkit is implemented in R, licensed under the Artistic license 2.0, and freely available from the Bioconductor website, https://bioconductor.org/packages/cytofkit/. Cytofkit is also applicable to flow cytometry data analysis.
Abstract
Spatial transcriptomics has been emerging as a powerful technique for resolving gene expression profiles while retaining tissue spatial information. These spatially resolved transcriptomics ...make it feasible to examine the complex multicellular systems of different microenvironments. To answer scientific questions with spatial transcriptomics and expand our understanding of how cell types and states are regulated by the microenvironment, the first step is to identify cell clusters by integrating the available spatial information. Here, we introduce SC-MEB, an empirical Bayes approach for spatial clustering analysis using a hidden Markov random field. We have also derived an efficient expectation-maximization algorithm based on an iterative conditional mode for SC-MEB. In contrast to BayesSpace, a recently developed method, SC-MEB is not only computationally efficient and scalable to large sample sizes but is also capable of choosing the smoothness parameter and the number of clusters. We performed comprehensive simulation studies to demonstrate the superiority of SC-MEB over some existing methods. We applied SC-MEB to analyze the spatial transcriptome of human dorsolateral prefrontal cortex tissues and mouse hypothalamic preoptic region. Our analysis results showed that SC-MEB can achieve clustering performance similar to or better than that of BayesSpace, which uses the true number of clusters and a fixed smoothness parameter. Moreover, SC-MEB is scalable to large sample sizes. We then employed SC-MEB to analyze a colon dataset from a patient with colorectal cancer (CRC) and COVID-19, and further performed differential expression analysis to identify signature genes related to the clustering results. The heatmap of identified signature genes showed that the clusters identified using SC-MEB were more separable than those obtained with BayesSpace.
Using pathway analysis, we identified three immune-related clusters, and in a further comparison, found the mean expression of COVID-19 signature genes was greater in immune than non-immune regions of colon tissue. SC-MEB provides a valuable computational tool for investigating the structural organizations of tissues from spatial transcriptomic data.
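To make the role of the smoothness parameter concrete, the following is a minimal sketch of one iterated-conditional-modes (ICM) sweep under a Potts-type hidden Markov random field on a grid of spots. It is not the SC-MEB implementation: the one-dimensional expression values, the fixed cluster means, the fixed `beta`, the 4-neighbor grid, and the function names `nearest_mean_labels` and `icm_sweep` are all assumptions made for illustration, whereas SC-MEB estimates its parameters by expectation-maximization and chooses the smoothness parameter and number of clusters itself.

```python
def nearest_mean_labels(values, means):
    """Non-spatial starting point: assign every spot to its closest cluster mean."""
    return [[min(range(len(means)), key=lambda k: (v - means[k]) ** 2) for v in row]
            for row in values]


def icm_sweep(values, labels, means, beta, sigma=1.0):
    """One ICM sweep: each spot takes the label with the lowest local energy.

    Local energy = Gaussian data cost - beta * (number of agreeing 4-neighbors).
    beta = 0 ignores spatial context; larger beta favors smoother label maps.
    """
    n_rows, n_cols = len(values), len(values[0])
    new = [row[:] for row in labels]
    for r in range(n_rows):
        for c in range(n_cols):
            neighbors = [new[rr][cc]
                         for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                         if 0 <= rr < n_rows and 0 <= cc < n_cols]

            def energy(k):
                data_cost = (values[r][c] - means[k]) ** 2 / (2 * sigma ** 2)
                agreeing = sum(1 for lab in neighbors if lab == k)
                return data_cost - beta * agreeing

            new[r][c] = min(range(len(means)), key=energy)
    return new


# Toy 3x3 tissue grid (made-up values): the center spot is noisy.
values = [[0.0, 0.1, 0.0],
          [0.1, 1.2, 0.0],
          [0.0, 0.1, 0.1]]
means = [0.0, 2.0]  # hypothetical, fixed cluster means

initial = nearest_mean_labels(values, means)
unsmoothed = icm_sweep(values, initial, means, beta=0.0)
smoothed = icm_sweep(values, initial, means, beta=1.0)
```

In this toy grid the center spot is assigned to the second cluster when spatial context is ignored, and is pulled into its neighbors' cluster once `beta` is positive, which is the kind of trade-off a smoothness parameter controls.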
Histopathologic assessment is indispensable for diagnosing colorectal cancer (CRC). However, manual evaluation of the diseased tissues under the microscope cannot reliably inform patient prognosis or ...genomic variations crucial for treatment selections. To address these challenges, we develop the Multi-omics Multi-cohort Assessment (MOMA) platform, an explainable machine learning approach, to systematically identify and interpret the relationship between patients' histologic patterns, multi-omics, and clinical profiles in three large patient cohorts (n = 1888). MOMA successfully predicts the overall survival, disease-free survival (log-rank test P-value<0.05), and copy number alterations of CRC patients. In addition, our approaches identify interpretable pathology patterns predictive of gene expression profiles, microsatellite instability status, and clinically actionable genetic alterations. We show that MOMA models are generalizable to multiple patient populations with different demographic compositions and pathology images collected from distinctive digitization methods. Our machine learning approaches provide clinically actionable predictions that could inform treatments for colorectal cancer patients.
Spatially resolved transcriptomics involves a set of emerging technologies that enable the transcriptomic profiling of tissues with the physical location of expressions. Although a variety of methods ...have been developed for data integration, most of them are for single-cell RNA-seq datasets without consideration of spatial information. Thus, methods that can integrate spatial transcriptomics data from multiple tissue slides, possibly from multiple individuals, are needed. Here, we present PRECAST, a data integration method for multiple spatial transcriptomics datasets with complex batch effects and/or biological effects between slides. PRECAST unifies spatial factor analysis simultaneously with spatial clustering and embedding alignment, while requiring only partially shared cell/domain clusters across datasets. Using both simulated and four real datasets, we show improved cell/domain detection with outstanding visualization, and the estimated aligned embeddings and cell/domain labels facilitate many downstream analyses. We demonstrate that PRECAST is computationally scalable and applicable to spatial transcriptomics datasets from different platforms.
• Simulation-optimization (Sim-Opt) is a widely used, yet computationally expensive optimization technique.
• In this paper, we propose a framework for developing a general purpose GPU program for ...Sim-Opt formulations.
• We illustrate the framework using a key variable selection problem in process monitoring solved using a genetic algorithm.
• Our results show that very significant acceleration in computation time can be obtained using the GPU.
Simulation-optimization (Sim-Opt) is a widely used optimization technique that enables the use of a simulation model to naturally describe system complexity and stochastics. A key barrier to its broader adoption is the high computational cost of simulation, which often limits its practicability. In this paper, we propose the use of GPU parallel computing to enhance the computational efficiency of Sim-Opt. The main objective of this work is to develop a systematic framework that can be used to construct an efficient hybrid CPU-GPU program. The optimization of a process monitoring model using a Genetic Algorithm is used as a case study to illustrate the proposed approach. Our results show an over 100× acceleration of computation time by the developed hybrid program in comparison to a traditional CPU-based approach.
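The general shape of a Sim-Opt loop for variable selection can be sketched as follows. This is a CPU-only toy written under stated assumptions, not the paper's framework or its process monitoring model: the objective in `simulate_fitness`, the `SHIFTS` and `COST` values, and all function names are hypothetical. The sketch only marks where the independent, repeated simulations occur, which is the kind of step a hybrid CPU-GPU program would plausibly offload.

```python
import random

SHIFTS = [2.0, 0.1, 1.5, 0.0, 0.05, 1.8]  # hypothetical signal strength per variable
COST = 0.5                                # hypothetical cost of monitoring one variable


def simulate_fitness(mask, rng, n_rep=100):
    """Monte Carlo estimate of a toy objective for one variable subset (0/1 mask)."""
    total = 0.0
    for _ in range(n_rep):
        signal = sum(s + rng.gauss(0.0, 1.0) for s, bit in zip(SHIFTS, mask) if bit)
        total += signal - COST * sum(mask)
    return total / n_rep


def evaluate_population(population, rng):
    # Every candidate is simulated independently of the others. In a hybrid
    # CPU-GPU design this data-parallel loop is the natural part to move to the GPU.
    return [simulate_fitness(mask, rng) for mask in population]


def genetic_algorithm(n_generations=30, pop_size=20, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    n = len(SHIFTS)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(n_generations):
        fitness = evaluate_population(population, rng)
        ranked = [mask for _, mask in
                  sorted(zip(fitness, population), key=lambda pair: pair[0], reverse=True)]
        parents = ranked[: pop_size // 2]           # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1)             # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = parents + children
    fitness = evaluate_population(population, rng)
    best = max(range(pop_size), key=fitness.__getitem__)
    return population[best], fitness[best]


best_mask, best_fitness = genetic_algorithm()
```

Because the fitness of each candidate is itself a noisy simulation average, the cost of one generation scales with population size times the number of replications, which is why parallel evaluation dominates any speedup in this style of program.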
Although abundant myeloid cell populations in the pancreatic ductal adenocarcinoma (PDAC) microenvironment have been postulated to suppress antitumor immunity, the composition of these populations, ...their spatial locations, and how they relate to patient outcomes are poorly understood.
To generate spatially resolved tumor and immune cell data at single-cell resolution, we developed two quantitative multiplex immunofluorescence assays to interrogate myeloid cells (CD15, CD14, ARG1, CD33, HLA-DR) and macrophages (CD68, CD163, CD86, IFN regulatory factor 5, MRC1 [CD206]) in the PDAC tumor microenvironment. Spatial point pattern analyses were conducted to assess the degree of colocalization between tumor cells and immune cells. Multivariable-adjusted Cox proportional hazards regression was used to assess associations with patient outcomes.
In a multi-institutional cohort of 305 primary PDAC resection specimens, myeloid cells were abundant, enriched within stromal regions, highly heterogeneous across tumors, and differed by somatic genotype. High densities of CD15+ARG1+ immunosuppressive granulocytic cells and M2-polarized macrophages were associated with worse patient survival. Moreover, beyond cell density, closer proximity of M2-polarized macrophages to tumor cells was strongly associated with disease-free survival, revealing the clinical significance and biologic importance of immune cell localization within tumor areas.
A diverse set of myeloid cells are present within the PDAC tumor microenvironment and are distributed heterogeneously across patient tumors. Not only the densities but also the spatial locations of myeloid immune cells are associated with patient outcomes, highlighting the potential role of spatially resolved myeloid cell subtypes as quantitative biomarkers for PDAC prognosis and therapy.
The diversity of the naïve T cell repertoire drives the replenishment potential and capacity of memory T cells to respond to immune challenges. Attrition of the immune system is associated with an ...increased prevalence of pathologies in aged individuals, but whether stem cell memory T lymphocytes (TSCM) contribute to such attrition is still unclear. Using single-cell RNA sequencing and high-dimensional flow cytometry, we demonstrate that TSCM heterogeneity results from differential engagement of Wnt signaling. In humans, aging is associated with the coupled loss of Wnt/β-catenin signature in CD4 TSCM and systemic increase in the levels of Dickkopf-related protein 1, a natural inhibitor of the Wnt/β-catenin pathway. Functional assays support recent thymic emigrants as the precursors of CD4 TSCM. Our data thus hint that reversing TSCM defects by metabolic targeting of the Wnt/β-catenin pathway may be a viable approach to restore and preserve immune homeostasis in the context of immunological history.
Macrophages are among the most common cells in the colorectal cancer microenvironment, but their prognostic significance is incompletely understood. Using multiplexed immunofluorescence for CD68, ...CD86, IRF5, MAF, MRC1 (CD206), and KRT (cytokeratins) combined with digital image analysis and machine learning, we assessed the polarization spectrum of tumor-associated macrophages in 931 colorectal carcinomas. We then applied Cox proportional hazards regression to assess prognostic survival associations of intraepithelial and stromal densities of M1-like and M2-like macrophages while controlling for potential confounders, including stage and microsatellite instability status. We found that high tumor stromal density of M2-like macrophages was associated with worse cancer-specific survival, whereas tumor stromal density of M1-like macrophages was not significantly associated with better cancer-specific survival. High M1:M2 density ratio in tumor stroma was associated with better cancer-specific survival. Overall macrophage densities in tumor intraepithelial or stromal regions were not prognostic. These findings suggested that macrophage polarization state, rather than their overall density, was associated with cancer-specific survival, with M1- and M2-like macrophage phenotypes exhibiting distinct prognostic roles. These results highlight the utility of a multimarker strategy to assess the macrophage polarization at single-cell resolution within the tumor microenvironment.
Although high T-cell density is a well-established favorable prognostic factor in colorectal cancer, the prognostic significance of tumor-associated plasma cells, neutrophils, and eosinophils is less ...well-defined.
We computationally processed digital images of hematoxylin and eosin (H&E)-stained sections to identify lymphocytes, plasma cells, neutrophils, and eosinophils in tumor intraepithelial and stromal areas of 934 colorectal cancers in two prospective cohort studies. Multivariable Cox proportional hazards regression was used to compute mortality HR according to cell density quartiles. The spatial patterns of immune cell infiltration were studied using the G_{Tumor:Immune cell} function, which estimates the likelihood of any tumor cell in a sample having at least one neighboring immune cell of the specified type within a certain radius. Validation studies were performed on an independent cohort of 570 colorectal cancers.
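The quantity described above can be illustrated with a minimal sketch of an empirical nearest-neighbor (G-cross style) function and its area under the curve. This is an assumption-laden illustration rather than the study's estimator: it applies no edge correction, the coordinates and radii are made up, and the names `g_cross` and `trapezoid_auc` are hypothetical.

```python
import math


def g_cross(tumor_xy, immune_xy, radii):
    """For each radius r, the fraction of tumor cells that have at least one
    immune cell of the given type within distance r (no edge correction)."""
    if not tumor_xy or not immune_xy:
        raise ValueError("both point sets must be non-empty")
    nearest = [min(math.dist(t, i) for i in immune_xy) for t in tumor_xy]
    return [sum(d <= r for d in nearest) / len(nearest) for r in radii]


def trapezoid_auc(radii, values):
    """Area under the G curve over the given radii, by the trapezoidal rule."""
    return sum((radii[k + 1] - radii[k]) * (values[k] + values[k + 1]) / 2
               for k in range(len(radii) - 1))


# Made-up cell centroids, in arbitrary length units.
tumor_cells = [(0.0, 0.0), (10.0, 0.0), (50.0, 50.0)]
immune_cells = [(3.0, 4.0), (10.0, 12.0)]
radii = [0.0, 6.0, 10.0, 20.0]

g_values = g_cross(tumor_cells, immune_cells, radii)
g_auc = trapezoid_auc(radii, g_values)
```

A higher curve, and therefore a larger area under it, means that more tumor cells have an immune cell of that type close by, so the AUC condenses the whole curve into a single per-sample measure of proximity.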
Immune cell densities measured by the automated classifier demonstrated high correlation with densities both from manual counts and those obtained from an independently trained automated classifier (Spearman's ρ 0.71-0.96). High densities of stromal lymphocytes and eosinophils were associated with better cancer-specific survival [P_trend < 0.001; multivariable HR (4th vs 1st quartile of eosinophils), 0.49; 95% confidence interval, 0.34-0.71]. High G_{Tumor:Lymphocyte} area under the curve (AUC_{0,20 μm}; P_trend = 0.002) and high G_{Tumor:Eosinophil} AUC_{0,20 μm} (P_trend < 0.001) also showed associations with better cancer-specific survival. High stromal eosinophil density was also associated with better cancer-specific survival in the validation cohort (P_trend < 0.001).
These findings highlight the potential for machine learning assessment of H&E-stained sections to provide robust, quantitative tumor-immune biomarkers for precision medicine.
Background
Despite heightened interest in early-onset colorectal cancer (CRC) diagnosed before age 50, little is known about the immune cell profiles of early-onset CRC. It also remains to be studied ...whether CRCs diagnosed at or shortly after age 50 are similar to early-onset CRC. We therefore hypothesized that immune cell infiltrates in CRC tissue might show differential heterogeneity patterns between three age groups (< 50 “early onset,” 50–54 “intermediate onset,” ≥ 55 “later onset”).
Methods
We examined 1,518 incident CRC cases with available tissue data, including 35 early-onset and 73 intermediate-onset cases. To identify immune cells in tumor intraepithelial and stromal areas, we developed three multiplexed immunofluorescence assays combined with digital image analyses and machine learning algorithms, with the following markers: (1) CD3, CD4, CD8, CD45RO (PTPRC), and FOXP3 for T cells; (2) CD68, CD86, IRF5, MAF, and MRC1 (CD206) for macrophages; and (3) ARG1, CD14, CD15, CD33, and HLA-DR for myeloid cells.
Results
Although no comparisons between age groups showed statistically significant differences at the stringent two-sided α level of 0.005, compared to later-onset CRC, early-onset CRC tended to show lower levels of tumor-infiltrating lymphocytes (P = 0.013), intratumoral periglandular reaction (P = 0.025), and peritumoral lymphocytic reaction (P = 0.044). Compared to later-onset CRC, intermediate-onset CRC tended to show lower densities of overall macrophages (P = 0.050), M1-like macrophages (P = 0.062), CD14+HLA-DR+ cells (P = 0.015), and CD3+CD4+FOXP3+ cells (P = 0.039).
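As a small arithmetic check of how the reported P values sit relative to the stated threshold, the sketch below compares each of them with α = 0.005. The P values are those quoted above; the additional 0.05 cutoff is only the conventional level, included here as an assumption for contrast, and the variable names are illustrative.

```python
ALPHA = 0.005               # stringent two-sided level stated in the abstract
CONVENTIONAL_ALPHA = 0.05   # assumption: conventional level, shown for contrast only

p_values = {
    "early vs later: tumor-infiltrating lymphocytes": 0.013,
    "early vs later: intratumoral periglandular reaction": 0.025,
    "early vs later: peritumoral lymphocytic reaction": 0.044,
    "intermediate vs later: overall macrophages": 0.050,
    "intermediate vs later: M1-like macrophages": 0.062,
    "intermediate vs later: CD14+HLA-DR+ cells": 0.015,
    "intermediate vs later: CD3+CD4+FOXP3+ cells": 0.039,
}

# Comparisons that clear each threshold.
significant = [name for name, p in p_values.items() if p < ALPHA]
nominal = [name for name, p in p_values.items() if p < CONVENTIONAL_ALPHA]
```

None of the seven comparisons clears the 0.005 level, which is consistent with the abstract describing these differences as tendencies rather than significant findings.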
Conclusions
This hypothesis-generating study suggests possible differences in histopathologic lymphocytic reaction patterns, macrophages, and regulatory T cells in the tumor microenvironment by age at diagnosis.