In the post-genomic era, the rapid increase in high-throughput data calls for computational tools capable of integrating data of diverse types and facilitating recognition of biologically meaningful patterns within them. For example, protein-protein interaction data sets have been clustered to identify stable complexes, but scientists lack easily accessible tools to facilitate combined analyses of multiple data sets from different types of experiments. Here we present clusterMaker, a Cytoscape plugin that implements several clustering algorithms and provides network, dendrogram, and heat map views of the results. The Cytoscape network is linked to all of the other views, so that a selection in one is immediately reflected in the others. clusterMaker is the first Cytoscape plugin to implement such a wide variety of clustering algorithms and visualizations, including the only implementations of hierarchical clustering, dendrogram plus heat map visualization (tree view), k-means, k-medoid, SCPS, AutoSOME, and native (Java) MCL.
Results are presented in the form of three scenarios of use: analysis of protein expression data using a recently published mouse interactome and a mouse microarray data set of nearly one hundred diverse cell/tissue types; the identification of protein complexes in the yeast Saccharomyces cerevisiae; and the cluster analysis of the vicinal oxygen chelate (VOC) enzyme superfamily. For scenario one, we explore functionally enriched mouse interactomes specific to particular cellular phenotypes and apply fuzzy clustering. For scenario two, we explore the prefoldin complex in detail using both physical and genetic interaction clusters. For scenario three, we explore the possible annotation of a protein as a methylmalonyl-CoA epimerase within the VOC superfamily. Cytoscape session files for all three scenarios are provided in the Additional Files section.
The Cytoscape plugin clusterMaker provides a number of clustering algorithms and visualizations that can be used independently or in combination for analysis and visualization of biological data sets, and for confirming or generating hypotheses about biological function. Several of these visualizations and algorithms are only available to Cytoscape users through the clusterMaker plugin. clusterMaker is available via the Cytoscape plugin manager.
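To illustrate what one of these algorithms does, here is a minimal pure-Python sketch of Markov clustering (MCL): alternate expansion (matrix squaring, i.e. random walks) and inflation (element-wise powering plus column renormalization) until the flow matrix stabilizes. This is not clusterMaker's Java code. The names `mcl_clusters`, `inflation`, and `prune` are hypothetical, and reading clusters as connected components of the converged matrix is one common convention, assumed here.

```python
def _normalize_columns(m):
    """Scale each column of a square matrix (list of lists) to sum to 1, in place."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                m[i][j] /= s
    return m


def mcl_clusters(adj, inflation=2.0, max_iter=100, tol=1e-9, prune=1e-5):
    """Markov clustering of an undirected graph given as a symmetric adjacency matrix."""
    n = len(adj)
    # Add self-loops, then turn the matrix into column-stochastic transition probabilities.
    m = _normalize_columns(
        [[float(adj[i][j]) + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(max_iter):
        # Expansion: square the matrix, i.e. take random walks of length 2.
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
        # Inflation: raise entries to a power and renormalize, boosting strong flows.
        inflated = _normalize_columns([[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Read clusters as connected components of the entries that survive pruning.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(n):
            if m[i][j] > prune:
                parent[find(i)] = find(j)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


if __name__ == "__main__":
    # Two triangles joined by a single bridging edge (2-3).
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    adj = [[0] * 6 for _ in range(6)]
    for a, b in edges:
        adj[a][b] = adj[b][a] = 1
    print(mcl_clusters(adj))
```

Higher `inflation` values generally yield more, smaller clusters; the dense-list representation keeps the sketch dependency-free but is only practical for small graphs.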
We introduce CIBERSORT, a method for characterizing cell composition of complex tissues from their gene expression profiles. When applied to enumeration of hematopoietic subsets in RNA mixtures from fresh, frozen and fixed tissues, including solid tumors, CIBERSORT outperformed other methods with respect to noise, unknown mixture content and closely related cell types. CIBERSORT should enable large-scale analysis of RNA mixtures for cellular biomarkers and therapeutic targets (http://cibersort.stanford.edu/).
Tumor infiltrating leukocytes (TILs) are an integral component of the tumor microenvironment and have been found to correlate with prognosis and response to therapy. Methods to enumerate immune subsets, such as immunohistochemistry or flow cytometry, suffer from limitations in phenotypic markers and can be challenging to implement and standardize in practice. An alternative approach is to acquire aggregate high-dimensional data from cellular mixtures and then infer the cellular components computationally. We recently described CIBERSORT, a versatile computational method for quantifying cell fractions from bulk tissue gene expression profiles (GEPs). Combining support vector regression with prior knowledge of expression profiles from purified leukocyte subsets, CIBERSORT can accurately estimate the immune composition of a tumor biopsy. In this chapter, we provide a primer on the CIBERSORT method and illustrate its use for characterizing TILs in tumor samples profiled by microarray or RNA-Seq.
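The core modeling idea is that a bulk mixture profile is approximately a signature matrix (genes by cell types) multiplied by a vector of cell-type fractions. CIBERSORT solves this with support vector regression; the sketch below deliberately substitutes a simpler, different estimator, non-negative least squares solved by projected gradient descent, and then rescales the weights to sum to one. The function name `deconvolve` and the toy signature values are hypothetical.

```python
def deconvolve(signature, mixture, n_iter=5000):
    """Estimate fractions f >= 0, sum(f) == 1, such that signature @ f approximates mixture.

    signature: list of genes, each a list of expression values per cell type.
    mixture: bulk expression value for each gene, in the same gene order.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # Step size 1/L; the squared Frobenius norm bounds the gradient's Lipschitz constant.
    step = 1.0 / sum(v * v for row in signature for v in row)
    f = [0.0] * n_types
    for _ in range(n_iter):
        residual = [
            sum(signature[g][c] * f[c] for c in range(n_types)) - mixture[g]
            for g in range(n_genes)
        ]
        # Gradient of 0.5 * ||S f - m||^2 is S^T (S f - m).
        grad = [sum(signature[g][c] * residual[g] for g in range(n_genes)) for c in range(n_types)]
        # Gradient step, then projection onto the non-negative orthant.
        f = [max(0.0, f[c] - step * grad[c]) for c in range(n_types)]
    total = sum(f)
    if total == 0:
        raise ValueError("no cell type in the signature explains the mixture")
    return [v / total for v in f]


if __name__ == "__main__":
    sig = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]  # 3 genes x 2 cell types
    print(deconvolve(sig, [3.0, 7.0, 5.0]))
```

Because the weights are normalized at the end, the estimate reports relative fractions, so scaling the whole mixture by a constant does not change the result.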
Single-cell RNA sequencing (scRNA-seq) is a powerful approach for reconstructing cellular differentiation trajectories. However, inferring both the state and direction of differentiation is challenging. Here, we demonstrate a simple, yet robust, determinant of developmental potential (the number of expressed genes per cell) and leverage this measure of transcriptional diversity to develop a computational framework (CytoTRACE) for predicting differentiation states from scRNA-seq data. When applied to diverse tissue types and organisms, CytoTRACE outperformed previous methods and nearly 19,000 annotated gene sets for resolving 52 experimentally determined developmental trajectories. Additionally, it facilitated the identification of quiescent stem cells and revealed genes that contribute to breast tumorigenesis. This study thus establishes a key RNA-based feature of developmental potential and a platform for delineation of cellular hierarchies.
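The feature this framework builds on is simply the number of genes detected (nonzero) in each cell. The minimal sketch below computes that count from a cells-by-genes matrix and rank-scales it to [0, 1], so higher values mean more expressed genes and, by the abstract's premise, higher developmental potential. It is not the published CytoTRACE implementation, which adds further steps on top of this feature; the function names are hypothetical.

```python
def genes_detected(matrix):
    """Count genes with nonzero expression in each cell (rows = cells, columns = genes)."""
    return [sum(1 for value in cell if value > 0) for cell in matrix]


def rank_scale(values):
    """Rank-transform values to [0, 1]; tied values share their average rank."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2.0
        start = end + 1
    return [r / (n - 1) for r in ranks]


if __name__ == "__main__":
    cells = [[1, 2, 0, 3], [0, 0, 1, 0], [5, 0, 0, 1]]
    print(rank_scale(genes_detected(cells)))  # [1.0, 0.0, 0.5]
```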
Combination checkpoint blockade (CCB) targeting inhibitory CTLA4 and PD1 receptors holds promise for cancer therapy. Immune-related adverse events (IRAEs) remain a major obstacle for the optimal application of CCB in cancer. Here, we analyzed B cell changes in patients with melanoma following treatment with either anti-CTLA4 or anti-PD1, or in combination. CCB therapy led to changes in circulating B cells that were detectable after the first cycle of therapy and characterized by a decline in circulating B cells and an increase in CD21lo B cells and plasmablasts. PD1 expression was higher in the CD21lo B cells, and B cell receptor sequencing of these cells demonstrated greater clonality and a higher frequency of clones compared with CD21hi cells. CCB induced proliferation in the CD21lo compartment, and single-cell RNA sequencing identified B cell activation in cells with genomic profiles of CD21lo B cells in vivo. Increased clonality of circulating B cells following CCB occurred in some patients. Treatment-induced changes in B cells preceded and correlated with both the frequency and timing of IRAEs. Patients with early B cell changes experienced higher rates of grade 3 or higher IRAEs 6 months after CCB. Thus, early changes in B cells following CCB may identify patients who are at increased risk of IRAEs, and preemptive strategies targeting B cells may reduce toxicities in these patients.
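The abstract reports "greater clonality" without defining the metric. One widely used repertoire clonality measure is 1 minus the normalized Shannon entropy of clone frequencies: 0 for a perfectly even repertoire, approaching 1 when a single clone dominates. The sketch below assumes that definition; it is not necessarily the metric used in this study, and `clonality` is a hypothetical helper name.

```python
import math


def clonality(clone_counts):
    """1 - normalized Shannon entropy of clone frequencies (0 = even, 1 = monoclonal)."""
    positive = [c for c in clone_counts if c > 0]
    if not positive:
        raise ValueError("repertoire has no sequenced clones")
    if len(positive) == 1:
        return 1.0  # a single clone is treated as maximally clonal by convention
    total = sum(positive)
    entropy = -sum((c / total) * math.log(c / total) for c in positive)
    # Divide by the maximum possible entropy, log(number of distinct clones).
    return 1.0 - entropy / math.log(len(positive))


if __name__ == "__main__":
    print(clonality([5, 5, 5, 5]))   # even repertoire, close to 0
    print(clonality([100, 1, 1]))    # one dominant clone, close to 1
```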
Biological heterogeneity in diffuse large B cell lymphoma (DLBCL) is partly driven by cell-of-origin subtypes and associated genomic lesions, but also by diverse cell types and cell states in the tumor microenvironment (TME). However, dissecting these cell states and their clinical relevance at scale remains challenging. Here, we implemented EcoTyper, a machine-learning framework integrating transcriptome deconvolution and single-cell RNA sequencing, to characterize clinically relevant DLBCL cell states and ecosystems. Using this approach, we identified five cell states of malignant B cells that vary in prognostic associations and differentiation status. We also identified striking variation in cell states for 12 other lineages comprising the TME and forming cell state interactions in stereotyped ecosystems. While cell-of-origin subtypes have distinct TME composition, DLBCL ecosystems capture clinical heterogeneity within existing subtypes and extend beyond cell-of-origin and genotypic classes. These results resolve the DLBCL microenvironment at systems-level resolution and identify opportunities for therapeutic targeting (https://ecotyper.stanford.edu/lymphoma).
• Large-scale profiling of cell states & cellular ecosystems in hematologic malignancies
• Atlas of malignant B cell states and 12 cell types in the DLBCL tumor microenvironment
• Nine DLBCL cellular ecosystems & their relationships to molecular subtypes and survival
• Candidate cellular biomarkers of response to bortezomib in DLBCL
Steen et al. implement EcoTyper, a machine-learning approach for dissecting cellular heterogeneity in the most common blood cancer, diffuse large B cell lymphoma (DLBCL). Forty-four cell states spanning malignant cells and the microenvironment are defined, uncovering a rich landscape of cellular ecosystems that extend beyond traditional DLBCL classifications, revealing new opportunities for therapy selection.
Molecular profiles of tumors and tumor-associated cells hold great promise as biomarkers of clinical outcomes. However, existing data sets are fragmented and difficult to analyze systematically. Here we present a pan-cancer resource and meta-analysis of expression signatures from ∼18,000 human tumors with overall survival outcomes across 39 malignancies. By using this resource, we identified a forkhead box M1 (FOXM1) regulatory network as a major predictor of adverse outcomes, and we found that expression of favorably prognostic genes, including KLRB1 (encoding CD161), largely reflects tumor-associated leukocytes. By applying CIBERSORT, a computational approach for inferring leukocyte representation in bulk tumor transcriptomes, we identified complex associations between 22 distinct leukocyte subsets and cancer survival. For example, tumor-associated neutrophil and plasma cell signatures emerged as significant but opposite predictors of survival for diverse solid tumors, including breast and lung adenocarcinomas. This resource and associated analytical tools (http://precog.stanford.edu) may help delineate prognostic genes and leukocyte subsets within and across cancers, shed light on the impact of tumor heterogeneity on cancer outcomes, and facilitate the discovery of biomarkers and therapeutic targets.
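The abstract does not spell out how survival associations from many cohorts are pooled into one statistic. One standard approach is to compute a Cox regression z-score (coefficient divided by its standard error) per cohort and combine them with a weighted Stouffer's method, sketched here. Weighting by the square root of each cohort's sample size is an assumption of this sketch, not a description of the PRECOG pipeline, and the function names are hypothetical.

```python
import math


def weighted_stouffer(z_scores, sample_sizes):
    """Combine per-cohort z-scores into one meta-z, weighting each by sqrt(sample size)."""
    if not z_scores or len(z_scores) != len(sample_sizes):
        raise ValueError("need one sample size per z-score")
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    return numerator / math.sqrt(sum(w * w for w in weights))


def two_sided_p(z):
    """Two-sided p-value of a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    meta_z = weighted_stouffer([2.1, 1.4, 3.0], [120, 80, 300])
    print(meta_z, two_sided_p(meta_z))
```

With Cox z-scores, a positive meta-z indicates that higher expression tracks with higher hazard (adverse prognosis) across cohorts, and a negative one indicates a favorable association.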
Outcomes for patients with diffuse large B-cell lymphoma remain heterogeneous, with existing methods failing to consistently predict treatment failure. We examined the additional prognostic value of circulating tumor DNA (ctDNA) before and during therapy for predicting patient outcomes.
We studied the dynamics of ctDNA from 217 patients treated at six centers, using a training and validation framework. We densely characterized early ctDNA dynamics during therapy using cancer personalized profiling by deep sequencing to define response-associated thresholds within a discovery set. These thresholds were assessed in two independent validation sets. Finally, we assessed the prognostic value of ctDNA in the context of established risk factors, including the International Prognostic Index and interim positron emission tomography/computed tomography scans.
Before therapy, ctDNA was detectable in 98% of patients; pretreatment levels were prognostic in both front-line and salvage settings. In the discovery set, ctDNA levels changed rapidly, and two thresholds stratified outcomes: a 2-log decrease after one cycle (early molecular response [EMR]) and a 2.5-log decrease after two cycles (major molecular response [MMR]). In the first validation set, patients receiving front-line therapy who achieved EMR or MMR had superior outcomes at 24 months (EMR: event-free survival [EFS], 83% v 50%; P = .0015; MMR: EFS, 82% v 46%; P < .001). EMR also predicted superior 24-month outcomes in patients receiving salvage therapy in the first validation set (EFS, 100% v 13%; P = .011). The prognostic value of EMR and MMR was further confirmed in the second validation set. In multivariable analyses including International Prognostic Index and interim positron emission tomography/computed tomography scans across both cohorts, molecular response was independently prognostic of outcomes, including event-free and overall survival.
Pretreatment ctDNA levels and molecular responses are independently prognostic of outcomes in aggressive lymphomas. These risk factors could potentially guide future personalized risk-directed approaches.
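The EMR and MMR definitions reduce to simple arithmetic on log-scale ctDNA changes. The sketch below assumes that an "n-log decrease" means log10(pretreatment / on-treatment) of at least n, that meeting a threshold exactly counts as a response, and that an undetectable on-treatment value is an unbounded drop (a real assay would use its detection limit instead). Input units (for example, haploid genome equivalents per mL) and function names are assumptions.

```python
import math

EMR_LOG_DROP = 2.0  # 2-log decrease after one cycle (early molecular response)
MMR_LOG_DROP = 2.5  # 2.5-log decrease after two cycles (major molecular response)


def log10_drop(pretreatment, on_treatment):
    """Base-10 log fold decrease of ctDNA relative to the pretreatment level."""
    if pretreatment <= 0:
        raise ValueError("pretreatment ctDNA must be detectable (> 0)")
    if on_treatment <= 0:
        # Simplification: treat undetectable ctDNA as an unbounded drop.
        return math.inf
    return math.log10(pretreatment / on_treatment)


def molecular_response(pretreatment, after_cycle1=None, after_cycle2=None):
    """Classify EMR and MMR; a missing measurement counts as no response."""
    emr = after_cycle1 is not None and log10_drop(pretreatment, after_cycle1) >= EMR_LOG_DROP
    mmr = after_cycle2 is not None and log10_drop(pretreatment, after_cycle2) >= MMR_LOG_DROP
    return {"EMR": emr, "MMR": mmr}


if __name__ == "__main__":
    # 1000 -> 5 is a ~2.3-log drop (EMR); 1000 -> 3 is a ~2.52-log drop (MMR).
    print(molecular_response(1000.0, after_cycle1=5.0, after_cycle2=3.0))
```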
CIBERSORTx is a suite of machine learning tools for the assessment of cellular abundance and cell type-specific gene expression patterns from bulk tissue transcriptome profiles. With this framework, single-cell or bulk-sorted RNA sequencing data can be used to learn molecular signatures of distinct cell types from a small collection of biospecimens. These signatures can then be repeatedly applied to characterize cellular heterogeneity from bulk tissue transcriptomes without physical cell isolation. In this chapter, we provide a detailed primer on CIBERSORTx and demonstrate its capabilities for high-throughput profiling of cell types and cellular states in normal and neoplastic tissues.
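As a minimal sketch of learning cell-type signatures from single-cell data: normalize each cell to counts per million (CPM) and average within each annotated cell type to obtain one reference profile per type. This is a simplification, not CIBERSORTx's signature-building procedure, which also selects informative genes; the function name `reference_profiles` is hypothetical.

```python
def reference_profiles(counts, labels):
    """Mean counts-per-million expression per annotated cell type.

    counts: list of cells, each a list of raw counts over the same genes.
    labels: cell-type label for each cell, in the same order.
    """
    if len(counts) != len(labels):
        raise ValueError("need exactly one label per cell")
    sums, sizes = {}, {}
    for cell, label in zip(counts, labels):
        library_size = sum(cell)
        if library_size == 0:
            continue  # skip empty cells rather than divide by zero
        acc = sums.setdefault(label, [0.0] * len(cell))
        for g, value in enumerate(cell):
            acc[g] += 1e6 * value / library_size
        sizes[label] = sizes.get(label, 0) + 1
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}


if __name__ == "__main__":
    cells = [[1, 3], [3, 1], [0, 4]]  # 3 cells x 2 genes
    print(reference_profiles(cells, ["A", "A", "B"]))
```

The resulting per-type profiles could serve as the columns of a signature matrix for a deconvolution step.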
Determining how cells vary with their local signaling environment and organize into distinct cellular communities is critical for understanding processes as diverse as development, aging, and cancer. Here we introduce EcoTyper, a machine learning framework for large-scale identification and validation of cell states and multicellular communities from bulk, single-cell, and spatially resolved gene expression data. When applied to 12 major cell lineages across 16 types of human carcinoma, EcoTyper identified 69 transcriptionally defined cell states. Most states were specific to neoplastic tissue, ubiquitous across tumor types, and significantly prognostic. By analyzing cell-state co-occurrence patterns, we discovered ten clinically distinct multicellular communities with unexpectedly strong conservation, including three with myeloid and stromal elements linked to adverse survival, one enriched in normal tissue, and two associated with early cancer development. This study elucidates fundamental units of cellular organization in human carcinoma and provides a framework for large-scale profiling of cellular ecosystems in any tissue.
• EcoTyper enables large-scale profiling of cell states and multicellular ecosystems
• Applicable to bulk, single-cell, and spatially resolved gene expression data
• A reference atlas of 69 cell states and 10 ecosystems across 16 types of carcinoma
• Carcinoma ecosystems have distinct biology, clinical outcomes, and spatial topology
EcoTyper, a machine learning framework for identifying and characterizing cell states and ecosystems from gene expression data, yields insights into the cellular landscape and community structure of human carcinoma, the leading cause of cancer-related mortality.
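Ecosystems in this work are derived from cell-state co-occurrence. A minimal way to quantify co-occurrence is the Jaccard index between the sets of samples in which two states are assigned, followed by grouping states that co-occur strongly. The sketch below uses that approach with a hypothetical threshold and single-linkage grouping (connected components); it illustrates the idea only and is not EcoTyper's actual clustering procedure. State names and sample IDs are made up.

```python
def jaccard(a, b):
    """Jaccard index of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def state_cooccurrence(assignments):
    """Pairwise Jaccard index between cell states.

    assignments: dict mapping a cell-state name to the set of sample IDs
    in which that state was assigned.
    """
    states = sorted(assignments)
    return {
        (s, t): jaccard(assignments[s], assignments[t])
        for i, s in enumerate(states)
        for t in states[i + 1:]
    }


def group_states(assignments, threshold=0.5):
    """Single-linkage grouping: states are linked when their Jaccard index >= threshold."""
    parent = {s: s for s in assignments}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for (s, t), score in state_cooccurrence(assignments).items():
        if score >= threshold:
            parent[find(s)] = find(t)
    groups = {}
    for s in sorted(assignments):
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())


if __name__ == "__main__":
    toy = {
        "B1": {"s1", "s2", "s3"},
        "T1": {"s1", "s2", "s3"},
        "M1": {"s4", "s5"},
        "F1": {"s4", "s5", "s6"},
    }
    print(group_states(toy))  # [['B1', 'T1'], ['F1', 'M1']]
```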