We consider a setting in which we have a treatment and a potentially large number of covariates for a set of observations, and wish to model their relationship with an outcome of interest. We propose a simple method for modeling interactions between the treatment and covariates. The idea is to modify the covariate in a simple way, and then fit a standard model using the modified covariates and no main effects. We show that coupled with an efficiency augmentation procedure, this method produces clinically meaningful estimators in a variety of settings. It can be useful for practicing personalized medicine: determining, from a large set of biomarkers, the subset of patients that can potentially benefit from a treatment. We apply the method to both simulated datasets and real trial data. The modified covariates idea can be used for other purposes, for example, large scale hypothesis testing for determining which of a set of covariates interact with a treatment variable. Supplementary materials for this article are available online.
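The modified-covariates idea described above can be illustrated with a minimal sketch. The abstract does not spell out the modification, so the form used here is an assumption: treatment is coded as T in {-1, +1} with 1:1 randomization, each covariate (plus a leading constant) is multiplied by T/2, and an ordinary least-squares model is fit on the modified covariates alone. The function names, the toy data, and the `benefit_score` helper are hypothetical, and the efficiency augmentation step is not shown.

```python
# Sketch of the modified-covariates idea (assumed form: T coded as -1/+1,
# 1:1 randomization, each covariate multiplied by T/2, no main effects).

def modified_covariates(z_rows, t):
    """Return rows [T/2, z1*T/2, z2*T/2, ...] for treatment codes T in {-1, +1}."""
    return [[ti / 2.0] + [zij * ti / 2.0 for zij in zi]
            for zi, ti in zip(z_rows, t)]


def least_squares(w, y):
    """Solve the normal equations (W'W) g = W'y by Gaussian elimination."""
    p = len(w[0])
    # Augmented matrix [W'W | W'y].
    a = [[sum(row[j] * row[k] for row in w) for k in range(p)]
         + [sum(row[j] * yi for row, yi in zip(w, y))]
         for j in range(p)]
    for col in range(p):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, p):
            factor = a[r][col] / a[col][col]
            for k in range(col, p + 1):
                a[r][k] -= factor * a[col][k]
    # Back substitution.
    g = [0.0] * p
    for j in reversed(range(p)):
        tail = sum(a[j][k] * g[k] for k in range(j + 1, p))
        g[j] = (a[j][p] - tail) / a[j][j]
    return g


# Toy balanced trial (hypothetical data): each covariate value appears once per arm.
z_values = [-2.0, -1.0, 0.0, 1.0, 2.0]
z_rows = [[z] for z in z_values for _ in (0, 1)]
t = [arm for _ in z_values for arm in (-1, 1)]

# Outcome = main effect (3 + 2z) + treatment interaction (1 + 0.5z) * T/2, no noise.
y = [3.0 + 2.0 * zi[0] + (1.0 + 0.5 * zi[0]) * ti / 2.0
     for zi, ti in zip(z_rows, t)]

# Fit on the modified covariates only; the main effect is never modeled.
gamma = least_squares(modified_covariates(z_rows, t), y)


def benefit_score(z):
    """Estimated treatment-minus-control difference for covariate value z."""
    return gamma[0] + gamma[1] * z
```

In this balanced, noise-free toy design the main effect is orthogonal to the modified covariates, so the fit recovers the interaction coefficients even though no main-effect terms are included. Under these assumptions, observations with a positive `benefit_score` would be the candidates expected to benefit from treatment when larger outcomes are better.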
We introduce CIBERSORT, a method for characterizing cell composition of complex tissues from their gene expression profiles. When applied to enumeration of hematopoietic subsets in RNA mixtures from fresh, frozen and fixed tissues, including solid tumors, CIBERSORT outperformed other methods with respect to noise, unknown mixture content and closely related cell types. CIBERSORT should enable large-scale analysis of RNA mixtures for cellular biomarkers and therapeutic targets (http://cibersort.stanford.edu/).
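The general problem of estimating cell composition from a bulk expression profile can be illustrated with a linear mixture model, in which the bulk profile is approximately a signature matrix times a vector of cell-type fractions. The abstract does not describe the algorithm, so the sketch below uses a swapped-in technique, non-negative least squares solved by multiplicative updates, and should not be read as CIBERSORT's own procedure. The function name, signature matrix, and mixture values are hypothetical.

```python
def estimate_fractions(signature, mixture, iterations=500):
    """Estimate non-negative cell-type fractions f so that signature * f fits mixture.

    signature: one row per gene, one column per cell type (positive values).
    mixture:   one bulk expression value per gene (non-negative values).
    Uses multiplicative updates for non-negative least squares (an assumed,
    illustrative solver), then rescales the fractions to sum to 1.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # Precompute S'S and S'm once.
    sts = [[sum(signature[g][j] * signature[g][k] for g in range(n_genes))
            for k in range(n_types)] for j in range(n_types)]
    stm = [sum(signature[g][j] * mixture[g] for g in range(n_genes))
           for j in range(n_types)]
    f = [1.0 / n_types] * n_types  # start from uniform fractions
    for _ in range(iterations):
        denom = [sum(sts[j][k] * f[k] for k in range(n_types))
                 for j in range(n_types)]
        # Each update multiplies by a non-negative ratio, so f stays non-negative.
        f = [f[j] * stm[j] / denom[j] if denom[j] > 0 else 0.0
             for j in range(n_types)]
    total = sum(f)
    return [fj / total for fj in f] if total > 0 else f


# Hypothetical signature: 4 genes x 3 cell types.
signature = [
    [10.0, 1.0, 1.0],
    [1.0, 10.0, 1.0],
    [1.0, 1.0, 10.0],
    [5.0, 5.0, 5.0],
]
# Noise-free bulk profile built from true fractions 0.5, 0.3, 0.2.
mixture = [5.5, 3.7, 2.8, 5.0]

fractions = estimate_fractions(signature, mixture)
```

With this noise-free toy input the estimated fractions should approach the fractions used to build the mixture. Real mixtures add noise, unknown content, and closely related cell types, which are the conditions the abstract highlights.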
Repbase is a reference database of eukaryotic repetitive DNA, which includes prototypic sequences of repeats and basic information described in annotations. Updating and maintenance of the database requires specialized tools, which we have created and made available for use with Repbase, and which may be useful as a template for other curated databases.
We describe the software tools RepbaseSubmitter and Censor, which are designed to facilitate updating and screening the content of Repbase. RepbaseSubmitter is a Java-based interface for formatting and annotating Repbase entries. It eliminates many common formatting errors, and automates actions such as calculation of sequence lengths and composition, thus facilitating curation of Repbase sequences. In addition, it has several features for predicting protein coding regions in sequences; searching and including PubMed references in Repbase entries; and searching the NCBI taxonomy database for correct inclusion of species information and taxonomic position. Censor is a tool to rapidly identify repetitive elements by comparison to known repeats. It uses WU-BLAST for speed and sensitivity, and can conduct DNA-DNA, DNA-protein, or translated DNA-translated DNA searches of genomic sequence. Defragmented output includes a map of repeats present in the query sequence, with the options to report masked query sequence(s), repeat sequences found in the query, and alignments.
Censor and RepbaseSubmitter are available as both web-based services and downloadable versions. They can be found at http://www.girinst.org/repbase/submission.html (RepbaseSubmitter) and http://www.girinst.org/censor/index.php (Censor).
The era of genomic medicine has allowed acute myeloid leukemia (AML) researchers to improve disease characterization, optimize risk-stratification systems, and develop new treatments. Although there has been significant progress, AML remains a lethal cancer because of its remarkably complex and plastic cellular architecture. This degree of heterogeneity continues to pose a major challenge, because it limits the ability to identify and therefore eradicate the cells responsible for leukemogenesis and treatment failure. In recent years, the field of single-cell genomics has led to unprecedented strides in the ability to characterize cellular heterogeneity, and it holds promise for the study of AML. In this review, we highlight advancements in single-cell technologies, outline important shortcomings in our understanding of AML biology and clinical management, and discuss how single-cell genomics can address these shortcomings as well as provide unique opportunities in basic and translational AML research.
Biological heterogeneity in diffuse large B cell lymphoma (DLBCL) is partly driven by cell-of-origin subtypes and associated genomic lesions, but also by diverse cell types and cell states in the tumor microenvironment (TME). However, dissecting these cell states and their clinical relevance at scale remains challenging. Here, we implemented EcoTyper, a machine-learning framework integrating transcriptome deconvolution and single-cell RNA sequencing, to characterize clinically relevant DLBCL cell states and ecosystems. Using this approach, we identified five cell states of malignant B cells that vary in prognostic associations and differentiation status. We also identified striking variation in cell states for 12 other lineages comprising the TME and forming cell state interactions in stereotyped ecosystems. While cell-of-origin subtypes have distinct TME composition, DLBCL ecosystems capture clinical heterogeneity within existing subtypes and extend beyond cell-of-origin and genotypic classes. These results resolve the DLBCL microenvironment at systems-level resolution and identify opportunities for therapeutic targeting (https://ecotyper.stanford.edu/lymphoma).
•Large-scale profiling of cell states & cellular ecosystems in hematologic malignancies
•Atlas of malignant B cell states and 12 cell types in the DLBCL tumor microenvironment
•Nine DLBCL cellular ecosystems & their relationships to molecular subtypes and survival
•Candidate cellular biomarkers of response to bortezomib in DLBCL
Steen et al. implement EcoTyper, a machine-learning approach for dissecting cellular heterogeneity in the most common blood cancer, diffuse large B cell lymphoma (DLBCL). Forty-four cell states spanning malignant cells and the microenvironment are defined, uncovering a rich landscape of cellular ecosystems that extend beyond traditional DLBCL classifications, revealing new opportunities for therapy selection.
Leukemia stem cells (LSCs) are thought to share several properties with hematopoietic stem cells (HSCs), including cell-cycle quiescence and a capacity for self-renewal. These features are hypothesized to underlie leukemic initiation, progression, and relapse, and they also complicate efforts to eradicate leukemia through therapeutic targeting of LSCs without adverse effects on HSCs. Here, we show that acute myeloid leukemias (AMLs) with genomic rearrangements of the MLL gene contain a non-quiescent LSC population. Although human CD34+CD38− LSCs are generally highly quiescent, the C-type lectin CD93 is expressed on a subset of actively cycling, non-quiescent AML cells enriched for LSC activity. CD93 expression is functionally required for engraftment of primary human AML LSCs and leukemogenesis, and it regulates LSC self-renewal predominantly by silencing CDKN2B, a major tumor suppressor in AML. Thus, CD93 expression identifies a predominantly cycling, non-quiescent leukemia-initiating cell population in MLL-rearranged AML, providing opportunities for selective targeting and eradication of LSCs.
•Cell surface lectin CD93 is a functional marker of LSCs in MLL-rearranged AML
•CD93+ LSCs are cycling, non-quiescent leukemia-initiating cells
•LSC expression of CD93 is essential for MLL-mediated leukemogenesis
•CD93 regulates LSC self-renewal by silencing CDKN2B in MLL leukemia
Iwasaki et al. demonstrate that leukemia stem cells (LSCs) in a distinctive genetic subtype of leukemia are non-quiescent. Although human LSCs are typically enriched in the highly quiescent CD34+CD38− phenotypic compartment, co-expression of the lectin CD93 further demarcates LSCs as a discrete subpopulation of actively cycling, non-quiescent AML cells.
Molecular profiles of tumors and tumor-associated cells hold great promise as biomarkers of clinical outcomes. However, existing data sets are fragmented and difficult to analyze systematically. Here we present a pan-cancer resource and meta-analysis of expression signatures from ∼18,000 human tumors with overall survival outcomes across 39 malignancies. By using this resource, we identified a forkhead box M1 (FOXM1) regulatory network as a major predictor of adverse outcomes, and we found that expression of favorably prognostic genes, including KLRB1 (encoding CD161), largely reflects tumor-associated leukocytes. By applying CIBERSORT, a computational approach for inferring leukocyte representation in bulk tumor transcriptomes, we identified complex associations between 22 distinct leukocyte subsets and cancer survival. For example, tumor-associated neutrophil and plasma cell signatures emerged as significant but opposite predictors of survival for diverse solid tumors, including breast and lung adenocarcinomas. This resource and associated analytical tools (http://precog.stanford.edu) may help delineate prognostic genes and leukocyte subsets within and across cancers, shed light on the impact of tumor heterogeneity on cancer outcomes, and facilitate the discovery of biomarkers and therapeutic targets.
Acute myeloid leukaemia (AML) is characterized by subpopulations of leukaemia stem cells (LSCs) that are defined by their ability to engraft in immunodeficient mice. Here we show an LSC DNA methylation signature, derived from xenografts and integration with gene expression, that comprises 71 genes and identifies a key role for the HOXA cluster. Most of the genes are epigenetically regulated independently of underlying mutations, although several are downstream targets of epigenetic modifier genes mutated in AML. The LSC epigenetic signature is associated with poor prognosis independent of known risk factors such as age and cytogenetics. Analysis of early haematopoietic progenitors from normal individuals reveals two distinct clusters of AML LSC resembling either lymphoid-primed multipotent progenitors or granulocyte/macrophage progenitors. These results provide evidence for DNA methylation variation between AML LSCs and their blast progeny, and identify epigenetically distinct subgroups of AML likely reflecting the cell of origin.
Determining how cells vary with their local signaling environment and organize into distinct cellular communities is critical for understanding processes as diverse as development, aging, and cancer. Here we introduce EcoTyper, a machine learning framework for large-scale identification and validation of cell states and multicellular communities from bulk, single-cell, and spatially resolved gene expression data. When applied to 12 major cell lineages across 16 types of human carcinoma, EcoTyper identified 69 transcriptionally defined cell states. Most states were specific to neoplastic tissue, ubiquitous across tumor types, and significantly prognostic. By analyzing cell-state co-occurrence patterns, we discovered ten clinically distinct multicellular communities with unexpectedly strong conservation, including three with myeloid and stromal elements linked to adverse survival, one enriched in normal tissue, and two associated with early cancer development. This study elucidates fundamental units of cellular organization in human carcinoma and provides a framework for large-scale profiling of cellular ecosystems in any tissue.
•EcoTyper enables large-scale profiling of cell states and multicellular ecosystems
•Applicable to bulk, single-cell, and spatially resolved gene expression data
•A reference atlas of 69 cell states and 10 ecosystems across 16 types of carcinoma
•Carcinoma ecosystems have distinct biology, clinical outcomes, and spatial topology
EcoTyper, a machine learning framework for identifying and characterizing cell states and ecosystems from gene expression data, yields insights into the cellular landscape and community structure of human carcinoma, the leading cause of cancer-related mortality.
Single-cell RNA-sequencing has emerged as a powerful technique for characterizing cellular heterogeneity, but it is currently impractical on large sample cohorts and cannot be applied to fixed specimens collected as part of routine clinical care. We previously developed an approach for digital cytometry, called CIBERSORT, that enables estimation of cell type abundances from bulk tissue transcriptomes. We now introduce CIBERSORTx, a machine learning method that extends this framework to infer cell-type-specific gene expression profiles without physical cell isolation. By minimizing platform-specific variation, CIBERSORTx also allows the use of single-cell RNA-sequencing data for large-scale tissue dissection. We evaluated the utility of CIBERSORTx in multiple tumor types, including melanoma, where single-cell reference profiles were used to dissect bulk clinical specimens, revealing cell-type-specific phenotypic states linked to distinct driver mutations and response to immune checkpoint blockade. We anticipate that digital cytometry will augment single-cell profiling efforts, enabling cost-effective, high-throughput tissue characterization without the need for antibodies, disaggregation or viable cells.