Current methods for comparing single-cell RNA sequencing datasets collected in multiple conditions focus on discrete regions of the transcriptional state space, such as clusters of cells. Here we quantify the effects of perturbations at the single-cell level using a continuous measure of the effect of a perturbation across the transcriptomic space. We describe this space as a manifold and develop a relative likelihood estimate of observing each cell in each of the experimental conditions using graph signal processing. This likelihood estimate can be used to identify cell populations specifically affected by a perturbation. We also develop vertex frequency clustering to extract populations of affected cells at the level of granularity that matches the perturbation response. The accuracy of our algorithm at identifying clusters of cells that are enriched or depleted in each condition is, on average, 57% higher than the next-best-performing algorithm tested. Gene signatures derived from these clusters are more accurate than those of six alternative algorithms in ground truth comparisons.
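The relative likelihood idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the bandwidth `sigma`, the number of smoothing steps `t`, and the use of repeated random-walk averaging as the graph low-pass filter are all assumptions chosen for brevity, and the function name `relative_likelihood` is hypothetical.

```python
import numpy as np

def relative_likelihood(X, labels, n_conditions, sigma=1.0, t=3):
    """Per-cell relative likelihood of each condition (illustrative only).

    X: (n_cells, n_features) array; labels: integer condition per cell.
    Assumes every condition has at least one cell.
    """
    # Pairwise squared distances and Gaussian affinities between cells.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * sigma ** 2))
    P = K / K.sum(axis=1, keepdims=True)  # row-stochastic random walk

    # One indicator signal per condition, scaled so each sums to 1.
    ind = np.eye(n_conditions)[labels]
    ind = ind / ind.sum(axis=0, keepdims=True)

    # Smooth the indicators over the graph (assumed stand-in for the filter).
    dens = np.linalg.matrix_power(P, t) @ ind

    # Normalize across conditions so each row sums to 1.
    return dens / dens.sum(axis=1, keepdims=True)

X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
labels = np.array([0, 0, 0, 1, 1, 1])
L = relative_likelihood(X, labels, n_conditions=2)
```

Cells that sit in a region populated mostly by one condition receive a value close to 1 for that condition, while cells in mixed regions receive intermediate values.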
Single-cell RNA sequencing technologies suffer from many sources of technical noise, including under-sampling of mRNA molecules, often termed “dropout,” which can severely obscure important gene-gene relationships. To address this, we developed MAGIC (Markov affinity-based graph imputation of cells), a method that shares information across similar cells, via data diffusion, to denoise the cell count matrix and fill in missing transcripts. We validate MAGIC on several biological systems and find it effective at recovering gene-gene relationships and additional structures. Applied to the epithelial-to-mesenchymal transition, MAGIC reveals a phenotypic continuum, with the majority of cells residing in intermediate states that display stem-like signatures, and infers known and previously uncharacterized regulatory interactions, demonstrating that our approach can successfully uncover regulatory relations without perturbations.
• MAGIC restores noisy and sparse single-cell data using diffusion geometry
• Corrected data are amenable to myriad downstream analyses
• MAGIC enables archetypal analysis and inference of gene interactions
• Transcription factor targets can be predicted without perturbation after MAGIC
A new algorithm overcomes limitations of data loss in single-cell sequencing experiments.
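The data-diffusion step named in the MAGIC abstract can be sketched roughly as follows. This is a simplified illustration, not the published algorithm: the fixed-bandwidth Gaussian kernel, the parameters `sigma` and `t`, and the function name `diffuse` are assumptions, and the abstract does not specify how the affinities are built.

```python
import numpy as np

def diffuse(data, sigma=1.0, t=3):
    """Smooth a cells-by-genes matrix by diffusing over a cell-cell graph."""
    # Pairwise squared distances between cells.
    sq = ((data[:, None, :] - data[None, :, :]) ** 2).sum(-1)
    # Gaussian affinities (assumed kernel), row-normalized to a Markov matrix.
    affinity = np.exp(-sq / (2 * sigma ** 2))
    markov = affinity / affinity.sum(axis=1, keepdims=True)
    # Powering the Markov matrix shares information across similar cells.
    return np.linalg.matrix_power(markov, t) @ data

# Three similar cells; the second has a zero ("dropout") in gene 2.
counts = np.array([[1.0, 2.0],
                   [1.1, 0.0],
                   [0.9, 2.1]])
imputed = diffuse(counts)
```

Because each row of the powered Markov matrix is a set of non-negative weights summing to 1, every imputed value is a weighted average of the observed values for that gene, so the zero entry is pulled toward its neighbors' expression.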
The high-dimensional data created by high-throughput technologies require visualization tools that reveal data structure and patterns in an intuitive form. We present PHATE, a visualization method that captures both local and global nonlinear structure using an information-geometric distance between data points. We compare PHATE to other tools on a variety of artificial and biological datasets, and find that it consistently preserves a range of patterns in data, including continual progressions, branches and clusters, better than other tools. We define a manifold preservation metric, which we call denoised embedding manifold preservation (DEMaP), and show that PHATE produces lower-dimensional embeddings that are quantitatively better denoised as compared to existing visualization methods. An analysis of a newly generated single-cell RNA sequencing dataset on human germ-layer differentiation demonstrates how PHATE reveals unique biological insight into the main developmental branches, including identification of three previously undescribed subpopulations. We also show that PHATE is applicable to a wide variety of data types, including mass cytometry, single-cell RNA sequencing, Hi-C and gut microbiome data.
Cancer is a hyper-proliferative disease. Whether the proliferative state originates from the cell-of-origin or emerges later remains difficult to resolve. By tracking de novo transformation from normal hematopoietic progenitors expressing an acute myeloid leukemia (AML) oncogene MLL-AF9, we reveal that the cell cycle rate heterogeneity among granulocyte-macrophage progenitors (GMPs) determines their probability of transformation. A fast cell cycle intrinsic to these progenitors provides permissiveness for transformation, with the fastest-cycling 3% of GMPs acquiring malignancy with near certainty. Molecularly, we propose that MLL-AF9 preserves gene expression of the cellular states in which it is expressed. As such, when expressed in the naturally existing, rapidly cycling immature myeloid progenitors, this cell state becomes perpetuated, yielding malignancy. In humans, high CCND1 expression predicts worse prognosis for MLL fusion AMLs. Our work elucidates one of the earliest steps toward malignancy and suggests that modifying the cycling state of the cell-of-origin could be a preventative approach against malignancy.
Anomaly detection is of great interest in fields where abnormalities need to be identified and corrected (e.g., medicine and finance). Deep learning methods for this task often rely on autoencoder reconstruction error, sometimes in conjunction with other penalties. We show that this approach exhibits intrinsic biases that lead to undesirable results. Reconstruction-based methods can sometimes show low error on simple-to-reconstruct points that are not part of the training data, for example an all-black image. Instead, we introduce a new unsupervised Lipschitz anomaly discriminator (LAD) that does not suffer from these biases. Our anomaly discriminator is trained, similar to the discriminator of a GAN, to detect the difference between the training data and corruptions of the training data. We show that this procedure successfully detects unseen anomalies with guarantees on those that have a certain Wasserstein distance from the data or corrupted training set. These additions allow us to show improved performance on MNIST, CIFAR10, and health record data. Further, LAD does not require decoding back to the original data space, which makes anomaly detection possible in domains where it is difficult to define a decoder, such as in irregular graph-structured data. Empirically, we show this framework leads to improved performance on image, health record, and graph data.
New high-dimensional, single-cell technologies offer unprecedented resolution in the analysis of heterogeneous tissues. However, because these technologies can measure dozens of parameters simultaneously in individual cells, data interpretation can be challenging. Here we present viSNE, a tool that allows one to map high-dimensional cytometry data onto two dimensions, yet conserve the high-dimensional structure of the data. viSNE plots individual cells in a visual similar to a scatter plot, while using all pairwise distances in high dimension to determine each cell's location in the plot. We integrated mass cytometry with viSNE to map healthy and cancerous bone marrow samples. Healthy bone marrow automatically maps into a consistent shape, whereas leukemia samples map into malformed shapes that are distinct from healthy bone marrow and from each other. We also use viSNE and mass cytometry to compare leukemia diagnosis and relapse samples, and to identify a rare leukemia population reminiscent of minimal residual disease. viSNE can be applied to any multi-dimensional single-cell technology.
The evolution of uniquely human traits likely entailed changes in developmental gene regulation. Human Accelerated Regions (HARs), which include transcriptional enhancers harboring a significant excess of human-specific sequence changes, are leading candidates for driving gene regulatory modifications in human development. However, insight into whether HARs alter the level, distribution, and timing of endogenous gene expression remains limited. We examined the role of the HAR HACNS1 (HAR2) in human evolution by interrogating its molecular functions in a genetically humanized mouse model. We find that HACNS1 maintains its human-specific enhancer activity in the mouse embryo and modifies expression of Gbx2, which encodes a transcription factor, during limb development. Using single-cell RNA-sequencing, we demonstrate that Gbx2 is upregulated in the limb chondrogenic mesenchyme of HACNS1 homozygous embryos, supporting that HACNS1 alters gene expression in cell types involved in skeletal patterning. Our findings illustrate that humanized mouse models provide mechanistic insight into how HARs modified gene expression in human evolution.
While several tools have been developed to map axes of variation among individual cells, no analogous approaches exist for identifying axes of variation among multicellular biospecimens profiled at single-cell resolution. For this purpose, we developed 'phenotypic earth mover's distance' (PhEMD). PhEMD is a general method for embedding a 'manifold of manifolds', in which each datapoint in the higher-level manifold (of biospecimens) represents a collection of points that span a lower-level manifold (of cells). We apply PhEMD to a newly generated drug-screen dataset and demonstrate that PhEMD uncovers axes of cell subpopulational variation among a large set of perturbation conditions. Moreover, we show that PhEMD can be used to infer the phenotypes of biospecimens not directly profiled. Applied to clinical datasets, PhEMD generates a map of the patient-state space that highlights sources of patient-to-patient variation. PhEMD is scalable, compatible with leading batch-effect correction techniques and generalizable to multiple experimental designs.
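To make the earth mover's distance between biospecimens concrete, here is a minimal sketch under a strong simplifying assumption: each biospecimen is summarized as a histogram of cell-subtype proportions over the same ordered, unit-spaced subtypes (for example, positions along a single trajectory). The abstract does not state the ground distance PhEMD uses, so this one-dimensional case and the names `emd_1d` and `pairwise_emd` are illustrative only.

```python
def emd_1d(p, q):
    """Earth mover's distance between two histograms on ordered unit-spaced bins.

    Both histograms are assumed to carry the same total mass.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    carried = 0.0  # mass that still has to be moved past the current bin
    for a, b in zip(p, q):
        carried += a - b
        total += abs(carried)
    return total

def pairwise_emd(specimens):
    """Symmetric matrix of EMDs between all biospecimen histograms."""
    n = len(specimens)
    return [[emd_1d(specimens[i], specimens[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical subtype proportions for three biospecimens.
specimens = [
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
]
distances = pairwise_emd(specimens)
```

A distance matrix of this kind could then be passed to an embedding method to lay out the biospecimens, which is the "manifold of manifolds" idea in the abstract.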
Acute gastrointestinal bleeding is the most common gastrointestinal cause for hospitalization. For high-risk patients requiring intensive care unit stay, predicting transfusion needs during the first 24 h using dynamic risk assessment may improve resuscitation with red blood cell transfusion in admitted patients with severe acute gastrointestinal bleeding. A patient cohort admitted for acute gastrointestinal bleeding (N = 2,524) was identified from the Medical Information Mart for Intensive Care III (MIMIC-III) critical care database and separated into training (N = 2,032) and internal validation (N = 492) sets. The external validation patient cohort was identified from the eICU collaborative database of patients admitted for acute gastrointestinal bleeding presenting to large urban hospitals (N = 1,526). 62 demographic, clinical, and laboratory test features were consolidated into 4-h time intervals over the first 24 h from admission. The outcome measure was the transfusion of red blood cells during each 4-h time interval. A long short-term memory (LSTM) model, a type of recurrent neural network, was compared to regression-based models on time-updated data. The LSTM model performed better than discrete time regression-based models for both internal validation (AUROC 0.81 vs 0.75 vs 0.75; P < 0.001) and external validation (AUROC 0.65 vs 0.56 vs 0.56; P < 0.001). An LSTM model can be used to predict the need for transfusion of packed red blood cells over the first 24 h from admission to help personalize the care of high-risk patients with acute gastrointestinal bleeding.
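The per-interval outcome described above (transfusion during each 4-h interval of the first 24 h) can be sketched as a simple binning step. The function name `transfusion_by_interval` and the input format (hours since admission for each transfusion event) are assumptions for illustration; the abstract does not describe the actual preprocessing code.

```python
def transfusion_by_interval(event_hours, window_h=24, step_h=4):
    """Return one 0/1 label per interval: 1 if any transfusion fell in it.

    event_hours: hours since admission at which a transfusion occurred.
    Events outside [0, window_h) are ignored.
    """
    n_intervals = window_h // step_h
    labels = [0] * n_intervals
    for hour in event_hours:
        if 0 <= hour < window_h:
            labels[int(hour // step_h)] = 1
    return labels

# Transfusions at 1.5 h and 9 h; the event at 30 h is outside the window.
labels = transfusion_by_interval([1.5, 9.0, 30.0])
# labels == [1, 0, 1, 0, 0, 0]
```

The same binning would apply to the input features, giving a sequence of six time steps per patient, which is the kind of input a recurrent model such as an LSTM consumes.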
Objective
High‐expression alleles of macrophage migration inhibitory factor (MIF) are linked genetically to the severity of systemic lupus erythematosus (SLE). The U1 small nuclear RNP (snRNP) immune complex containing U1 snRNP and anti–U1 snRNP antibodies, which are found in patients with SLE, activates the NLRP3 inflammasome, comprising NLRP3, ASC, and procaspase 1, in human monocytes, leading to the production of interleukin‐1β (IL‐1β). This study was undertaken to investigate the role of the snRNP immune complex in up‐regulating the expression of MIF and its interface with the NLRP3 inflammasome.
Methods
MIF, IL‐1β, NLRP3, caspase 1, ASC, and MIF receptors were analyzed by enzyme‐linked immunosorbent assay, Western blotting, quantitative polymerase chain reaction, and cytometry by time‐of‐flight mass spectrometry (CytoF) in human monocytes incubated with or without the snRNP immune complex. MIF pathway responses were probed with the novel small molecule antagonist MIF098.
Results
The snRNP immune complex induced the production of MIF and IL‐1β from human monocytes. High‐dimensional, single‐cell CytoF analysis established that MIF regulates activation of the NLRP3 inflammasome, including findings of a quantitative relationship between MIF and its receptors and IL‐1β levels in the monocytes. MIF098, which blocks MIF binding to its cognate receptor, suppressed the production of IL‐1β, the up‐regulation of NLRP3, which is a rate‐limiting step in NLRP3 inflammasome activation, and the activation of caspase 1 in snRNP immune complex–stimulated human monocytes.
Conclusion
The U1 snRNP immune complex is a specific stimulus of MIF production in human monocytes, with MIF having an upstream role in defining the inflammatory characteristics of activated monocytes by regulating NLRP3 inflammasome activation and downstream IL‐1β production. These findings provide mechanistic insight and a therapeutic rationale for targeting MIF in subgroups of lupus patients, such as those classified as high genotypic MIF expressers or those with anti‐snRNP antibodies.