The high-dimensional data created by high-throughput technologies require visualization tools that reveal data structure and patterns in an intuitive form. We present PHATE, a visualization method that captures both local and global nonlinear structure using an information-geometric distance between data points. We compare PHATE to other tools on a variety of artificial and biological datasets, and find that it consistently preserves a range of patterns in data, including continual progressions, branches and clusters, better than other tools. We define a manifold preservation metric, which we call denoised embedding manifold preservation (DEMaP), and show that PHATE produces lower-dimensional embeddings that are quantitatively better denoised as compared to existing visualization methods. An analysis of a newly generated single-cell RNA sequencing dataset on human germ-layer differentiation demonstrates how PHATE reveals unique biological insight into the main developmental branches, including identification of three previously undescribed subpopulations. We also show that PHATE is applicable to a wide variety of data types, including mass cytometry, single-cell RNA sequencing, Hi-C and gut microbiome data.
Single-cell RNA sequencing technologies suffer from many sources of technical noise, including under-sampling of mRNA molecules, often termed “dropout,” which can severely obscure important gene-gene relationships. To address this, we developed MAGIC (Markov affinity-based graph imputation of cells), a method that shares information across similar cells, via data diffusion, to denoise the cell count matrix and fill in missing transcripts. We validate MAGIC on several biological systems and find it effective at recovering gene-gene relationships and additional structures. Applied to the epithelial to mesenchymal transition, MAGIC reveals a phenotypic continuum, with the majority of cells residing in intermediate states that display stem-like signatures, and infers known and previously uncharacterized regulatory interactions, demonstrating that our approach can successfully uncover regulatory relations without perturbations.
•MAGIC restores noisy and sparse single-cell data using diffusion geometry
•Corrected data are amenable to myriad downstream analyses
•MAGIC enables archetypal analysis and inference of gene interactions
•Transcription factor targets can be predicted without perturbation after MAGIC
A new algorithm overcomes limitations of data loss in single-cell sequencing experiments.
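The data diffusion idea described in the MAGIC abstract above (sharing information across similar cells through a Markov affinity matrix) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a fixed-bandwidth Gaussian kernel on Euclidean distances, and the names `markov_matrix`, `diffuse`, `sigma`, and `t` are hypothetical choices made for this illustration.

```python
import math


def markov_matrix(cells, sigma=1.0):
    """Row-normalized Gaussian affinities between cells.

    Assumption: a fixed bandwidth `sigma`; this is a simplification
    chosen for the sketch, not a detail taken from the abstract.
    """
    n = len(cells)
    rows = []
    for i in range(n):
        weights = []
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(cells[i], cells[j]))
            weights.append(math.exp(-sq_dist / (2 * sigma ** 2)))
        total = sum(weights)
        # Each row sums to 1, so it is a probability distribution over cells.
        rows.append([w / total for w in weights])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def diffuse(cells, t=3, sigma=1.0):
    """Apply the Markov matrix t times: each cell becomes a weighted
    average of similar cells, which smooths noise and fills in zeros."""
    m = markov_matrix(cells, sigma)
    smoothed = cells
    for _ in range(t):
        smoothed = matmul(m, smoothed)
    return smoothed
```

In this sketch a zero entry caused by dropout is replaced by a weighted average of the same gene in nearby cells, while distant cells contribute almost nothing. Larger `t` smooths more aggressively.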
Geometry- and Accuracy-Preserving Random Forest Proximities
Rhodes, Jake S.; Cutler, Adele; Moon, Kevin R.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023-09-01, Volume 45, Issue 9
Journal Article
Peer reviewed
Open access
Random forests are considered one of the best out-of-the-box classification and regression algorithms due to their high level of predictive performance with relatively little tuning. Pairwise proximities can be computed from a trained random forest and measure the similarity between data points relative to the supervised task. Random forest proximities have been used in many applications including the identification of variable importance, data imputation, outlier detection, and data visualization. However, existing definitions of random forest proximities do not accurately reflect the data geometry learned by the random forest. In this paper, we introduce a novel definition of random forest proximities called Random Forest-Geometry- and Accuracy-Preserving proximities (RF-GAP). We prove that the proximity-weighted sum (regression) or majority vote (classification) using RF-GAP exactly matches the out-of-bag random forest prediction, thus capturing the data geometry learned by the random forest. We empirically show that this improved geometric representation outperforms traditional random forest proximities in tasks such as data imputation and provides outlier detection and visualization results consistent with the learned data geometry.
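For context, the traditional proximity that the abstract above contrasts with RF-GAP is commonly defined as the fraction of trees in which two points fall in the same terminal node. The sketch below computes that traditional definition only; it does not implement RF-GAP. It assumes the leaf assignments are already available as one row per sample and one column per tree (for example, the kind of array returned by scikit-learn's `RandomForestClassifier.apply`), and the function name `classic_proximity` is hypothetical.

```python
def classic_proximity(leaves):
    """Traditional random forest proximity matrix.

    `leaves[i][k]` is the terminal-node id of sample i in tree k
    (an assumed input layout for this sketch). The proximity of
    samples i and j is the share of trees where they land in the
    same terminal node.
    """
    n_samples = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        for j in range(n_samples):
            shared = sum(1 for a, b in zip(leaves[i], leaves[j]) if a == b)
            prox[i][j] = shared / n_trees
    return prox
```

The resulting matrix is symmetric with ones on the diagonal, since every sample shares all of its terminal nodes with itself.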
It is currently challenging to analyze single-cell data consisting of many cells and samples, and to address variations arising from batch effects and different sample preparations. For this purpose, we present SAUCIE, a deep neural network that combines parallelization and scalability offered by neural networks, with the deep representation of data that can be learned by them to perform many single-cell data analysis tasks. Our regularizations (penalties) render features learned in hidden layers of the neural network interpretable. On large, multi-patient datasets, SAUCIE's various hidden layers contain denoised and batch-corrected data, a low-dimensional visualization and unsupervised clustering, as well as other information that can be used to explore the data. We analyze a 180-sample dataset consisting of 11 million T cells from dengue patients in India, measured with mass cytometry. SAUCIE can batch correct and identify cluster-based signatures of acute dengue infection and create a patient manifold, stratifying immune response to dengue.
Geometry Regularized Autoencoders
Duque, Andres F.; Morin, Sacha; Wolf, Guy ...
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023-06-01, Volume 45, Issue 6
Journal Article
Peer reviewed
A fundamental task in data exploration is to extract low dimensional representations that capture intrinsic geometry in data, especially for faithfully visualizing data in two or three dimensions. Common approaches use kernel methods for manifold learning. However, these methods typically only provide an embedding of the input data and cannot extend naturally to new data points. Autoencoders have also become popular for representation learning. While they naturally compute feature extractors that are extendable to new data and invertible (i.e., reconstructing original features from latent representation), they often fail at representing the intrinsic data geometry compared to kernel-based manifold learning. We present a new method for integrating both approaches by incorporating a geometric regularization term in the bottleneck of the autoencoder. This regularization encourages the learned latent representation to follow the intrinsic data geometry, similar to manifold learning algorithms, while still enabling faithful extension to new data and preserving invertibility. We compare our approach to autoencoder models for manifold learning to provide qualitative and quantitative evidence of our advantages in preserving intrinsic structure, out of sample extension, and reconstruction. Our method is easily implemented for big-data applications, whereas other methods are limited in this regard.
Neuropil is a fundamental form of tissue organization within the brain, in which densely packed neurons synaptically interconnect into precise circuit architecture. However, the structural and developmental principles that govern this nanoscale precision remain largely unknown. Here we use an iterative data coarse-graining algorithm termed 'diffusion condensation' to identify nested circuit structures within the Caenorhabditis elegans neuropil, which is known as the nerve ring. We show that the nerve ring neuropil is largely organized into four strata that are composed of related behavioural circuits. The stratified architecture of the neuropil is a geometrical representation of the functional segregation of sensory information and motor outputs, with specific sensory organs and muscle quadrants mapping onto particular neuropil strata. We identify groups of neurons with unique morphologies that integrate information across strata and that create neural structures that cage the strata within the nerve ring. We use high resolution light-sheet microscopy coupled with lineage-tracing and cell-tracking algorithms to resolve the developmental sequence and reveal principles of cell position, migration and outgrowth that guide stratified neuropil organization. Our results uncover conserved structural design principles that underlie the architecture and function of the nerve ring neuropil, and reveal a temporal progression of outgrowth, based on pioneer neurons, that guides the hierarchical development of the layered neuropil. Our findings provide a systematic blueprint for using structural and developmental approaches to understand neuropil organization within the brain.
Meteorin‐like (metrnl) is a recently identified adipomyokine that beneficially affects glucose metabolism; however, its underlying mechanism of action is not completely understood. Here we show that the level of metrnl increases in vitro under electrical pulse stimulation and in vivo in exercised mice, suggesting that metrnl is secreted during muscle contractions. In addition, metrnl increases glucose uptake via the calcium‐dependent AMPKα2 pathway in skeletal muscle cells and increases the phosphorylation of HDAC5, a transcriptional repressor of GLUT4, in an AMPKα2‐dependent manner. Phosphorylated HDAC5 interacts with 14‐3‐3 proteins and sequesters them in the cytoplasm, resulting in the activation of GLUT4 transcription. An intraperitoneal injection of recombinant metrnl improved glucose tolerance in mice with high‐fat‐diet‐induced obesity or type 2 diabetes, but not in AMPK β1β2 muscle‐specific null mice. Metrnl improves glucose metabolism via AMPKα2 and is a promising therapeutic candidate for glucose‐related diseases such as type 2 diabetes.
We found that metrnl, known as an adipomyokine, is secreted during muscle contractions. Metrnl increases glucose uptake via AMPKα in skeletal muscle cells and increases the phosphorylation of HDAC5 and TBC1D1 in an AMPKα‐dependent manner. Recombinant metrnl improves glucose tolerance in mice with obesity or type 2 diabetes. These results suggest that metrnl is a promising therapeutic candidate for diabetes.
We conducted an open label, dose escalation Phase 1 clinical trial of a tetravalent dengue DNA vaccine (TVDV) formulated in Vaxfectin to assess safety and immunogenicity. A total of 40 dengue- and flavivirus-naive volunteers received either low-dose (1 mg) TVDV alone (n = 10, group 1), low-dose TVDV (1 mg) formulated in Vaxfectin (n = 10, group 2), or high-dose TVDV (2 mg, group 3) formulated in Vaxfectin (n = 20). Subjects were immunized intramuscularly with three doses on a 0-, 30-, 90-day schedule and monitored. Blood samples were obtained after each immunization and various time points thereafter to assess anti-dengue antibody and interferon gamma (IFNγ) T-cell immune responses. The most common adverse events (AEs) across all groups included mild to moderate pain and tenderness at the injection site, which typically resolved within 7 days. Common solicited signs and symptoms included fatigue (42.5%), headache (45%), and myalgias (47.5%). There were no serious AEs related to the vaccine or study procedures. No anti-dengue antibody responses were detected in group 1 subjects who received all three immunizations. There were minimal enzyme-linked immunosorbent assay and neutralizing antibody responses among groups 2 and 3 subjects who completed the immunization schedule. By contrast, IFNγ T-cell responses, regardless of serotype specificity, occurred in 70%, 50%, and 79% of subjects in groups 1, 2, and 3, respectively. The largest IFNγ T-cell responses were among group 3 subjects. We conclude that TVDV was safe and well-tolerated and elicited predominately anti-dengue T-cell IFNγ responses in a dose-related fashion.
Ensemble Estimation of Information Divergence
Moon, Kevin R; Sricharan, Kumar; Greenewald, Kristjan ...
Entropy (Basel, Switzerland), 07/2018, Volume 20, Issue 8
Journal Article
Peer reviewed
Open access
Recent work has focused on the problem of nonparametric estimation of information divergence functionals between two continuous random variables. Many existing approaches require either restrictive assumptions about the density support set or difficult calculations at the support set boundary which must be known a priori. The mean squared error (MSE) convergence rate of a leave-one-out kernel density plug-in divergence functional estimator for general bounded density support sets is derived where knowledge of the support boundary, and therefore, the boundary correction is not required. The theory of optimally weighted ensemble estimation is generalized to derive a divergence estimator that achieves the parametric rate when the densities are sufficiently smooth. Guidelines for the tuning parameter selection and the asymptotic distribution of this estimator are provided. Based on the theory, an empirical estimator of Rényi-α divergence is proposed that greatly outperforms the standard kernel density plug-in estimator in terms of mean squared error, especially in high dimensions. The estimator is shown to be robust to the choice of tuning parameters. We show extensive simulation results that verify the theoretical results of our paper. Finally, we apply the proposed estimator to estimate the bounds on the Bayes error rate of a cell classification problem.
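For readers unfamiliar with the quantity being estimated, the standard definition of the Rényi-α divergence between two densities is given below. The symbols f_1 and f_2 for the two densities are notation assumed here for illustration; the abstract itself does not fix any notation.

```latex
D_{\alpha}(f_1 \,\|\, f_2)
  = \frac{1}{\alpha - 1}
    \log \int f_1^{\alpha}(x)\, f_2^{1-\alpha}(x)\, dx,
  \qquad \alpha > 0,\ \alpha \neq 1
```

In the limit α → 1 this expression recovers the Kullback-Leibler divergence. A plug-in estimator of the kind mentioned in the abstract replaces f_1 and f_2 with density estimates before evaluating the functional.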
Diesel exhaust particles (DEPs) are major constituents of air pollution and associated with numerous oxidative stress-induced human diseases. In vitro toxicity studies are useful for developing a better understanding of species-specific in vivo conditions. Conventional in vitro assessments based on oxidative biomarkers are destructive and inefficient. In this study, Raman spectroscopy, as a non-invasive imaging tool, was used to capture the molecular fingerprints of overall cellular component responses (nucleic acid, lipids, proteins, carbohydrates) to DEP damage and antioxidant protection. We apply a novel data visualization algorithm called PHATE, which preserves both global and local structure, to display the progression of cell damage over DEP exposure time. Meanwhile, a mutual information (MI) estimator was used to identify the most informative Raman peaks associated with cytotoxicity. A health index was defined to quantitatively assess the protective effects of two antioxidants (resveratrol and mesobiliverdin IXα) against DEP-induced cytotoxicity. In addition, a number of machine learning classifiers were applied to successfully discriminate different treatment groups with high accuracy. Correlations between Raman spectra and immunomodulatory cytokine and chemokine levels were evaluated. In conclusion, the combination of label-free, non-disruptive Raman micro-spectroscopy and machine learning analysis is demonstrated as a useful tool in quantitative analysis of oxidative stress-induced cytotoxicity and for effectively assessing various antioxidant treatments, suggesting that this framework can serve as a high throughput platform for screening various potential antioxidants based on their effectiveness at battling the effects of air pollution on human health.
•Apply new algorithms (PHATE and MI) to visualize the Raman spectral data.
•Raman spectroscopy was utilized to monitor cellular responses to oxidative stress.
•The health index was proposed to quantitatively assess antioxidant protection.
•A number of machine learning algorithms were applied to analyze Raman spectral data.
•Correlation between Raman spectra and cytokine level was analyzed.