Single-cell RNA sequencing technologies suffer from many sources of technical noise, including under-sampling of mRNA molecules, often termed “dropout,” which can severely obscure important gene-gene relationships. To address this, we developed MAGIC (Markov affinity-based graph imputation of cells), a method that shares information across similar cells via data diffusion to denoise the cell count matrix and fill in missing transcripts. We validate MAGIC on several biological systems and find it effective at recovering gene-gene relationships and additional structures. Applied to the epithelial-to-mesenchymal transition, MAGIC reveals a phenotypic continuum, with the majority of cells residing in intermediate states that display stem-like signatures, and infers known and previously uncharacterized regulatory interactions, demonstrating that our approach can successfully uncover regulatory relations without perturbations.
• MAGIC restores noisy and sparse single-cell data using diffusion geometry
• Corrected data are amenable to myriad downstream analyses
• MAGIC enables archetypal analysis and inference of gene interactions
• Transcription factor targets can be predicted without perturbation after MAGIC
A new algorithm overcomes limitations of data loss in single-cell sequencing experiments.
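The data-diffusion idea behind MAGIC can be shown with a minimal sketch. It assumes a fixed-bandwidth Gaussian kernel computed directly on the count matrix. The published method differs: for example, it builds affinities in a reduced PCA space with an adaptive kernel. The helper names `pairwise_sq_dists` and `diffusion_impute` and the parameters `sigma` and `t` are hypothetical and are not the MAGIC package API. The sketch row-normalizes cell-cell affinities into a Markov matrix, raises that matrix to the power t, and applies it to the data. Each cell's expression then becomes a weighted average over its diffusion neighborhood, so zeros are filled from similar cells.

```python
import numpy as np


def pairwise_sq_dists(X: np.ndarray) -> np.ndarray:
    """Squared Euclidean distances between all rows of X."""
    sq = (X ** 2).sum(axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)


def diffusion_impute(X: np.ndarray, sigma: float = 1.0, t: int = 3) -> np.ndarray:
    """Denoise a cells-by-genes matrix by diffusing values over a cell-cell graph.

    Hypothetical simplified helper: fixed Gaussian bandwidth, no PCA step,
    no adaptive kernel.
    """
    K = np.exp(-pairwise_sq_dists(X) / sigma ** 2)   # cell-cell affinities
    M = K / K.sum(axis=1, keepdims=True)              # row-stochastic Markov matrix
    Mt = np.linalg.matrix_power(M, t)                 # t-step diffusion operator
    return Mt @ X                                     # each cell -> weighted average of neighbors


rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(30, 5)).astype(float)   # toy count matrix with dropouts
counts[0] = 1.0                                          # ensure every gene is observed at least once
imputed = diffusion_impute(counts, sigma=3.0, t=3)
print((counts == 0).mean(), (imputed == 0).mean())       # fraction of zeros before / after
```

Every row of the diffused operator is a probability distribution, so imputed values stay within each gene's observed range. A larger t smooths more aggressively.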
The high-dimensional data created by high-throughput technologies require visualization tools that reveal data structure and patterns in an intuitive form. We present PHATE, a visualization method that captures both local and global nonlinear structure using an information-geometric distance between data points. We compare PHATE to other tools on a variety of artificial and biological datasets, and find that it consistently preserves a range of patterns in data, including continual progressions, branches and clusters, better than other tools. We define a manifold preservation metric, which we call denoised embedding manifold preservation (DEMaP), and show that PHATE produces lower-dimensional embeddings that are quantitatively better denoised as compared to existing visualization methods. An analysis of a newly generated single-cell RNA sequencing dataset on human germ-layer differentiation demonstrates how PHATE reveals unique biological insight into the main developmental branches, including identification of three previously undescribed subpopulations. We also show that PHATE is applicable to a wide variety of data types, including mass cytometry, single-cell RNA sequencing, Hi-C and gut microbiome data.
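The abstract refers to an information-geometric distance. A minimal sketch of one way to compute such a distance, in the spirit of PHATE's potential distance, follows. It assumes a fixed-bandwidth Gaussian kernel and a fixed diffusion time; PHATE itself uses an adaptive alpha-decaying kernel and selects the diffusion time automatically. The final embedding here uses classical MDS, and `phate_like_embedding` and its parameters are hypothetical names. The sketch log-transforms diffusion probabilities into "potentials", computes Euclidean distances between potential vectors, and embeds those distances in two dimensions.

```python
import numpy as np


def pairwise_sq_dists(X: np.ndarray) -> np.ndarray:
    sq = (X ** 2).sum(axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)


def phate_like_embedding(X, sigma=1.0, t=10, n_components=2, eps=1e-7):
    """Hypothetical sketch: diffusion -> log potentials -> distances -> classical MDS."""
    K = np.exp(-pairwise_sq_dists(X) / sigma ** 2)
    P = K / K.sum(axis=1, keepdims=True)          # Markov diffusion operator
    Pt = np.linalg.matrix_power(P, t)             # t-step transition probabilities
    U = -np.log(Pt + eps)                         # potential representation of each point
    D = np.sqrt(pairwise_sq_dists(U))             # potential distances
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # classical MDS Gram matrix
    vals, vecs = np.linalg.eigh(B)                # ascending eigenvalues
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (40, 2)), rng.normal(10.0, 1.0, (40, 2))])
emb = phate_like_embedding(X, sigma=3.0, t=10)
print(emb.shape)
```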
Geometry- and Accuracy-Preserving Random Forest Proximities. Rhodes, Jake S.; Cutler, Adele; Moon, Kevin R.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1 September 2023, Volume 45, Issue 9
Journal Article; peer reviewed; open access
Random forests are considered one of the best out-of-the-box classification and regression algorithms due to their high level of predictive performance with relatively little tuning. Pairwise proximities can be computed from a trained random forest and measure the similarity between data points relative to the supervised task. Random forest proximities have been used in many applications including the identification of variable importance, data imputation, outlier detection, and data visualization. However, existing definitions of random forest proximities do not accurately reflect the data geometry learned by the random forest. In this paper, we introduce a novel definition of random forest proximities called Random Forest-Geometry- and Accuracy-Preserving proximities (RF-GAP). We prove that the proximity-weighted sum (regression) or majority vote (classification) using RF-GAP exactly matches the out-of-bag random forest prediction, thus capturing the data geometry learned by the random forest. We empirically show that this improved geometric representation outperforms traditional random forest proximities in tasks such as data imputation and provides outlier detection and visualization results consistent with the learned data geometry.
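As we understand the paper's definition, the RF-GAP proximity of sample j to sample i is an average over the set S_i of trees in which i is out-of-bag. For each such tree t, the averaged quantity is c_j(t) · I[j ∈ J_i(t)] / |M_i(t)|, where:

- c_j(t) is j's in-bag multiplicity in tree t;
- J_i(t) is the set of in-bag samples sharing i's terminal node;
- |M_i(t)| is that node's total in-bag count.

The sketch below computes this from two arrays that a random forest implementation would have to supply: per-tree leaf indices and bootstrap counts. `toy_forest` is a hypothetical random stand-in for a trained forest, not a real training routine.

```python
import numpy as np


def rf_gap_proximities(leaves: np.ndarray, inbag: np.ndarray) -> np.ndarray:
    """RF-GAP-style proximities from per-tree leaf indices and bootstrap counts.

    leaves[i, t]: terminal node reached by sample i in tree t
    inbag[j, t]:  multiplicity c_j(t) of sample j in tree t's bootstrap sample
    """
    n, n_trees = leaves.shape
    P = np.zeros((n, n))
    oob = inbag == 0
    for t in range(n_trees):
        for i in np.flatnonzero(oob[:, t]):
            weights = inbag[:, t] * (leaves[:, t] == leaves[i, t])  # c_j(t) * I[j in J_i(t)]
            total = weights.sum()                                    # |M_i(t)|
            if total > 0:  # always true for leaves of a real tree
                P[i] += weights / total
    n_oob = oob.sum(axis=1)                     # |S_i|
    return P / np.maximum(n_oob, 1)[:, None]    # rows of never-OOB samples stay 0


def toy_forest(n: int, n_trees: int, n_leaves: int, rng):
    """Hypothetical stand-in for a trained forest: random bootstraps and leaves."""
    inbag = np.zeros((n, n_trees), dtype=int)
    leaves = np.zeros((n, n_trees), dtype=int)
    for t in range(n_trees):
        inbag[:, t] = np.bincount(rng.integers(0, n, n), minlength=n)
        ib = np.flatnonzero(inbag[:, t] > 0)
        ob = np.flatnonzero(inbag[:, t] == 0)
        leaves[ib, t] = rng.integers(0, n_leaves, ib.size)
        # out-of-bag points can only land in leaves that contain in-bag points
        leaves[ob, t] = rng.choice(np.unique(leaves[ib, t]), ob.size)
    return leaves, inbag


rng = np.random.default_rng(0)
leaves, inbag = toy_forest(n=20, n_trees=50, n_leaves=4, rng=rng)
P = rf_gap_proximities(leaves, inbag)
y = rng.normal(size=20)
print(P @ y)  # proximity-weighted regression prediction for each sample
```

With these weights, the proximity-weighted sum Σ_j P[i, j] y_j equals the average of the in-bag leaf means over the trees where i is out-of-bag. That average is the out-of-bag regression prediction the abstract refers to.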
It is currently challenging to analyze single-cell data consisting of many cells and samples, and to address variations arising from batch effects and different sample preparations. For this purpose, we present SAUCIE, a deep neural network that combines the parallelization and scalability offered by neural networks with the deep data representations they can learn, allowing it to perform many single-cell data analysis tasks. Our regularizations (penalties) render features learned in hidden layers of the neural network interpretable. On large, multi-patient datasets, SAUCIE's various hidden layers contain denoised and batch-corrected data, a low-dimensional visualization and unsupervised clustering, as well as other information that can be used to explore the data. We analyze a 180-sample dataset consisting of 11 million T cells from dengue patients in India, measured with mass cytometry. SAUCIE can batch correct and identify cluster-based signatures of acute dengue infection and create a patient manifold, stratifying immune response to dengue.
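The abstract does not name its regularizers. To our understanding, SAUCIE's batch-correction penalty is based on maximum mean discrepancy (MMD) between batches in a hidden layer. The sketch below shows only how a Gaussian-kernel estimate of squared MMD between two batches of embedded cells can be computed. The function names and the hand-picked bandwidth `sigma` are assumptions. In a network, a term of this kind would be added to the training loss so that optimization pulls the batch distributions together.

```python
import numpy as np


def gaussian_kernel(A: np.ndarray, B: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def mmd2(A: np.ndarray, B: np.ndarray, sigma: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    return float(gaussian_kernel(A, A, sigma).mean()
                 + gaussian_kernel(B, B, sigma).mean()
                 - 2.0 * gaussian_kernel(A, B, sigma).mean())


rng = np.random.default_rng(0)
batch_a = rng.normal(0.0, 1.0, size=(200, 2))   # embedded cells from batch A
batch_b = rng.normal(0.0, 1.0, size=(200, 2))   # another batch, same distribution
batch_shifted = batch_a + 3.0                    # batch effect modeled as a constant shift
print(mmd2(batch_a, batch_b), mmd2(batch_a, batch_shifted))
```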
Geometry Regularized Autoencoders. Duque, Andres F.; Morin, Sacha; Wolf, Guy ...
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1 June 2023, Volume 45, Issue 6
Journal Article; peer reviewed
A fundamental task in data exploration is to extract low-dimensional representations that capture intrinsic geometry in data, especially for faithfully visualizing data in two or three dimensions. Common approaches use kernel methods for manifold learning. However, these methods typically only provide an embedding of the input data and cannot extend naturally to new data points. Autoencoders have also become popular for representation learning. While they naturally compute feature extractors that are extendable to new data and invertible (i.e., reconstructing original features from the latent representation), they often fail at representing the intrinsic data geometry compared to kernel-based manifold learning. We present a new method for integrating both approaches by incorporating a geometric regularization term in the bottleneck of the autoencoder. This regularization encourages the learned latent representation to follow the intrinsic data geometry, similar to manifold learning algorithms, while still enabling faithful extension to new data and preserving invertibility. We compare our approach to autoencoder models for manifold learning to provide qualitative and quantitative evidence of our advantages in preserving intrinsic structure, out-of-sample extension, and reconstruction. Our method is easily implemented for big-data applications, whereas other methods are limited in this regard.
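A bottleneck regularizer of this kind can be written as a combined objective. The objective is the reconstruction error plus λ times the squared distance between each latent code and a target embedding precomputed by a manifold learner. The sketch below uses linear encoder and decoder maps purely for brevity; the model described above uses neural networks. `grae_style_loss`, `W_enc`, `W_dec` and `lam` are hypothetical names, and gradient-based training is omitted.

```python
import numpy as np


def grae_style_loss(X, E, W_enc, W_dec, lam):
    """Reconstruction error plus lam * squared distance of latent codes to embedding E.

    Linear maps stand in for the encoder and decoder networks (assumption).
    """
    Z = X @ W_enc                                     # latent codes at the bottleneck
    X_hat = Z @ W_dec                                 # reconstruction of the input
    recon = np.mean(np.sum((X - X_hat) ** 2, axis=1))
    geom = np.mean(np.sum((Z - E) ** 2, axis=1))      # geometric regularization term
    return recon + lam * geom


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
E = X[:, :2] * 2.0   # stand-in for a precomputed 2-D manifold embedding (e.g. from PHATE)
W_enc = rng.normal(scale=0.1, size=(3, 2))
W_dec = rng.normal(scale=0.1, size=(2, 3))
print(grae_style_loss(X, E, W_enc, W_dec, lam=0.0),
      grae_style_loss(X, E, W_enc, W_dec, lam=1.0))
```

Setting `lam` to zero recovers a plain autoencoder objective. Larger values pull the latent codes toward the manifold-learning embedding, at some cost in reconstruction.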
A well-nourished workforce is instrumental in eradicating hunger, alleviating poverty, and spurring economic growth. A fifth of the total workforce in high-income countries consists of migrant workers. Despite the accessibility of nutritious foods in high-income countries, migrant workers often rely on nutrient-poor diets consisting largely of empty calories, which in turn leads to vitamin and mineral deficiency, also called hidden hunger, and resultant productivity loss. Here, we study the magnitude of hidden hunger in male migrant construction workers in Singapore and investigate the impact of consuming fortified rice for 6 consecutive months on the nutrition and health status of these workers.
A total of 140 male migrant workers aged 20-51 years, of either Bangladeshi or Indian ethnicity, from a single dormitory in Singapore volunteered to participate in the study. In total, 133 blood samples were taken at the start of the study and used to assess vitamin B12, hemoglobin, ferritin, folate, and zinc levels; a subsample underwent homocysteine testing. Anthropometric measurements and vital signs, such as blood pressure, were recorded before and after the intervention.
The results show that vitamin and mineral deficiency was present, especially for folate (59% of workers deficient) and vitamin B12 (7% deficient, 31% marginally deficient). Consumption of fortified rice significantly improved vitamin, iron, and zinc levels in the workers and significantly reduced systolic blood pressure, specifically among the Bangladeshi migrant workers.
Our study demonstrates that fortified rice may have a positive impact on male migrant construction worker health and nutrition status at the workplace.
Neuropil is a fundamental form of tissue organization within the brain, in which densely packed neurons synaptically interconnect into precise circuit architecture. However, the structural and developmental principles that govern this nanoscale precision remain largely unknown. Here we use an iterative data coarse-graining algorithm termed 'diffusion condensation' to identify nested circuit structures within the Caenorhabditis elegans neuropil, which is known as the nerve ring. We show that the nerve ring neuropil is largely organized into four strata that are composed of related behavioural circuits. The stratified architecture of the neuropil is a geometrical representation of the functional segregation of sensory information and motor outputs, with specific sensory organs and muscle quadrants mapping onto particular neuropil strata. We identify groups of neurons with unique morphologies that integrate information across strata and that create neural structures that cage the strata within the nerve ring. We use high-resolution light-sheet microscopy coupled with lineage-tracing and cell-tracking algorithms to resolve the developmental sequence and reveal principles of cell position, migration and outgrowth that guide stratified neuropil organization. Our results uncover conserved structural design principles that underlie the architecture and function of the nerve ring neuropil, and reveal a temporal progression of outgrowth, based on pioneer neurons, that guides the hierarchical development of the layered neuropil. Our findings provide a systematic blueprint for using structural and developmental approaches to understand neuropil organization within the brain.
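Diffusion condensation can be sketched as repeated application of a diffusion operator to point coordinates. Each step moves every point toward a weighted average of its neighbors, and points that become indistinguishable are grouped, which yields a hierarchy of nested groups. The version below is a simplification with three assumptions:

- the kernel bandwidth is fixed;
- points are labelled rather than physically merged;
- the input is a generic point cloud rather than the neuron-contact data analysed in the study.

All function names are hypothetical.

```python
import numpy as np


def pairwise_sq_dists(X):
    sq = (X ** 2).sum(axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)


def connected_labels(X, eps):
    """Label groups of points linked by chains of pairwise distance < eps."""
    adj = pairwise_sq_dists(X) < eps ** 2
    labels = np.full(len(X), -1)
    next_label = 0
    for start in range(len(X)):
        if labels[start] >= 0:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:                                   # depth-first flood fill
            j = stack.pop()
            for k in np.flatnonzero(adj[j] & (labels < 0)):
                labels[k] = next_label
                stack.append(k)
        next_label += 1
    return labels


def diffusion_condensation(X, sigma=1.0, merge_eps=1e-3, max_iter=100):
    """Repeatedly diffuse points and record each new grouping (simplified sketch)."""
    X = np.asarray(X, dtype=float).copy()
    history = [np.arange(len(X))]                      # level 0: every point is its own group
    for _ in range(max_iter):
        K = np.exp(-pairwise_sq_dists(X) / sigma ** 2)
        P = K / K.sum(axis=1, keepdims=True)           # Markov diffusion operator
        X = P @ X                                      # one condensation step
        labels = connected_labels(X, merge_eps)
        if not np.array_equal(labels, history[-1]):
            history.append(labels)                     # keep only levels where grouping changed
        if labels.max() == 0:                          # everything condensed into one group
            break
    return history


rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(10.0, 0.1, (20, 2))])
levels = diffusion_condensation(points)
print([len(np.unique(level)) for level in levels])
```

Each entry of `levels` is one level of the nested hierarchy. With a fixed bandwidth, well-separated groups stop merging. A practical implementation would typically need to enlarge the bandwidth over time to keep condensing toward coarser levels.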
Diesel exhaust particles (DEPs) are major constituents of air pollution and are associated with numerous oxidative stress-induced human diseases. In vitro toxicity studies are useful for developing a better understanding of species-specific in vivo conditions. Conventional in vitro assessments based on oxidative biomarkers are destructive and inefficient. In this study, Raman spectroscopy, as a non-invasive imaging tool, was used to capture the molecular fingerprints of overall cellular component responses (nucleic acids, lipids, proteins, carbohydrates) to DEP damage and antioxidant protection. A novel data visualization algorithm called PHATE, which preserves both global and local structure, was applied to display the progression of cell damage over DEP exposure time. Meanwhile, a mutual information (MI) estimator was used to identify the most informative Raman peaks associated with cytotoxicity. A health index was defined to quantitatively assess the protective effects of two antioxidants (resveratrol and mesobiliverdin IXα) against DEP-induced cytotoxicity. In addition, a number of machine learning classifiers were applied to successfully discriminate different treatment groups with high accuracy. Correlations between Raman spectra and immunomodulatory cytokine and chemokine levels were evaluated. In conclusion, the combination of label-free, non-disruptive Raman micro-spectroscopy and machine learning analysis is demonstrated as a useful tool for the quantitative analysis of oxidative stress-induced cytotoxicity and for effectively assessing various antioxidant treatments. This suggests that the framework can serve as a high-throughput platform for screening potential antioxidants based on their effectiveness at countering the effects of air pollution on human health.
• New algorithms (PHATE and MI) were applied to visualize the Raman spectral data.
• Raman spectroscopy was utilized to monitor cellular responses to oxidative stress.
• A health index was proposed to quantitatively assess antioxidant protection.
• A number of machine learning algorithms were applied to analyze Raman spectral data.
• Correlation between Raman spectra and cytokine levels was analyzed.
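The abstract does not specify which MI estimator was used. As an illustration only, the sketch below ranks spectral channels by a simple plug-in estimate of mutual information between a quantile-binned channel intensity and the treatment label. The estimator choice, the bin count, the function names and the synthetic data are all assumptions.

```python
import numpy as np


def plugin_mutual_information(a, b) -> float:
    """Plug-in MI (nats) between two discrete label arrays."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    ai, bi = ai.ravel(), bi.ravel()
    joint = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(joint, (ai, bi), 1.0)                   # contingency table of counts
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0                                      # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz])))


def rank_peaks_by_mi(spectra, labels, n_bins=8):
    """Score each spectral channel by MI with the class label after quantile binning."""
    scores = np.empty(spectra.shape[1])
    for j in range(spectra.shape[1]):
        edges = np.quantile(spectra[:, j], np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
        scores[j] = plugin_mutual_information(np.digitize(spectra[:, j], edges), labels)
    return np.argsort(scores)[::-1], scores


rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 300)            # e.g. control vs. DEP-exposed (synthetic)
spectra = rng.normal(size=(300, 6))         # 6 hypothetical Raman channels
spectra[:, 0] += 3.0 * labels               # only channel 0 carries the label
order, scores = rank_peaks_by_mi(spectra, labels)
print(order, scores.round(3))
```

Plug-in MI is biased upward by roughly (bins − 1)(classes − 1)/(2n) nats, so scores on uninformative channels hover slightly above zero.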
Recent work has focused on the problem of nonparametric estimation of information divergence functionals between two continuous random variables. Many existing approaches require either restrictive assumptions about the density support set or difficult calculations at the support set boundary, which must be known a priori. The mean squared error (MSE) convergence rate of a leave-one-out kernel density plug-in divergence functional estimator for general bounded density support sets is derived; knowledge of the support boundary, and therefore boundary correction, is not required. The theory of optimally weighted ensemble estimation is generalized to derive a divergence estimator that achieves the parametric rate when the densities are sufficiently smooth. Guidelines for tuning parameter selection and the asymptotic distribution of this estimator are provided. Based on the theory, an empirical estimator of Rényi-α divergence is proposed that greatly outperforms the standard kernel density plug-in estimator in terms of mean squared error, especially in high dimensions. The estimator is shown to be robust to the choice of tuning parameters. We show extensive simulation results that verify the theoretical results of our paper. Finally, we apply the proposed estimator to estimate bounds on the Bayes error rate of a cell classification problem.
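For reference, the Rényi-α divergence is D_α(f1‖f2) = (1/(α − 1)) log ∫ f1(x)^α f2(x)^(1−α) dx. The integral equals E_{f1}[(f1(X)/f2(X))^(α−1)], which is what a plug-in estimator averages over samples from f1. The sketch below is a standard leave-one-out kernel density plug-in estimator in one dimension with a fixed Gaussian bandwidth `h`. It is not the paper's optimally weighted ensemble estimator, and the function names and parameter values are hypothetical. The sketch checks the estimator against the closed form for two unit-variance Gaussians with means 0 and 1, which at α = 0.5 is α(μ1 − μ2)²/2 = 0.25.

```python
import numpy as np


def gaussian_kernel_matrix(points, data, h):
    """Gaussian kernel values K_h(points[i] - data[j]) for 1-D samples."""
    u = (points[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u ** 2) / (h * np.sqrt(2.0 * np.pi))


def renyi_divergence_plugin(x, y, alpha=0.5, h=0.3):
    """Leave-one-out KDE plug-in estimate of D_alpha(f1 || f2), with x ~ f1 and y ~ f2."""
    k_xx = gaussian_kernel_matrix(x, x, h)
    np.fill_diagonal(k_xx, 0.0)                              # leave x_i out of its own density
    f1_hat = k_xx.sum(axis=1) / (x.size - 1)
    f2_hat = gaussian_kernel_matrix(x, y, h).mean(axis=1)
    integral = np.mean((f1_hat / f2_hat) ** (alpha - 1.0))  # estimates E_f1[(f1/f2)^(alpha-1)]
    return float(np.log(integral) / (alpha - 1.0))


rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 2000)
y_same = rng.normal(0.0, 1.0, 2000)
y_shift = rng.normal(1.0, 1.0, 2000)
d_same = renyi_divergence_plugin(x, y_same)     # true value 0
d_shift = renyi_divergence_plugin(x, y_shift)   # true value 0.25
print(d_same, d_shift)
```

Kernel smoothing biases such plug-in estimates, increasingly so in higher dimensions. The ensemble approach in the abstract is aimed at cancelling the leading bias terms by combining estimators computed with several bandwidths.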
We developed a hyperspectral imaging tool based on surface-enhanced Raman spectroscopy (SERS) probes to determine the expression level and visualize the distribution of PD-L1 in individual cells. Electron-microscopic analysis of PD-L1 antibody-gold nanorod conjugates demonstrated binding to the cell surface and internalization into endosomal vesicles. Stimulation of cells with IFN-γ or metformin was used to confirm the ability of SERS probes to report treatment-induced changes. Multivariate curve resolution-alternating least squares (MCR-ALS) analysis of the spectra provided a greater signal-to-noise ratio than single peak mapping. However, single peak mapping allowed systematic subtraction of background and removal of non-specific binding and endocytic SERS signals. The mean or maximum peak height in the cell, or the mean peak height in the area of specific PD-L1-positive pixels, was used to estimate PD-L1 expression levels in single cells. PD-L1 levels were significantly up-regulated by IFN-γ and inhibited by metformin in human lung cancer cells from the A549 cell line. In conclusion, analyzing hyperspectral SERS imaging data together with systematic and comprehensive removal of non-specific signals allows SERS imaging to serve as a quantitative tool for detecting the cancer biomarker PD-L1.
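MCR-ALS factors a matrix of pixel spectra D into non-negative concentration maps C and component spectra S, with D ≈ C Sᵀ, by alternating least-squares updates of each factor. The sketch below enforces non-negativity by clipping, a common simplification. The abstract does not describe the authors' constraints or initialization. The synthetic two-peak data, the initialization from the two "purest" pixels, and all names are assumptions.

```python
import numpy as np


def mcr_als(D, S_init, n_iter=200):
    """Factor D (pixels x channels) as C @ S.T with C, S >= 0 by clipped alternating least squares."""
    S = np.clip(np.asarray(S_init, dtype=float), 0.0, None)     # (channels x k)
    for _ in range(n_iter):
        C = np.linalg.lstsq(S, D.T, rcond=None)[0].T            # solve S @ C.T ~= D.T
        C = np.clip(C, 0.0, None)                                # non-negative concentrations
        S = np.linalg.lstsq(C, D, rcond=None)[0].T              # solve C @ S.T ~= D
        S = np.clip(S, 0.0, None)                                # non-negative spectra
    residual = np.linalg.norm(D - C @ S.T) / np.linalg.norm(D)  # relative fit error
    return C, S, residual


# Synthetic hyperspectral data: 500 pixels, 200 spectral channels, 2 components.
rng = np.random.default_rng(0)
axis = np.linspace(0.0, 1.0, 200)
S_true = np.stack([np.exp(-((axis - c) ** 2) / (2 * 0.05 ** 2)) for c in (0.3, 0.7)], axis=1)
C_true = rng.uniform(0.0, 1.0, size=(500, 2))
D = C_true @ S_true.T

# Initialize from the two "purest" pixels, judged by the intensity ratio at the
# two peak channels (assumes the peak positions are known).
i1, i2 = np.argmin(np.abs(axis - 0.3)), np.argmin(np.abs(axis - 0.7))
ratio = D[:, i1] / D[:, i2]
S0 = D[[np.argmax(ratio), np.argmin(ratio)]].T
C, S, residual = mcr_als(D, S0)
print(residual)
```

MCR-ALS recovers components only up to scaling and, in general, rotational ambiguity, so a good initial estimate such as the purest pixels matters in practice.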