Single-cell transcriptome measurements can reveal unexplored biological diversity, but they suffer from technical noise and bias that must be modeled to account for the resulting uncertainty in downstream analyses. Here we introduce single-cell variational inference (scVI), a ready-to-use scalable framework for the probabilistic representation and analysis of gene expression in single cells (https://github.com/YosefLab/scVI). scVI uses stochastic optimization and deep neural networks to aggregate information across similar cells and genes and to approximate the distributions that underlie observed expression values, while accounting for batch effects and limited sensitivity. We used scVI for a range of fundamental analysis tasks including batch correction, visualization, clustering, and differential expression, and achieved high accuracy for each task.
Abstract
The recent measurement of $\Delta A_{CP}$ by the LHCb collaboration requires an $\mathcal{O}(10)$ enhancement coming from hadronic physics in order to be explained within the SM. We examine to what extent NP models can explain $\Delta A_{CP}$ without such enhancements. We discuss the implications in terms of a low energy effective theory as well as in the context of several explicit NP models.
Abstract
Motivation
Single-cell RNA-seq makes possible the investigation of variability in gene expression among cells, and dependence of variation on cell type. Statistical inference methods for such analyses must be scalable, and ideally interpretable.
Results
We present an approach based on a modification of a recently published highly scalable variational autoencoder framework that provides interpretability without sacrificing much accuracy. We demonstrate that our approach enables identification of gene programs in massive datasets. Our strategy, namely the learning of factor models with the auto-encoding variational Bayes framework, is not domain specific and may be useful for other applications.
Availability and implementation
The factor model is available in the scVI package hosted at https://github.com/YosefLab/scVI/.
Contact
v@nxn.se
Supplementary information
Supplementary data are available at Bioinformatics online.
A key challenge in the emerging field of single-cell RNA-Seq is to characterize phenotypic diversity between cells and visualize this information in an informative manner. A common technique when dealing with high-dimensional data is to project the data to 2 or 3 dimensions for visualization. However, there are a variety of methods to achieve this result and once projected, it can be difficult to ascribe biological significance to the observed features. Additionally, when analyzing single-cell data, the relationship between cells can be obscured by technical confounders such as variable gene capture rates.
To aid in the analysis and interpretation of single-cell RNA-Seq data, we have developed FastProject, a software tool which analyzes a gene expression matrix and produces a dynamic output report in which two-dimensional projections of the data can be explored. Annotated gene sets (referred to as gene 'signatures') are incorporated so that features in the projections can be understood in relation to the biological processes they might represent. FastProject provides a novel method of scoring each cell against a gene signature so as to minimize the effect of missed transcripts as well as a method to rank signature-projection pairings so that meaningful associations can be quickly identified. Additionally, FastProject is written with a modular architecture and designed to serve as a platform for incorporating and comparing new projection methods and gene selection algorithms.
Here we present FastProject, a software package for two-dimensional visualization of single cell data, which utilizes a plethora of projection methods and provides a way to systematically investigate the biological relevance of these low dimensional representations by incorporating domain knowledge.
The paired measurement of RNA and surface proteins in single cells with cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) is a promising approach to connect transcriptional variation with cell phenotypes and functions. However, combining these paired views into a unified representation of cell state is made challenging by the unique technical characteristics of each measurement. Here we present Total Variational Inference (totalVI; https://scvi-tools.org), a framework for end-to-end joint analysis of CITE-seq data that probabilistically represents the data as a composite of biological and technical factors, including protein background and batch effects. To evaluate totalVI's performance, we profiled immune cells from murine spleen and lymph nodes with CITE-seq, measuring over 100 surface proteins. We demonstrate that totalVI provides a cohesive solution for common analysis tasks such as dimensionality reduction, the integration of datasets with different measured proteins, estimation of correlations between molecules and differential expression testing.
Large single-cell atlases are now routinely generated to serve as references for analysis of smaller-scale studies. Yet learning from reference data is complicated by batch effects between datasets, limited availability of computational resources and sharing restrictions on raw data. Here we introduce a deep learning strategy for mapping query datasets on top of a reference called single-cell architectural surgery (scArches). scArches uses transfer learning and parameter optimization to enable efficient, decentralized, iterative reference building and contextualization of new datasets with existing references without sharing raw data. Using examples from mouse brain, pancreas, immune and whole-organism atlases, we show that scArches preserves biological state information while removing batch effects, despite using four orders of magnitude fewer parameters than de novo integration. scArches generalizes to multimodal reference mapping, allowing imputation of missing modalities. Finally, scArches retains coronavirus disease 2019 (COVID-19) disease variation when mapping to a healthy reference, enabling the discovery of disease-specific cell states. scArches will facilitate collaborative projects by enabling iterative construction, updating, sharing and efficient use of reference atlases.
Abstract
We explore the implications of the Standard Model effective field theory (SMEFT) with dimension-six terms involving the Higgs boson and third-generation fermion fields on the rate of Higgs boson production and decay into fermions, on the electric dipole moments (EDMs) of the electron, and on the baryon asymmetry of the Universe. We study the consequences of allowing these additional terms for each flavor separately and for combinations of two flavors. We find that a complex $\tau$ Yukawa coupling can account for the observed baryon asymmetry $Y_B^{\rm obs}$ within current LHC and EDM bounds. A complex $b$ ($t$) Yukawa coupling can account for 4% (2%) of $Y_B^{\rm obs}$, whereas a combination of the two can reach 12%. Combining $\tau$ with either $t$ or $b$ enlarges the viable parameter space owing to cancellations in the EDM and in either Higgs production times decay or the total Higgs width, respectively. Interestingly, in such a scenario there exists a region in parameter space where the SMEFT contributions to the electron EDM cancel and collider signal strengths are precisely SM-like, while producing sufficient baryon asymmetry. Measuring $CP$ violation in Higgs decays to $\tau$ leptons is the smoking gun for this scenario.
The abundance of new computational methods for processing and interpreting transcriptomes at a single cell level raises the need for in silico platforms for evaluation and validation. Here, we present SymSim, a simulator that explicitly models the processes that give rise to data observed in single cell RNA-Seq experiments. The components of the SymSim pipeline pertain to the three primary sources of variation in single cell RNA-Seq data: noise intrinsic to the process of transcription, extrinsic variation indicative of different cell states (both discrete and continuous), and technical variation due to low sensitivity and measurement noise and bias. We demonstrate how SymSim can be used for benchmarking methods for clustering, differential expression and trajectory inference, and for examining the effects of various parameters on their performance. We also show how SymSim can be used to evaluate the number of cells required to detect a rare population under various scenarios.
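The three sources of variation named above can be illustrated with a toy simulation. The sketch below is not SymSim's model or API: it assumes Poisson-distributed transcript counts for intrinsic noise, per-state mean expression for extrinsic variation, and independent capture of each transcript (binomial thinning) for limited sensitivity. All function and parameter names are hypothetical.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw a Poisson count with Knuth's method (adequate for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def capture(true_count, efficiency, rng):
    """Binomial thinning: each transcript is captured independently
    with probability `efficiency` (an assumed stand-in for low sensitivity)."""
    return sum(1 for _ in range(true_count) if rng.random() < efficiency)


def simulate(n_cells, state_means, efficiency, seed=0):
    """Return a list of (state, observed_counts) pairs.

    state_means maps a cell-state label to its per-gene mean expression
    (extrinsic variation); Poisson draws add intrinsic noise; capture()
    adds technical variation.
    """
    rng = random.Random(seed)
    states = list(state_means)
    cells = []
    for i in range(n_cells):
        state = states[i % len(states)]  # cycle through the states
        true_counts = [poisson_sample(m, rng) for m in state_means[state]]
        observed = [capture(c, efficiency, rng) for c in true_counts]
        cells.append((state, observed))
    return cells


if __name__ == "__main__":
    means = {"A": [5.0, 0.5, 2.0], "B": [0.5, 5.0, 2.0]}
    for state, counts in simulate(6, means, efficiency=0.3):
        print(state, counts)
```

Lowering `efficiency` in this sketch increases the fraction of zero counts, which is one simple way to examine how a downstream method degrades with reduced sensitivity.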
Abstract
There is now experimental evidence for Higgs boson decay into a pair of muons, and significant constraints on the Higgs boson decay into a charm quark-antiquark pair. The data on Higgs boson decays into second generation fermions probe various extensions of the Standard Model. We analyze the implications for the Standard Model effective field theory (SMEFT), without and with minimal flavor violation (MFV), for two Higgs doublet models (2HDM) with natural flavor conservation (NFC), for models with vector-like fermions, and for specific models that predict significant modifications of the Yukawa couplings to the light generations.