Technological advances have enabled the profiling of multiple molecular layers at single-cell resolution, assaying cells from multiple samples or conditions. Consequently, there is a growing need for ...computational strategies to analyze data from complex experimental designs that include multiple data modalities and multiple groups of samples. We present Multi-Omics Factor Analysis v2 (MOFA+), a statistical framework for the comprehensive and scalable integration of single-cell multi-modal data. MOFA+ reconstructs a low-dimensional representation of the data using computationally efficient variational inference and supports flexible sparsity constraints, allowing to jointly model variation across multiple sample groups and data modalities.
Multi‐omics studies promise the improved characterization of biological processes across molecular layers. However, methods for the unsupervised integration of the resulting heterogeneous data sets ...are lacking. We present Multi‐Omics Factor Analysis (MOFA), a computational method for discovering the principal sources of variation in multi‐omics data sets. MOFA infers a set of (hidden) factors that capture biological and technical sources of variability. It disentangles axes of heterogeneity that are shared across multiple modalities and those specific to individual data modalities. The learnt factors enable a variety of downstream analyses, including identification of sample subgroups, data imputation and the detection of outlier samples. We applied MOFA to a cohort of 200 patient samples of chronic lymphocytic leukaemia, profiled for somatic mutations, RNA expression, DNA methylation and ex vivo drug responses. MOFA identified major dimensions of disease heterogeneity, including immunoglobulin heavy‐chain variable region status, trisomy of chromosome 12 and previously underappreciated drivers, such as response to oxidative stress. In a second application, we used MOFA to analyse single‐cell multi‐omics data, identifying coordinated transcriptional and epigenetic changes along cell differentiation.
Synopsis
Multi‐Omics Factor Analysis (MOFA) is a computational framework for unsupervised discovery of the principal axes of biological and technical variation when multiple omics assays are applied to the same samples. MOFA is a broadly applicable approach for multi‐omics data integration.
The inferred latent factors represent the underlying principal axes of heterogeneity across the samples. Factors can be shared by multiple data modalities or can be data‐type specific.
The model flexibly handles missing values and different data types.
In an application to Chronic Lymphocytic Leukaemia, MOFA discovers a low dimensional space spanned by known clinical markers and underappreciated axes of variation such as oxidative stress.
In an application to multi‐omics profiles from single‐cells, MOFA recovers differentiation trajectories and identifies coordinated variation between the transcriptome and the epigenome.
Multi‐Omics Factor Analysis (MOFA) is a computational framework for unsupervised discovery of the principal axes of biological and technical variation when multiple omics assays are applied to the same samples. MOFA is a broadly applicable approach for multi‐omics data integration.
Factor analysis is a widely used method for dimensionality reduction in genome biology, with applications from personalized health to single-cell biology. Existing factor analysis models assume ...independence of the observed samples, an assumption that fails in spatio-temporal profiling studies. Here we present MEFISTO, a flexible and versatile toolbox for modeling high-dimensional data when spatial or temporal dependencies between the samples are known. MEFISTO maintains the established benefits of factor analysis for multimodal data, but enables the performance of spatio-temporally informed dimensionality reduction, interpolation, and separation of smooth from non-smooth patterns of variation. Moreover, MEFISTO can integrate multiple related datasets by simultaneously identifying and aligning the underlying patterns of variation in a data-driven manner. To illustrate MEFISTO, we apply the model to different datasets with spatial or temporal resolution, including an evolutionary atlas of organ development, a longitudinal microbiome study, a single-cell multi-omics atlas of mouse gastrulation and spatially resolved transcriptomics.
Recent high-throughput transcription factor (TF) binding assays revealed that TF cooperativity is a widespread phenomenon. However, a global mechanistic and functional understanding of TF ...cooperativity is still lacking. To address this, here we introduce a statistical learning framework that provides structural insight into TF cooperativity and its functional consequences based on next generation sequencing data. We identify DNA shape as driver for cooperativity, with a particularly strong effect for Forkhead-Ets pairs. Follow-up experiments reveal a local shape preference at the Ets-DNA-Forkhead interface and decreased cooperativity upon loss of the interaction. Additionally, we discover many functional associations for cooperatively bound TFs. Examination of the link between FOXO1:ETV6 and lymphomas reveals that their joint expression levels improve patient clinical outcome stratification. Altogether, our results demonstrate that inter-family cooperative TF binding is driven by position-specific DNA readout mechanisms, which provides an additional regulatory layer for downstream biological functions.
Biomedical single-cell atlases describe disease at the cellular level. However, analysis of this data commonly focuses on cell-type-centric pairwise cross-condition comparisons, disregarding the ...multicellular nature of disease processes. Here, we propose multicellular factor analysis for the unsupervised analysis of samples from cross-condition single-cell atlases and the identification of multicellular programs associated with disease. Our strategy, which repurposes group factor analysis as implemented in multi-omics factor analysis, incorporates the variation of patient samples across cell-types or other tissue-centric features, such as cell compositions or spatial relationships, and enables the joint analysis of multiple patient cohorts, facilitating the integration of atlases. We applied our framework to a collection of acute and chronic human heart failure atlases and described multicellular processes of cardiac remodeling, independent to cellular compositions and their local organization, that were conserved in independent spatial and bulk transcriptomics datasets. In sum, our framework serves as an exploratory tool for unsupervised analysis of cross-condition single-cell atlases and allows for the integration of the measurements of patient cohorts across distinct data modalities.
Individuals with PhDs and postdoctoral experience in the life sciences can pursue a variety of career paths. Many PhD students and postdocs aspire to a permanent research position at a university or ...research institute, but competition for such positions has increased. Here, we report a time-resolved analysis of the career paths of 2284 researchers who completed a PhD or a postdoc at the European Molecular Biology Laboratory (EMBL) between 1997 and 2020. The most prevalent career outcome was Academia: Principal Investigator (636/2284=27.8% of alumni), followed by Academia: Other (16.8%), Science-related Non-research (15.3%), Industry Research (14.5%), Academia: Postdoc (10.7%) and Non-science-related (4%); we were unable to determine the career path of the remaining 10.9% of alumni. While positions in Academia (Principal Investigator, Postdoc and Other) remained the most common destination for more recent alumni, entry into Science-related Non-research, Industry Research and Non-science-related positions has increased over time, and entry into Academia: Principal Investigator positions has decreased. Our analysis also reveals information on a number of factors – including publication records – that correlate with the career paths followed by researchers.
The relationship between the human placenta-the extraembryonic organ made by the fetus, and the decidua-the mucosal layer of the uterus, is essential to nurture and protect the fetus during ...pregnancy. Extravillous trophoblast cells (EVTs) derived from placental villi infiltrate the decidua, transforming the maternal arteries into high-conductance vessels
. Defects in trophoblast invasion and arterial transformation established during early pregnancy underlie common pregnancy disorders such as pre-eclampsia
. Here we have generated a spatially resolved multiomics single-cell atlas of the entire human maternal-fetal interface including the myometrium, which enables us to resolve the full trajectory of trophoblast differentiation. We have used this cellular map to infer the possible transcription factors mediating EVT invasion and show that they are preserved in in vitro models of EVT differentiation from primary trophoblast organoids
and trophoblast stem cells
. We define the transcriptomes of the final cell states of trophoblast invasion: placental bed giant cells (fused multinucleated EVTs) and endovascular EVTs (which form plugs inside the maternal arteries). We predict the cell-cell communication events contributing to trophoblast invasion and placental bed giant cell formation, and model the dual role of interstitial EVTs and endovascular EVTs in mediating arterial transformation during early pregnancy. Together, our data provide a comprehensive analysis of postimplantation trophoblast differentiation that can be used to inform the design of experimental models of the human placenta in early pregnancy.
Drug combinations that target critical pathways are a mainstay of cancer care. To improve current approaches to combination treatment of chronic lymphocytic leukemia (CLL) and gain insights into the ...underlying biology, we studied the effect of 352 drug combination pairs in multiple concentrations by analysing ex vivo drug response of 52 primary CLL samples, which were characterized by "omics" profiling. Known synergistic interactions were confirmed for B-cell receptor (BCR) inhibitors with Bcl-2 inhibitors and with chemotherapeutic drugs, suggesting that this approach can identify clinically useful combinations. Moreover, we uncovered synergistic interactions between BCR inhibitors and afatinib, which we attribute to BCR activation by afatinib through BLK upstream of BTK and PI3K. Combinations of multiple inhibitors of BCR components (e.g., BTK, PI3K, SYK) had effects similar to the single agents. While PI3K and BTK inhibitors produced overall similar effects in combinations with other drugs, we uncovered a larger response heterogeneity of combinations including PI3K inhibitors, predominantly in CLL with mutated IGHV, which we attribute to the target's position within the BCR-signaling pathway. Taken together, our study shows that drug combination effects can be effectively queried in primary cancer cells, which could aid discovery, triage and clinical development of drug combinations.
Penalization schemes like Lasso or ridge regression are routinely used to regress a response of interest on a high-dimensional set of potential predictors. Despite being decisive, the question of the ...relative strength of penalization is often glossed over and only implicitly determined by the scale of individual predictors. At the same time, additional information on the predictors is available in many applications but left unused. Here, we propose to make use of such external covariates to adapt the penalization in a data-driven manner. We present a method that differentially penalizes feature groups defined by the covariates and adapts the relative strength of penalization to the information content of each group. Using techniques from the Bayesian tool-set our procedure combines shrinkage with feature selection and provides a scalable optimization scheme. We demonstrate in simulations that the method accurately recovers the true effect sizes and sparsity patterns per feature group. Furthermore, it leads to an improved prediction performance in situations where the groups have strong differences in dynamic range. In applications to data from high-throughput biology, the method enables re-weighting the importance of feature groups from different assays. Overall, using available covariates extends the range of applications of penalized regression, improves model interpretability and can improve prediction performance.
Abstract
Motivation
Factor analysis is a widely used tool for unsupervised dimensionality reduction of high-throughput datasets in molecular biology, with recently proposed extensions designed ...specifically for spatial transcriptomics data. However, these methods expect (count) matrices as data input and are therefore not directly applicable to single molecule resolution data, which are in the form of coordinate lists annotated with genes and provide insight into subcellular spatial expression patterns. To address this, we here propose FISHFactor, a probabilistic factor model that combines the benefits of spatial, non-negative factor analysis with a Poisson point process likelihood to explicitly model and account for the nature of single molecule resolution data. In addition, FISHFactor shares information across a potentially large number of cells in a common weight matrix, allowing consistent interpretation of factors across cells and yielding improved latent variable estimates.
Results
We compare FISHFactor to existing methods that rely on aggregating information through spatial binning and cannot combine information from multiple cells and show that our method leads to more accurate results on simulated data. We show that our method is scalable and can be readily applied to large datasets. Finally, we demonstrate on a real dataset that FISHFactor is able to identify major subcellular expression patterns and spatial gene clusters in a data-driven manner.
Availability and implementation
The model implementation, data simulation and experiment scripts are available under https://www.github.com/bioFAM/FISHFactor.