Single-cell transcriptomic assays have enabled the de novo reconstruction of lineage differentiation trajectories, along with the characterization of cellular heterogeneity and state transitions. Several methods have been developed for reconstructing developmental trajectories from single-cell transcriptomic data, but efforts on analyzing single-cell epigenomic data and on trajectory visualization remain limited. Here we present STREAM, an interactive pipeline capable of disentangling and visualizing complex branching trajectories from both single-cell transcriptomic and epigenomic data. We have tested STREAM on several synthetic and real datasets generated with different single-cell technologies. We further demonstrate its utility for understanding myoblast differentiation and disentangling known heterogeneity in hematopoiesis for different organisms. STREAM is an open-source software package.
A subset of Cancer-Associated Fibroblasts (FAP+/CAF-S1) mediates immunosuppression in breast cancers (BC), but its heterogeneity and its impact on immunotherapy response remain unknown. Here, we identify 8 CAF-S1 clusters by analyzing more than 19000 single CAF-S1 fibroblasts from BC. We validate the 5 most abundant clusters by flow cytometry and in silico analyses in other cancer types, highlighting their relevance. Myofibroblasts from clusters 0 and 3, characterized by extra-cellular matrix proteins and TGFB signaling respectively, are indicative of primary resistance to immunotherapies. Cluster 0/ecm-myCAF up-regulates PD-1 and CTLA-4 protein levels in regulatory T lymphocytes (Tregs), which in turn increases CAF-S1 cluster 3/TGFB-myCAF cellular content. Thus, our study highlights a positive feedback loop between specific CAF-S1 clusters and Tregs and uncovers their role in immunotherapy resistance.
Exploring the function or the developmental history of cells in various organisms provides insights into a given cell type's core molecular characteristics and putative evolutionary mechanisms. Numerous computational methods now exist for analyzing single-cell data and identifying cell states. These methods mostly rely on the expression of genes considered as markers for a given cell state. Yet, there is a lack of scRNA-seq computational tools to study the evolution of cell states, particularly how cell states change their molecular profiles. This can include novel gene activation or the novel deployment of programs already existing in other cell types, known as co-option.
Here we present scEvoNet, a Python tool for predicting cell type evolution in cross-species or cancer-related scRNA-seq datasets. ScEvoNet builds the confusion matrix of cell states and a bipartite network connecting genes and cell states. It allows a user to obtain a set of genes shared by the characteristic signature of two cell states even between distantly-related datasets. These genes can be used as indicators of either evolutionary divergence or co-option occurring during organism or tumor evolution. Our results on cancer and developmental datasets indicate that scEvoNet is a helpful tool for the initial screening of such genes as well as for measuring cell state similarities.
The scEvoNet package is implemented in Python and is freely available from https://github.com/monsoro/scEvoNet. Utilizing this framework and exploring the continuum of transcriptome states between developmental stages and species will help explain cell state dynamics.
Relationships between genetic alterations, such as co-occurrence or mutual exclusivity, are often observed in cancer, where understanding them may provide new insights into etiology and clinical management. In this study, we combined statistical analyses and computational modeling to explain patterns of genetic alterations seen in 178 patients with bladder tumors (either muscle-invasive or non-muscle-invasive). A statistical analysis of frequently altered genes identified pair associations, including co-occurrence or mutual exclusivity. Focusing on genetic alterations of protein-coding genes involved in growth factor receptor signaling, cell cycle, and apoptosis entry, we complemented this analysis with a literature search to focus on nine pairs of genetic alterations of our dataset, with subsequent verification in three other publicly available datasets. To understand the reasons and contexts of these patterns of associations while accounting for the dynamics of associated signaling pathways, we built a logical model. This model was validated first on published mutant mice data, then used to study patterns and to draw conclusions on counter-intuitive observations, allowing one to formulate predictions about conditions where combining genetic alterations benefits tumorigenesis. For example, while CDKN2A homozygous deletions occur in a context of FGFR3-activating mutations, our model suggests that additional PIK3CA mutation or p21CIP deletion would greatly favor invasiveness. Furthermore, the model sheds light on the temporal orders of gene alterations, for example, showing how mutual exclusivity of FGFR3 and TP53 mutations is interpretable if FGFR3 is mutated first. Overall, our work shows how to predict combinations of the major gene alterations leading to invasiveness through two main progression pathways in bladder cancer.
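A pairwise association of the kind described above (co-occurrence or mutual exclusivity) is commonly tested with a 2x2 contingency table. The following is a minimal sketch using a two-sided Fisher exact test; the abstract does not name the test that was used, so the choice of test, the function names, and the patient counts below are all illustrative assumptions rather than the study's method or data.

```python
from math import comb


def fisher_exact_2x2(both, only_a, only_b, neither):
    """Two-sided Fisher exact p-value and sample odds ratio for a 2x2 table.

    both    : samples altered in gene A and gene B
    only_a  : samples altered in A only
    only_b  : samples altered in B only
    neither : samples altered in neither gene
    """
    row_a = both + only_a              # all samples altered in A
    col_b = both + only_b              # all samples altered in B
    n = both + only_a + only_b + neither

    def prob(k):
        # Hypergeometric probability of k co-altered samples, margins fixed.
        return comb(col_b, k) * comb(n - col_b, row_a - k) / comb(n, row_a)

    k_min = max(0, row_a + col_b - n)
    k_max = min(row_a, col_b)
    p_obs = prob(both)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p_value = sum(prob(k) for k in range(k_min, k_max + 1)
                  if prob(k) <= p_obs * (1 + 1e-9))
    odds = (both * neither) / (only_a * only_b) if only_a * only_b else float("inf")
    return min(p_value, 1.0), odds


def classify(p_value, odds, alpha=0.05):
    """Label a gene pair from the test result (alpha is an assumed cutoff)."""
    if p_value >= alpha:
        return "no significant association"
    return "co-occurrence" if odds > 1 else "mutual exclusivity"


# Hypothetical counts for one gene pair in a cohort of 178 patients.
p_value, odds = fisher_exact_2x2(both=2, only_a=60, only_b=40, neither=76)
label = classify(p_value, odds)
```

An odds ratio below 1 with a small p-value points to mutual exclusivity, and one above 1 to co-occurrence. With many gene pairs tested, a multiple-testing correction would normally be applied on top of this.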
Understanding the etiology of metastasis is very important from a clinical perspective, since it is estimated that metastasis accounts for 90% of cancer patient mortality. Metastasis results from a sequence of multiple steps including invasion and migration. The early stages of metastasis are tightly controlled in normal cells and can be drastically affected by malignant mutations; therefore, they might constitute the principal determinants of the overall metastatic rate even if the later stages take a long time to occur. To elucidate the role of individual mutations or their combinations affecting the metastatic development, a logical model has been constructed that recapitulates published experimental results of known gene perturbations on local invasion and migration processes, and predicts the effect of not yet experimentally assessed mutations. The model has been validated using experimental data on transcriptome dynamics following TGF-β-dependent induction of Epithelial to Mesenchymal Transition in lung cancer cell lines. A method to associate gene expression profiles with different stable state solutions of the logical model has been developed for that purpose. In addition, we have systematically predicted alleviating (masking) and synergistic pairwise genetic interactions between the genes composing the model with respect to the probability of acquiring the metastatic phenotype. We focused on several unexpected synergistic genetic interactions leading to theoretically very high metastasis probability. Among them, the synergistic combination of Notch overexpression and p53 deletion shows one of the strongest effects, which is in agreement with a recently published experiment in a mouse model of gut cancer. The mathematical model can recapitulate experimental mutations in both cell line and mouse models.
Furthermore, the model predicts new gene perturbations that affect the early steps of metastasis, highlighting potential intervention points for innovative therapeutic strategies in oncology.
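The idea of comparing stable states of a logical model before and after a gene perturbation can be sketched on a toy example. The three-node network below, its node names, and its rules are hypothetical and chosen only for illustration; they are not taken from the published model. Stable states (fixed points) do not depend on whether updates are synchronous or asynchronous, so a simple synchronous step is enough to find them.

```python
from itertools import product

# Hypothetical toy network, not the published model.
NODES = ("signal", "suppressor", "invasion")

RULES = {
    "signal": lambda s: s["signal"],                       # input, keeps its value
    "suppressor": lambda s: not s["signal"],               # repressed by the signal
    "invasion": lambda s: s["signal"] or not s["suppressor"],
}


def step(state, fixed=None):
    """Apply every rule once; `fixed` pins nodes to model a knock-out or overexpression."""
    nxt = {node: bool(RULES[node](state)) for node in NODES}
    nxt.update(fixed or {})
    return nxt


def stable_states(fixed=None):
    """Enumerate all states that are unchanged by one update step."""
    fixed = fixed or {}
    found = []
    for values in product([False, True], repeat=len(NODES)):
        state = dict(zip(NODES, values))
        respects_perturbation = all(state[n] == v for n, v in fixed.items())
        if respects_perturbation and step(state, fixed) == state:
            found.append(state)
    return found


wild_type = stable_states()
knock_out = stable_states(fixed={"suppressor": False})
```

In this toy case the wild type has one stable state without invasion and one with it, while the suppressor knock-out leaves only stable states with invasion switched on. Exhaustive enumeration grows as 2^n, so models with many nodes call for dedicated tools instead.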
Boolean networks are widely employed to model the qualitative dynamics of cell fate processes by describing the change of binary activation states of genes and transcription factors with time. Being able to bridge such qualitative states with quantitative measurements of gene expression in cells, such as scRNA-seq, is a cornerstone for data-driven model construction and validation. On one hand, scRNA-seq binarisation is a key step for inferring and validating Boolean models. On the other hand, the generation of synthetic scRNA-seq data from baseline Boolean models provides an important asset to benchmark inference methods. However, linking characteristics of scRNA-seq datasets, including dropout events, with Boolean states is a challenging task.
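The two directions mentioned above, generating synthetic counts from a Boolean state and binarising counts back into Boolean values, can be illustrated with a deliberately simple sketch. Everything in it is an assumption made for illustration: the Gaussian noise used as a stand-in for count noise, the fixed threshold, the dropout rate, and the function names. It does not reproduce the approach of the work summarised here.

```python
import random


def simulate_counts(boolean_state, n_cells, dropout=0.3, high=20, low=1, seed=0):
    """Draw toy expression counts per cell from a Boolean state.

    Active genes are centred on `high`, inactive genes on `low`; each value is
    then set to zero with probability `dropout` to mimic dropout events.
    """
    rng = random.Random(seed)
    cells = []
    for _ in range(n_cells):
        cell = {}
        for gene, active in boolean_state.items():
            mean = high if active else low
            count = max(round(rng.gauss(mean, mean ** 0.5)), 0)
            if rng.random() < dropout:
                count = 0                      # dropout hides the true signal
            cell[gene] = count
        cells.append(cell)
    return cells


def binarise(cells, threshold=5):
    """Call a gene 'on' in a cell when its count reaches the threshold."""
    return [{gene: count >= threshold for gene, count in cell.items()}
            for cell in cells]


def majority_state(binary_cells):
    """Summarise a population by majority vote per gene."""
    genes = binary_cells[0].keys()
    half = len(binary_cells) / 2
    return {g: sum(cell[g] for cell in binary_cells) > half for g in genes}


true_state = {"geneA": True, "geneB": False}
cells = simulate_counts(true_state, n_cells=200)
binary = binarise(cells)
recovered = majority_state(binary)
```

Individual cells misreport an active gene as 'off' whenever a dropout occurs, which is why per-cell binarisation is hard; the population-level vote recovers the state here only because the assumed dropout rate stays well below one half.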
Machine learning deals with datasets characterized by high dimensionality. However, in many cases, the intrinsic dimensionality of the datasets is surprisingly low. For example, the dimensionality of a robot's perception space can be large and multi-modal but its variables can have more or less complex non-linear interdependencies. Thus multidimensional data point clouds can be effectively located in the vicinity of principal varieties possessing locally small dimensionality, but having a globally complicated organization which is sometimes difficult to represent with regular mathematical objects (such as manifolds). We review modern machine learning approaches for extracting low-dimensional geometries from multi-dimensional data and their applications in various scientific fields.
The lack of integrated resources depicting the complexity of the innate immune response in cancer represents a bottleneck for high-throughput data interpretation. To address this challenge, we perform a systematic manual literature mining of molecular mechanisms governing the innate immune response in cancer and represent it as a signalling network map. The cell-type specific signalling maps of macrophages, dendritic cells, myeloid-derived suppressor cells and natural killer cells are constructed and integrated into a comprehensive meta-map of the innate immune response in cancer. The meta-map contains 1466 chemical species as nodes connected by 1084 biochemical reactions, and it is supported by information from 820 articles. The resource helps to interpret single cell RNA-Seq data from macrophages and natural killer cells in metastatic melanoma that reveal different anti- or pro-tumor sub-populations within each cell type. Here, we report a new open source analytic platform that supports data visualisation and interpretation of tumour microenvironment activity in cancer.
Dealing with uncertainty in applications of machine learning to real-life data critically depends on the knowledge of intrinsic dimensionality (ID). A number of methods have been suggested for the purpose of estimating ID, but no standard package to easily apply them one by one or all at once has been implemented in Python. This technical note introduces scikit-dimension, an open-source Python package for intrinsic dimension estimation. The scikit-dimension package provides a uniform implementation of most of the known ID estimators based on the scikit-learn application programming interface to evaluate the global and local intrinsic dimension, as well as generators of synthetic toy and benchmark datasets widespread in the literature. The package is developed with tools assessing the code quality, coverage, unit testing and continuous integration. We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation for real-life and synthetic data.
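To make the notion of intrinsic dimension concrete, here is a small standard-library sketch of one well-known estimator idea, based on the ratio of each point's second to first nearest-neighbour distance (a maximum-likelihood variant of the two-nearest-neighbours approach). It is an independent illustration and not the scikit-dimension implementation; the function name and the synthetic dataset are assumptions.

```python
import heapq
import math
import random


def two_nn_dimension(points):
    """Estimate global intrinsic dimension from nearest-neighbour distance ratios.

    For each point, mu = r2 / r1 (second over first neighbour distance).
    Under locally uniform density, the maximum-likelihood estimate is
    N / sum(log(mu)).
    """
    log_ratios = []
    for i, p in enumerate(points):
        dists = (math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = heapq.nsmallest(2, dists)
        if r1 > 0:                              # skip exact duplicates
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)


# Synthetic data: a 2-dimensional sheet embedded linearly in 5 dimensions.
rng = random.Random(0)
data = []
for _ in range(400):
    u, v = rng.random(), rng.random()
    data.append((u, v, u + v, u - v, 0.5 * u))

estimate = two_nn_dimension(data)
```

Although each point has five coordinates, the estimate should come out close to 2, the number of free parameters used to generate the data. The brute-force neighbour search is quadratic in the number of points, which is acceptable for a toy example only.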
Independent Component Analysis (ICA) is a method that models gene expression data as an action of a set of statistically independent hidden factors. The output of ICA depends on a fundamental parameter: the number of components (factors) to compute. The optimal choice of this parameter, related to determining the effective data dimension, remains an open question in the application of blind source separation techniques to transcriptomic data.
Here we address the question of optimizing the number of statistically independent components in the analysis of transcriptomic data for reproducibility of the components in multiple runs of ICA (within the same or within varying effective dimensions) and in multiple independent datasets. To this end, we introduce ranking of independent components based on their stability in multiple ICA computation runs and define a distinguished number of components (Most Stable Transcriptome Dimension, MSTD) corresponding to the point of the qualitative change of the stability profile. Based on a large body of data, we demonstrate that a sufficient number of dimensions is required for biological interpretability of the ICA decomposition and that the most stable components with ranks below MSTD have more chances to be reproduced in independent studies compared to the less stable ones. At the same time, we show that a transcriptomics dataset can be reduced to a relatively high number of dimensions without losing the interpretability of ICA, even though higher dimensions give rise to components driven by small gene sets.
We suggest a protocol of ICA application to transcriptomics data with a possibility of prioritizing components with respect to their reproducibility that strengthens the biological interpretation. Computing too few components (much less than MSTD) is not optimal for interpretability of the results. The components ranked within MSTD range have more chances to be reproduced in independent studies.
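Ranking components by their stability across repeated ICA runs can be approximated with a simple proxy: for each component of a reference run, take the best absolute correlation it achieves in every other run and average those values. The abstract does not specify the stability index, so this proxy, the function names, and the toy vectors below are assumptions for illustration; the detection of the MSTD change point is not covered.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def stability_ranking(reference_run, other_runs):
    """Rank reference components by mean best |correlation| across other runs.

    Absolute values are used because ICA components are defined only up to sign.
    Returns (score, component_index) pairs, most stable first.
    """
    scores = []
    for index, component in enumerate(reference_run):
        best_matches = [
            max(abs(pearson(component, candidate)) for candidate in run)
            for run in other_runs
        ]
        scores.append((sum(best_matches) / len(best_matches), index))
    return sorted(scores, reverse=True)


# Toy gene-weight vectors standing in for components from three ICA runs.
reference = [[1, 2, 3, 4, 5, 6], [1, -1, 1, -1, 2, -2]]
run_a = [[-2, -4, -6, -8, -10, -12], [3, 1, -2, 0, 1, -1]]
run_b = [[1, 2, 3, 4, 5, 6.5], [0, 1, 0, -1, 2, 1]]

ranking = stability_ranking(reference, [run_a, run_b])
```

Here the first reference component reappears in both other runs (once with flipped sign) and ranks first, while the second has no close counterpart and receives a low score.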