The similarity or distance measure used for clustering can generate intuitive and interpretable clusters when it is tailored to the unique characteristics of the data. In time series datasets generated with high-throughput biological assays, measurements such as gene expression levels or protein phosphorylation intensities are collected sequentially over time, and the similarity score should capture this special temporal structure.
We propose a clustering similarity measure called Lag Penalized Weighted Correlation (LPWC) to group pairs of time series that exhibit closely related behaviors over time, even if the timing is not perfectly synchronized. LPWC aligns time series profiles to identify common temporal patterns. It down-weights aligned profiles based on the length of the temporal lags that are introduced. We demonstrate the advantages of LPWC versus existing time series and general clustering algorithms. In a simulated dataset based on the biologically motivated impulse model, LPWC is the only method to recover the true clusters for almost all simulated genes. LPWC also identifies clusters with distinct temporal patterns in our yeast osmotic stress response and axolotl limb regeneration case studies.
LPWC achieves both of its time series clustering goals. It groups time series with correlated changes over time, even if those patterns occur earlier or later in some of the time series. In addition, it refrains from introducing large shifts in time when searching for temporal patterns by applying a lag penalty. The LPWC R package is available at https://github.com/gitter-lab/LPWC and on CRAN under an MIT license.
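The idea of aligning two profiles and down-weighting the alignment by the lag it introduces can be sketched as follows. This is a minimal illustration, not the LPWC package's implementation: it assumes equally long series, substitutes a plain Pearson correlation for the weighted correlation, and assumes a Gaussian-shaped penalty `exp(-penalty * lag**2)`. The function names and the `max_lag` and `penalty` parameters are hypothetical.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def lag_penalized_correlation(x, y, max_lag=2, penalty=0.1):
    """Return (best_score, best_lag) over lags in [-max_lag, max_lag].

    A positive lag means the pattern in x occurs `lag` time points later
    than in y. The penalty shape is an assumption made for illustration.
    """
    best_score, best_lag = -math.inf, 0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[lag:], y[:len(y) - lag]
        else:
            xs, ys = x[:len(x) + lag], y[-lag:]
        if len(xs) < 3:  # too few overlapping points to correlate
            continue
        score = math.exp(-penalty * lag ** 2) * pearson(xs, ys)
        if score > best_score:
            best_score, best_lag = score, lag
    return best_score, best_lag

# Two toy profiles with the same shape, where x is delayed by one time point.
y_profile = [0, 1, 4, 2, 0, 0]
x_profile = [0, 0, 1, 4, 2, 0]
print(lag_penalized_correlation(x_profile, y_profile, penalty=0.1))
print(lag_penalized_correlation(x_profile, y_profile, penalty=5.0))
```

With a small penalty the one-step shift is accepted because it makes the profiles match; with a large penalty the unshifted comparison wins, which mirrors the trade-off between finding shared patterns and avoiding large shifts in time.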
Cellular gene expression changes throughout a dynamic biological process, such as differentiation. Pseudotimes estimate cells’ progress along a dynamic process based on their individual gene expression states. Ordering the expression data by pseudotime provides information about the underlying regulator-gene interactions. Because the pseudotime distribution is not uniform, many standard mathematical methods are inapplicable for analyzing the ordered gene expression states. Here we present single-cell inference of networks using Granger ensembles (SINGE), an algorithm for gene regulatory network inference from ordered single-cell gene expression data. SINGE uses kernel-based Granger causality regression to smooth irregular pseudotimes and missing expression values. It aggregates predictions from an ensemble of regression analyses to compile a ranked list of candidate interactions between transcriptional regulators and target genes. In two mouse embryonic stem cell differentiation datasets, SINGE outperforms other contemporary algorithms. However, a more detailed examination reveals caveats about poor performance for individual regulators and uninformative pseudotimes.
• Pseudotime estimates order cells in a dynamic process using single-cell gene expression
• SINGE infers gene regulatory networks from gene expression trends over pseudotime
• SINGE’s ensembling considers many smoothed versions of irregular pseudotemporal data
• Uninformative pseudotime values can be detrimental to network reconstruction
Deshpande et al. present SINGE, an algorithm to infer gene regulatory networks from ordered single-cell gene expression data. SINGE uses kernel-based regression to smooth noisy, ordered single-cell data and ensembling to prioritize reliable regulatory relationships.
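Kernel-based smoothing over irregular pseudotimes can be illustrated with a minimal sketch. It assumes a Gaussian kernel and estimates a regulator's expression at an arbitrary (for example, lagged) pseudotime as a kernel-weighted average of the cells observed nearby. The function name `kernel_lagged_value` and the `bandwidth` parameter are hypothetical, and the code is not taken from the SINGE implementation.

```python
import math

def kernel_lagged_value(pseudotimes, expression, target_time, bandwidth):
    """Kernel-weighted average of expression values around target_time.

    Cells whose pseudotime is close to target_time get large weights, so no
    cell needs to sit exactly at the requested time. The Gaussian kernel is
    an assumption made for illustration.
    """
    weights = [
        math.exp(-((t - target_time) ** 2) / (2 * bandwidth ** 2))
        for t in pseudotimes
    ]
    total = sum(weights)
    if total == 0:  # every cell is too far away to contribute
        return float("nan")
    return sum(w * v for w, v in zip(weights, expression)) / total

# Irregularly spaced pseudotimes with toy expression values.
times = [0.0, 0.1, 0.5, 0.9]
values = [1.0, 1.2, 2.5, 3.0]
# Estimate the regulator's expression 0.2 pseudotime units before t = 0.9.
print(kernel_lagged_value(times, values, target_time=0.9 - 0.2, bandwidth=0.2))
```

Varying the bandwidth and the lag yields differently smoothed versions of the same data, which is the kind of variation an ensemble of regressions can aggregate over.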
Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solve problems of these fields. We examine applications of deep learning to a variety of biomedical problems (patient classification, fundamental biological processes and treatment of patients) and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.
High-throughput, 'omic' methods provide sensitive measures of biological responses to perturbations. However, inherent biases in high-throughput assays make it difficult to interpret experiments in which more than one type of data is collected. In this work, we introduce Omics Integrator, a software package that takes a variety of 'omic' data as input and identifies putative underlying molecular pathways. The approach applies advanced network optimization algorithms to a network of thousands of molecular interactions to find high-confidence, interpretable subnetworks that best explain the data. These subnetworks connect changes observed in gene expression, protein abundance or other global assays to proteins that may not have been measured in the screens due to inherent bias or noise in measurement. This approach reveals unannotated molecular pathways that would not be detectable by searching pathway databases. Omics Integrator also provides an elegant framework to incorporate not only positive data, but also negative evidence. Incorporating negative evidence allows Omics Integrator to avoid unexpressed genes and avoid being biased toward highly studied hub proteins, except when they are strongly implicated by the data. The software comprises two individual tools, Garnet and Forest, that can be run together or independently to allow a user to perform advanced integration of multiple types of high-throughput data as well as create condition-specific subnetworks of protein interactions that best connect the observed changes in various datasets. It is available at http://fraenkel.mit.edu/omicsintegrator and on GitHub at https://github.com/fraenkel-lab/OmicsIntegrator.
Accurate models of the cross-talk between signaling pathways and transcriptional regulatory networks within cells are essential to understand complex response programs. We present a new computational method that combines condition-specific time-series expression data with general protein interaction data to reconstruct dynamic and causal stress response networks. These networks characterize the pathways involved in the response, their time of activation, and the affected genes. The signaling and regulatory components of our networks are linked via a set of common transcription factors that serve as targets in the signaling network and as regulators of the transcriptional response network. Detailed case studies of stress responses in budding yeast demonstrate the predictive power of our method. Our method correctly identifies the core signaling proteins and transcription factors of the response programs. It further predicts the involvement of additional transcription factors and other proteins not previously implicated in the response pathways. We experimentally verify several of these predictions for the osmotic stress response network. Our approach requires little condition-specific data: only a partial set of upstream initiators and time-series gene expression data, which are readily available for many conditions and species. Consequently, our method is widely applicable and can be used to derive accurate, dynamic response models in several species.
Viruses must balance their reliance on host cell machinery for replication with the need to avoid host defenses. Influenza A viruses are zoonotic agents that frequently switch hosts, causing localized outbreaks with the potential for larger pandemics. The host range of influenza virus is limited by the need for successful interactions between the virus and cellular partners. Here we used immunocompetitive capture-mass spectrometry to identify cellular proteins that interact with human- and avian-style viral polymerases. We focused on the proviral activity of heterogeneous nuclear ribonucleoprotein U-like 1 (hnRNP UL1) and the antiviral activity of mitochondrial enoyl-CoA reductase (MECR). MECR is localized to mitochondria, where it functions in mitochondrial fatty acid synthesis (mtFAS). While a small fraction of the polymerase subunit PB2 localizes to the mitochondria, PB2 did not interact with full-length MECR. By contrast, a minor splice variant produces cytoplasmic MECR (cMECR). Ectopic expression of cMECR shows that it binds the viral polymerase and suppresses viral replication by blocking assembly of viral ribonucleoprotein complexes (RNPs). MECR ablation through genome editing or drug treatment is detrimental for cell health, creating a generic block to virus replication. Using the yeast homolog Etr1 to supply the metabolic functions of MECR in MECR-null cells, we showed that specific antiviral activity is independent of mtFAS and is reconstituted by expressing cMECR. Thus, we propose a strategy where alternative splicing produces a cryptic antiviral protein that is embedded within a key metabolic enzyme.
Open collaborative writing with Manubot
Himmelstein, Daniel S; Rubinetti, Vincent; Slochower, David R ...
PLoS Computational Biology, 06/2019, Volume 15, Issue 6
Journal Article, Peer reviewed, Open access
Open, collaborative research is a powerful paradigm that can immensely strengthen the scientific process by integrating broad and diverse expertise. However, traditional research and multi-author writing processes break down at scale. We present new software named Manubot, available at https://manubot.org, to address the challenges of open scholarly writing. Manubot adopts the contribution workflow used by many large-scale open source software projects to enable collaborative authoring of scholarly manuscripts. With Manubot, manuscripts are written in Markdown and stored in a Git repository to precisely track changes over time. By hosting manuscript repositories publicly, such as on GitHub, multiple authors can simultaneously propose and review changes. A cloud service automatically evaluates proposed changes to catch errors. Publication with Manubot is continuous: When a manuscript's source changes, the rendered outputs are rebuilt and republished to a web page. Manubot automates bibliographic tasks by implementing citation by identifier, where users cite persistent identifiers (e.g. DOIs, PubMed IDs, ISBNs, URLs), whose metadata is then retrieved and converted to a user-specified style. Manubot modernizes publishing to align with the ideals of open science by making it transparent, reproducible, immediate, versioned, collaborative, and free of charge.
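Citation by identifier can be illustrated with a small sketch that pulls identifier-style citation keys out of Markdown text. It assumes keys of the form `@prefix:identifier` (for example `@doi:...` or `@url:...`) inside Pandoc-style brackets. The regular expression and the function name `extract_citations` are hypothetical simplifications rather than Manubot's own parser, and `10.1234/example` is a placeholder DOI, not a real reference.

```python
import re

# Assumed key shape: "@" + lowercase prefix + ":" + identifier. The identifier
# runs until whitespace or a closing bracket, semicolon, or comma.
CITATION_PATTERN = re.compile(r"@([a-z]+):([^\s\];,]+)")

def extract_citations(markdown_text):
    """Return (prefix, identifier) pairs in order of appearance."""
    return [
        (match.group(1), match.group(2))
        for match in CITATION_PATTERN.finditer(markdown_text)
    ]

# Placeholder manuscript sentence; the DOI below is made up for illustration.
sentence = "Prior work supports this [@doi:10.1234/example; @url:https://manubot.org]."
for prefix, identifier in extract_citations(sentence):
    print(prefix, identifier)
```

Once the keys are extracted, a tool in this style would look up metadata for each identifier and hand it to a citation processor for formatting. That lookup step is omitted here because it depends on external services.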
Kaposi's sarcoma-associated herpesvirus (KSHV), an oncogenic human gammaherpesvirus, is the etiological agent of Kaposi's sarcoma (KS), the most common tumor of AIDS patients worldwide. KSHV is predominantly latent in the main KS tumor cell, the spindle cell, a cell of endothelial origin. KSHV modulates numerous host cell signaling pathways to activate endothelial cells, including major metabolic pathways involved in lipid metabolism. To identify the underlying cellular mechanisms of KSHV alteration of host signaling and endothelial cell activation, we identified changes in the host proteome, phosphoproteome and transcriptome landscape following KSHV infection of endothelial cells. A Steiner forest algorithm was used to integrate the global data sets and, together with transcriptome-based predictions of transcription factor activity, to predict cellular networks altered by latent KSHV. Several interesting pathways were identified, including peroxisome biogenesis. To validate the predictions, we showed that KSHV latent infection increases the number of peroxisomes per cell. Additionally, proteins involved in peroxisomal lipid metabolism of very long chain fatty acids, including ABCD3 and ACOX1, are required for the survival of latently infected cells. In summary, integrated analysis of proteomics and transcriptomics data identified novel cellular pathways altered during herpesvirus latency that could not be predicted by a single systems biology platform and, when correlated with our metabolomics data, revealed that peroxisome lipid metabolism is essential for KSHV latent infection of endothelial cells.
HIV-1 spreads efficiently through direct cell-to-cell transmission at virological synapses (VSs) formed by interactions between HIV-1 envelope proteins (Env) on the surface of infected cells and CD4 receptors on uninfected target cells. Env-CD4 interactions bring the infected and uninfected cellular membranes into close proximity and induce transport of viral and cellular factors to the VS for efficient virion assembly and HIV-1 transmission. Using novel, cell-specific stable isotope labeling and quantitative mass spectrometric proteomics, we identified extensive changes in the levels and phosphorylation states of proteins in HIV-1 infected producer cells upon mixing with CD4+ target cells under conditions inducing VS formation. These coculture-induced alterations involved multiple cellular pathways including transcription, TCR signaling and, unexpectedly, cell cycle regulation, and were dominated by Env-dependent responses. We confirmed the proteomic results using inhibitors targeting regulatory kinases and phosphatases in selected pathways identified by our proteomic analysis. Strikingly, inhibiting the key mitotic regulator Aurora kinase B (AURKB) in HIV-1 infected cells significantly increased HIV activity in cell-to-cell fusion and transmission but had little effect on cell-free infection. Consistent with this, we found that AURKB regulates the fusogenic activity of HIV-1 Env. In the Jurkat T cell line and primary T cells, HIV-1 Env:CD4 interaction also dramatically induced cell cycle-independent AURKB relocalization to the centromere, and this signaling required the long (150 aa) cytoplasmic C-terminal domain (CTD) of Env. These results imply that cytoplasmic/plasma membrane AURKB restricts HIV-1 envelope fusion, and that this restriction is overcome by Env CTD-induced AURKB relocalization.
Taken together, our data reveal a new signaling pathway regulating HIV-1 cell-to-cell transmission and potential new avenues for therapeutic intervention through targeting the Env CTD and AURKB activity.
Embedded within large-scale protein interaction networks are signaling pathways that encode response cascades in the cell. Unfortunately, even for well-studied species like S. cerevisiae, only a fraction of all true protein interactions are known, which makes it difficult to reason about the exact flow of signals and the corresponding causal relations in the network. To help address this problem, we introduce a framework for predicting new interactions that aid connectivity between upstream proteins (sources) and downstream transcription factors (targets) of a particular pathway. Our algorithms attempt to globally minimize the distance between sources and targets by finding a small set of shortcut edges to add to the network. Unlike existing algorithms for predicting general protein interactions, our approach focuses on proteins involved in specific responses and thereby homes in on pathway-consistent interactions. We applied our method to extend pathways in osmotic stress response in yeast and identified several missing interactions, some of which are supported by published reports. We also performed experiments that support a novel interaction not previously reported. Our framework is general and may be applicable to edge prediction problems in other domains.
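The notion of a shortcut edge can be made concrete with a brute-force sketch: try every missing edge in a small undirected network and keep the one that most reduces the summed shortest-path distance from sources to targets. This illustrates the objective only. The abstract does not specify the algorithms, and the function names, the unreachable-pair penalty, and the single-edge search are assumptions made for the example.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adjacency, start):
    """Hop distances from start to every reachable node."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances

def total_distance(adjacency, sources, targets):
    """Sum of source-to-target distances; unreachable pairs cost len(adjacency)."""
    total = 0
    for source in sources:
        distances = bfs_distances(adjacency, source)
        total += sum(distances.get(target, len(adjacency)) for target in targets)
    return total

def best_shortcut(adjacency, sources, targets):
    """Return (edge, new_total) for the single added edge that helps most.

    Returns (None, current_total) if no missing edge reduces the total.
    """
    best_edge = None
    best_total = total_distance(adjacency, sources, targets)
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # edge already present
        adjacency[u].add(v)
        adjacency[v].add(u)
        candidate_total = total_distance(adjacency, sources, targets)
        adjacency[u].remove(v)
        adjacency[v].remove(u)
        if candidate_total < best_total:
            best_edge, best_total = (u, v), candidate_total
    return best_edge, best_total

# Toy chain a-b-c-d-e with one source (a) and one target (e).
network = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
print(best_shortcut(network, sources=["a"], targets=["e"]))
```

Exhaustive search over all missing edges is only feasible for tiny networks. Restricting the candidates to plausible interactions, and choosing several edges jointly rather than one at a time, are the parts that a practical method has to address.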