Machine learning, a collection of data-analytical techniques aimed at building predictive models from multi-dimensional datasets, is becoming integral to modern biological research. By enabling one to generate models that learn from large datasets and make predictions on likely outcomes, machine learning can be used to study complex cellular systems such as biological networks. Here, we provide a primer on machine learning for life scientists, including an introduction to deep learning. We discuss opportunities and challenges at the intersection of machine learning and network biology, which could impact disease biology, drug discovery, microbiome research, and synthetic biology.
Machine-learning approaches are essential for pulling information out of the vast datasets that are being collected across biology and biomedicine. This Review considers the opportunities and challenges at the intersection of network biology and data science.
Abstract
Motivation
Gene Set Enrichment Analysis (GSEA) is routinely used to analyze and interpret coordinate pathway-level changes in transcriptomics experiments. For an experiment where fewer than seven samples per condition are compared, GSEA employs a competitive null hypothesis to test significance. A gene set enrichment score is tested against a null distribution of enrichment scores generated from permuted gene sets, where genes are randomly selected from the input experiment. Looking across a variety of biological conditions, however, genes are not randomly distributed, with many showing consistent patterns of up- or down-regulation. As a result, common patterns of positively and negatively enriched gene sets are observed across experiments. Placing a single experiment into the context of a relevant set of background experiments allows us to identify both the common and experiment-specific patterns of gene set enrichment.
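The competitive null described above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the GSEA implementation: the enrichment score is an unweighted running sum (GSEA itself weights hits by the ranking statistic), the p-value is two-sided, and the function names `enrichment_score` and `gene_set_permutation_pvalue` are hypothetical.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (simplified, assumption)."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up on a gene-set member, step down otherwise.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best


def gene_set_permutation_pvalue(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Compare the observed score with scores of random same-size gene sets."""
    rng = random.Random(seed)
    members = set(gene_set) & set(ranked_genes)
    observed = enrichment_score(ranked_genes, members)
    null_scores = [
        enrichment_score(ranked_genes, set(rng.sample(ranked_genes, len(members))))
        for _ in range(n_perm)
    ]
    n_extreme = sum(abs(score) >= abs(observed) for score in null_scores)
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (n_extreme + 1) / (n_perm + 1)


# Toy example: 100 ranked genes, with the gene set occupying the top 10 ranks.
ranked = [f"g{i}" for i in range(100)]
score, p_value = gene_set_permutation_pvalue(ranked, ranked[:10])
```

Because the random gene sets are drawn uniformly from the input experiment, this null ignores the tendency of some genes to change across many conditions, which is the limitation the paragraph above points out.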
Results
We compiled a compendium of 442 small molecule transcriptomic experiments and used GSEA to characterize common patterns of positively and negatively enriched gene sets. To identify experiment-specific gene set enrichment, we developed the GSEA-InContext method that accounts for gene expression patterns within a background set of experiments to identify statistically significantly enriched gene sets. We evaluated GSEA-InContext on experiments using small molecules with known targets to show that it successfully prioritizes gene sets that are specific to each experiment, thus providing valuable insights that complement standard GSEA analysis.
Availability and implementation
GSEA-InContext (implemented in Python), supplementary results, and the background expression compendium are available at: https://github.com/CostelloLab/GSEA-InContext.
Mechanical breathing motions have a fundamental function in lung development and disease, but little is known about how they contribute to host innate immunity. Here we use a human lung alveolus chip that experiences cyclic breathing-like deformations to investigate whether physical forces influence innate immune responses to viral infection. Influenza H3N2 infection of mechanically active chips induces a cascade of host responses including increased lung permeability, apoptosis, cell regeneration, cytokine production, and recruitment of circulating immune cells. Comparison with static chips reveals that breathing motions suppress viral replication by activating protective innate immune responses in epithelial and endothelial cells, which are mediated in part through activation of the mechanosensitive ion channel TRPV4 and signaling via receptor for advanced glycation end products (RAGE). In infected chips with breathing motions, RAGE inhibitors suppress cytokine induction, while TRPV4 inhibition attenuates both inflammation and viral burden. Therefore, TRPV4 and RAGE may serve as new targets for therapeutic intervention in patients infected with influenza and other potential pandemic viruses that cause life-threatening lung inflammation.
Glycans, the most diverse biopolymer, are shaped by evolutionary pressures stemming from host-microbe interactions. Here, we present machine learning and bioinformatics methods to leverage the evolutionary information present in glycans to gain insights into how pathogens and commensals interact with hosts. By using techniques from natural language processing, we develop deep-learning models for glycans that are trained on a curated dataset of 19,299 unique glycans and can be used to study and predict glycan functions. We show that these models can be utilized to predict glycan immunogenicity and the pathogenicity of bacterial strains, as well as investigate glycan-mediated immune evasion via molecular mimicry. We also develop glycan-alignment methods and use these to analyze virulence-determining glycan motifs in the capsular polysaccharides of bacterial pathogens. These resources enable one to identify and study glycan motifs involved in immunogenicity, pathogenicity, molecular mimicry, and immune evasion, expanding our understanding of host-microbe interactions.
•Glycan-focused language models can be used for sequence-to-function models
•Information in glycans predicts immunogenicity, pathogenicity, and taxonomic origin
•Glycan alignments shed light on bacterial virulence
Bojar et al. present a workflow that combines machine learning and bioinformatics techniques to analyze the prominent role of glycans in host-microbe interactions. The glycan-focused language models and alignments developed here allow for the prediction and analysis of glycan immunogenicity, association with pathogenicity, and taxonomic classification.
Drug repurposing requires distinguishing established drug class targets from novel molecule-specific mechanisms and rapidly derisking their therapeutic potential in a time-critical manner, particularly in a pandemic scenario. In response to the challenge to rapidly identify treatment options for COVID-19, several studies reported that statins, as a drug class, reduce mortality in these patients. However, it is unknown if different statins exhibit consistent function or may have varying therapeutic benefit. A Bayesian network tool was used to predict drugs that shift the host transcriptomic response to SARS-CoV-2 infection towards a healthy state. Drugs were predicted using 14 RNA-sequencing datasets from 72 autopsy tissues and 465 COVID-19 patient samples or from cultured human cells and organoids infected with SARS-CoV-2. Top drug predictions included statins, which were then assessed using electronic medical records containing over 4,000 COVID-19 patients on statins to determine mortality risk in patients prescribed specific statins versus untreated matched controls. The same drugs were tested in Vero E6 cells infected with SARS-CoV-2 and human endothelial cells infected with a related OC43 coronavirus. Simvastatin was among the most highly predicted compounds (14/14 datasets) and five other statins, including atorvastatin, were predicted to be active in > 50% of analyses. Analysis of the clinical database revealed that reduced mortality risk was only observed in COVID-19 patients prescribed a subset of statins, including simvastatin and atorvastatin. In vitro testing of SARS-CoV-2 infected cells revealed simvastatin to be a potent direct inhibitor whereas most other statins were less effective. Simvastatin also inhibited OC43 infection and reduced cytokine production in endothelial cells. Statins may differ in their ability to sustain the lives of COVID-19 patients despite having a shared drug target and lipid-modifying mechanism of action.
These findings highlight the value of target-agnostic drug prediction coupled with patient databases to identify and clinically evaluate non-obvious mechanisms and derisk and accelerate drug repurposing opportunities.
The rapid repurposing of antivirals is particularly pressing during pandemics. However, rapid assays for assessing candidate drugs typically involve in vitro screens and cell lines that do not recapitulate human physiology at the tissue and organ levels. Here we show that a microfluidic bronchial-airway-on-a-chip lined by highly differentiated human bronchial-airway epithelium and pulmonary endothelium can model viral infection, strain-dependent virulence, cytokine production and the recruitment of circulating immune cells. In airway chips infected with influenza A, the co-administration of nafamostat with oseltamivir doubled the treatment-time window for oseltamivir. In chips infected with pseudotyped severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), clinically relevant doses of the antimalarial drug amodiaquine inhibited infection but clinical doses of hydroxychloroquine and other antiviral drugs that inhibit the entry of pseudotyped SARS-CoV-2 in cell lines under static conditions did not. We also show that amodiaquine showed substantial prophylactic and therapeutic activities in hamsters challenged with native SARS-CoV-2. The human airway-on-a-chip may accelerate the identification of therapeutics and prophylactics with repurposing potential.
Trisomy 21 (T21) causes Down syndrome (DS), affecting immune and neurological function by ill-defined mechanisms. Here we report a large metabolomics study of plasma and cerebrospinal fluid, showing in independent cohorts that people with DS produce elevated levels of kynurenine and quinolinic acid, two tryptophan catabolites with potent immunosuppressive and neurotoxic properties, respectively. Immune cells of people with DS overexpress IDO1, the rate-limiting enzyme in the kynurenine pathway (KP) and a known interferon (IFN)-stimulated gene. Furthermore, the levels of IFN-inducible cytokines positively correlate with KP dysregulation. Using metabolic tracing assays, we show that overexpression of IFN receptors encoded on chromosome 21 contributes to enhanced IFN stimulation, thereby causing IDO1 overexpression and kynurenine overproduction in cells with T21. Finally, a mouse model of DS carrying triplication of IFN receptors exhibits KP dysregulation. Together, our results reveal a mechanism by which T21 could drive immunosuppression and neurotoxicity in DS.
Molecular alterations that confer phenotypic advantages to tumors can also expose specific therapeutic vulnerabilities. To search for potential treatments that would selectively affect metastatic cells, we examined the sensitivity of lineage-related human bladder cancer cell lines with different lung colonization abilities to chloroquine (CQ) or bafilomycin A₁, which are inhibitors of lysosome function and autophagy. Both CQ and bafilomycin A₁ were more cytotoxic in vitro to highly metastatic cells compared with their less metastatic counterparts. Genetic inactivation of macroautophagy regulators and lysosomal proteins indicated that this was due to greater reliance on the lysosome but not upon macroautophagy. To identify the mechanism underlying these effects, we generated cells resistant to CQ in vitro. Surprisingly, selection for in vitro CQ resistance was sufficient to alter gene expression patterns such that unsupervised cluster analysis of whole-transcriptome data indicated that selection for CQ resistance alone created tumor cells that were more similar to the poorly metastatic parental cells from which the metastatic cells were derived; importantly, these tumor cells also had diminished metastatic ability in vivo. These effects were mediated in part by differential expression of the transcriptional regulator ID4 (inhibitor of DNA binding 4); depletion of ID4 both promoted in vitro CQ sensitivity and restored lung colonization and metastasis of CQ-resistant cells. These data demonstrate that selection for metastasis ability confers selective vulnerability to lysosomal inhibitors and identify ID4 as a potential biomarker for the use of lysosomal inhibitors to reduce metastasis in patients.
During development of the central nervous system, oligodendrocyte precursor cells (OPCs) give rise to both myelinating oligodendrocytes and NG2 glia, which are the most proliferative cells in the adult mammalian brain. NG2 glia retain characteristics of OPCs, and some NG2 glia produce oligodendrocytes, but many others persist throughout adulthood. Why some OPCs differentiate as oligodendrocytes during development whereas others persist as OPCs and acquire characteristics of NG2 glia is not known. Using zebrafish spinal cord as a model, we found that OPCs that differentiate rapidly as oligodendrocytes and others that remain as OPCs arise in sequential waves from distinct neural progenitors. Additionally, oligodendrocyte and persistent OPC fates are specified during a defined critical period by small differences in Shh signaling and Notch activity, which modulates Shh signaling response. Thus, our data indicate that OPCs fated to produce oligodendrocytes or remain as OPCs during development are specified as distinct cell types, raising the possibility that the myelinating potential of OPCs is set by graded Shh signaling activity.
•In larval zebrafish, spinal cord progenitors produce two distinct subpopulations of oligodendrocyte lineage cells (OPCs).
•Progenitors that initiate olig2 expression at different times differentiate as oligodendrocytes or persist as OPCs.
•Small differences in Shh and Notch signaling determine whether OPCs differentiate as oligodendrocytes or remain as OPCs.
Genome-wide transcriptome profiling identifies genes that are prone to differential expression (DE) across contexts, as well as genes with changes specific to the experimental manipulation. Distinguishing genes that are specifically changed in a context of interest from common differentially expressed genes (DEGs) allows more efficient prediction of which genes are specific to a given biological process under scrutiny. Currently, common DEGs or pathways can only be identified through the laborious manual curation of experiments, an inordinately time-consuming endeavor. Here we pioneer an approach, Specific cOntext Pattern Highlighting In Expression data (SOPHIE), for distinguishing between common and specific transcriptional patterns using a generative neural network to create a background set of experiments from which a null distribution of gene and pathway changes can be generated. We apply SOPHIE to diverse datasets including those from human, human cancer, and the bacterial pathogen Pseudomonas aeruginosa. SOPHIE identifies common DEGs in concordance with previously described, manually and systematically determined common DEGs. Further molecular validation indicates that SOPHIE detects highly specific but low-magnitude biologically relevant transcriptional changes. SOPHIE’s measure of specificity can complement log2 fold change values generated from traditional DE analyses. For example, by filtering the set of DEGs, one can identify genes that are specifically relevant to the experimental condition of interest. Consequently, these results can inform future research directions. All scripts used in these analyses are available at https://github.com/greenelab/generic-expression-patterns. Users can access https://github.com/greenelab/sophie to run SOPHIE on their own data.
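The idea of filtering DEGs by a specificity measure can be sketched in Python as follows. This is an illustration under stated assumptions, not the SOPHIE code: specificity is approximated here as a z-score of each gene's observed log2 fold change against its values in a background set of experiments, and the function names, thresholds, and toy data are hypothetical. In SOPHIE the background comes from a generative neural network; here it is simply passed in as a dictionary.

```python
from statistics import mean, stdev


def specificity_z_scores(observed_lfc, background_lfc):
    """Z-score each gene's observed log2 fold change against its background values."""
    z_scores = {}
    for gene, lfc in observed_lfc.items():
        values = background_lfc.get(gene, [])
        if len(values) < 2:
            continue  # too few background experiments to estimate spread
        spread = stdev(values)
        if spread == 0:
            continue  # constant background: z-score undefined
        z_scores[gene] = (lfc - mean(values)) / spread
    return z_scores


def specific_degs(observed_lfc, background_lfc, min_abs_lfc=1.0, min_abs_z=2.0):
    """Keep genes that are both differentially expressed and unusual for the background."""
    z_scores = specificity_z_scores(observed_lfc, background_lfc)
    return sorted(
        gene
        for gene, z in z_scores.items()
        if abs(observed_lfc[gene]) >= min_abs_lfc and abs(z) >= min_abs_z
    )


# Toy data (hypothetical): geneA changes strongly in every experiment (common DEG),
# geneB changes strongly only in the experiment of interest (specific DEG).
observed = {"geneA": 2.0, "geneB": 2.0, "geneC": 0.2}
background = {
    "geneA": [1.8, 2.1, 2.2, 1.9],
    "geneB": [0.1, -0.2, 0.0, 0.1],
    "geneC": [0.1, 0.3, 0.2, 0.2],
}
selected = specific_degs(observed, background)
```

In this toy case `geneA` and `geneB` have the same log2 fold change, but only `geneB` passes the filter, because `geneA` changes to a similar degree across the background.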