Deep Learning (DL) has recently enabled unprecedented advances in one of the grand challenges in computational biology: the half-century-old problem of protein structure prediction. In this paper we discuss recent advances, limitations, and future perspectives of DL on five broad areas: protein structure prediction, protein function prediction, genome engineering, systems biology and data integration, and phylogenetic inference. We discuss each application area and cover the main bottlenecks of DL approaches, such as training data, problem scope, and the ability to leverage existing DL architectures in new contexts. To conclude, we provide a summary of the subject-specific and general challenges for DL across the biosciences.
There is a need for better classification and understanding of tumor-infiltrating lymphocytes (TILs). Here, we applied advanced functional genomics to interrogate 9,000 human tumors and multiple single-cell sequencing sets using benchmarked T cell states, comprehensive T cell differentiation trajectories, human and mouse vaccine responses, and other human TILs. Compared with other T cell states, enrichment of T memory/resident memory programs was observed across solid tumors. Trajectory analysis of single-cell melanoma CD8+ TILs also identified a high fraction of memory/resident memory-scoring TILs in anti-PD-1 responders, which expanded post therapy. In contrast, TILs scoring highly for early T cell activation, but not exhaustion, were associated with non-response. Late/persistent, but not early, activation signatures prognosticate melanoma survival and co-express with dendritic cell and IFN-γ response programs. These data identify an activation-like state associated with poor response and suggest that successful memory conversion, above resuscitation of exhaustion, is an under-appreciated aspect of successful anti-tumoral immunity.
•Improved global TIL classification methods are required to deconvolve cell states
•αPD-1 non-responder TILs and dysfunctional TILs score for T activation, not exhaustion
•αPD-1 response and patient survival associate with late T cell memory/TRM scoring
•Persistent programs co-express with DC maturation and IFN-γ response programs
Jaiswal et al. highlight the need for improved tumor-infiltrating lymphocyte (TIL) classification by showing that current transcriptome assignments may misclassify early activated/effector TILs as exhausted. The study surveys 9,000 solid tumors, multiple single-cell RNA sequencing sets, mouse and human models, and scoring methods to reclassify TILs and associate melanoma survival with T cell memory/resident memory.
Protein-protein, cell signaling, metabolic, and transcriptional interaction networks are useful for identifying connections between lists of experimentally identified genes/proteins. However, besides physical or co-expression interactions there are many ways in which pairs of genes, or their protein products, can be associated. By systematically incorporating knowledge on shared properties of genes from diverse sources to build functional association networks (FANs), researchers may be able to identify additional functional interactions between groups of genes that are not readily apparent.
Genes2FANs is a web-based tool and database that uses 14 carefully constructed FANs and a large-scale protein-protein interaction (PPI) network to build subnetworks that connect lists of human and mouse genes. The FANs are created from mammalian gene set libraries where mouse genes are converted to their human orthologs. The tool takes as input a list of human or mouse Entrez gene symbols and produces a subnetwork and a ranked list of the intermediate genes used to connect the query input list. In addition, users can enter any PubMed search term, and the system automatically converts the returned results to a gene list using GeneRIF. This gene list is then used as input to generate a subnetwork from the user's PubMed query. As a case study, we applied Genes2FANs to connect disease genes from 90 well-studied disorders. We find an inverse correlation between the counts of links connecting disease genes through PPIs and links connecting disease genes through FANs, separating diseases into two categories.
Genes2FANs is a useful tool for interpreting the relationships between gene/protein lists in the context of their various functions and networks. Combining functional association interactions with physical PPIs can be useful for revealing new biology and help form hypotheses for further experimentation. Our finding that disease genes in many cancers are mostly connected through PPIs whereas other complex diseases, such as autism and type-2 diabetes, are mostly connected through FANs without PPIs, can guide better strategies for disease gene discovery. Genes2FANs is available at: http://actin.pharm.mssm.edu/genes2FANs.
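The idea of connecting a query gene list through ranked intermediate genes, as described for Genes2FANs above, can be illustrated with a minimal sketch. This is not the tool's actual algorithm, which the abstract does not specify. It assumes a simple rule in which an intermediate is any non-query node linked to at least `min_links` query genes, ranked by the number of query genes it touches. The function names, the `min_links` parameter, and the toy node labels are all hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def connect_seeds(edges, seeds, min_links=2):
    """Return (ranked_intermediates, subnetwork_edges).

    Assumed rule (not taken from the source): an intermediate is a
    non-seed node adjacent to at least `min_links` seed genes.
    """
    adjacency = build_adjacency(edges)
    seeds = set(seeds)

    # Count how many seed genes each candidate intermediate touches.
    counts = {}
    for node, neighbours in adjacency.items():
        if node in seeds:
            continue
        hits = neighbours & seeds
        if len(hits) >= min_links:
            counts[node] = len(hits)

    # Rank by number of seed neighbours, ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

    # Keep only edges whose endpoints are seeds or selected intermediates.
    keep = seeds | set(counts)
    sub_edges = sorted(
        {tuple(sorted((a, b))) for a, b in edges if a in keep and b in keep}
    )
    return ranked, sub_edges


# Toy network with made-up node labels (not real interactions).
toy_edges = [("S1", "X"), ("S2", "X"), ("S1", "S2"),
             ("S2", "Y"), ("S3", "X"), ("S3", "Z")]
ranked, sub_edges = connect_seeds(toy_edges, ["S1", "S2", "S3"])
```

In this toy network only `X` links more than one query gene, so it is the sole intermediate. Edges from different sources (for example a PPI network and individual FANs) could be passed separately to compare how each network type connects the same list.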
In the microenvironment of a malignancy, tumor cells do not exist in isolation, but rather in a diverse ecosystem consisting not only of heterogeneous tumor-cell clones, but also normal cell types such as fibroblasts, vasculature, and an extensive pool of immune cells at numerous possible stages of activation and differentiation. This results in a complex interplay of diverse cellular signaling systems, where the immune cell component is now established to influence cancer progression and therapeutic response. It is experimentally difficult and laborious to comprehensively and systematically profile these distinct cell types from heterogeneous tumor samples in order to capitalize on potential therapeutic and biomarker discoveries. One emerging solution to address this challenge is to computationally extract cell-type specific information directly from bulk tumors. Such in silico approaches are advantageous because they can capture both the cell-type specific profiles and the tissue systems level of cell-cell interactions. Accurately and comprehensively predicting these patterns in tumors is an important challenge to overcome, not least given the success of immunotherapeutic drug treatment of several human cancers. This is especially challenging for subsets of closely related immune cell phenotypes with relatively small gene expression differences, which have critical functional distinctions. Here, we outline the existing and emerging novel bioinformatics strategies that can be used to profile the tumor immune landscape.
The incorporation of annotated sequence information from multiple related species in commonly used databases (Ensembl, Flybase, Saccharomyces Genome Database, Wormbase, etc.) has increased dramatically over the last few years. This influx of information has provided a considerable amount of raw material for evaluation of evolutionary relationships. To aid in the process, we have developed JCoDA (Java Codon Delimited Alignment) as a simple-to-use visualization tool for the detection of site specific and regional positive/negative evolutionary selection amongst homologous coding sequences.
JCoDA accepts user-supplied unaligned or pre-aligned coding sequences, performs a codon-delimited alignment using ClustalW, and computes dN/dS values using PAML (Phylogenetic Analysis Using Maximum Likelihood, yn00 and codeml) in order to identify regions and sites under evolutionary selection. The JCoDA package includes a graphical interface for Phylip (Phylogeny Inference Package) to generate phylogenetic trees, manages formatting of all required file types, and streamlines passage of information between underlying programs. The raw data are output to user-configurable graphs with sliding window options for straightforward visualization of pairwise or gene family comparisons. Additionally, codon-delimited alignments are output in a variety of common formats, and all dN/dS calculations can be output in comma-separated value (CSV) format for downstream analysis. To illustrate the types of analyses that are facilitated by JCoDA, we have taken advantage of the well-studied sex determination pathway in nematodes, as well as the extensive sequence information available, to identify genes under positive selection, examples of regional positive selection, and differences in selection based on the role of genes in the sex determination pathway.
JCoDA is a configurable, open source, user-friendly visualization tool for performing evolutionary analysis on homologous coding sequences. JCoDA can be used to rapidly screen for genes and regions of genes under selection using PAML. It can be freely downloaded at http://www.tcnj.edu/~nayaklab/jcoda.
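The sliding-window view of dN/dS mentioned above can be sketched in a few lines. This is only an illustration, not JCoDA's code. It assumes that a list of per-codon dN/dS estimates has already been obtained (for example from a CSV export) and averages them over fixed-size windows. The function names, the `window` and `step` parameters, and the sample values are hypothetical. The threshold of 1.0 follows the usual convention that dN/dS > 1 suggests positive selection.

```python
def sliding_window_mean(values, window, step=1):
    """Average `values` over consecutive windows.

    Returns a list of (start_index, mean) tuples, one per full window.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    results = []
    for start in range(0, len(values) - window + 1, step):
        chunk = values[start:start + window]
        results.append((start, sum(chunk) / window))
    return results


def windows_above(values, window, threshold=1.0, step=1):
    """Return the windows whose mean dN/dS exceeds `threshold`."""
    return [
        (start, mean)
        for start, mean in sliding_window_mean(values, window, step)
        if mean > threshold
    ]


# Made-up per-codon dN/dS values, for illustration only.
per_codon = [0.25, 0.5, 2.0, 3.0, 0.5, 0.25]
candidate_regions = windows_above(per_codon, window=2)
```

Larger windows smooth out single-site noise at the cost of resolution, which is the trade-off a configurable window size exposes.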
A high concentration of circulating vascular endothelial growth factor (VEGF) in cancer patients is associated with an aggressive tumor phenotype. Here, serum levels of 27 cytokines and blood cell counts were assessed in breast cancer patients receiving neoadjuvant chemotherapy with or without bevacizumab (Bev) in a randomized cohort of 132 patients with non-metastatic HER2-negative tumors. Cytokine levels were determined prior to treatment and at various time-points. The cytotoxic chemotherapy regimen of fluorouracil, epirubicin, and cyclophosphamide (FEC) had a profound impact on both circulating white blood cells and circulating cytokine levels. At the end of FEC treatment, the global decrease in cytokine levels correlated with the drop in white blood cell counts and was significantly greater in the patients of the Bev arm for cytokines such as VEGF-A, IL-12, IP-10 and IL-10. Among patients who received Bev, those with pathological complete response (pCR) exhibited significantly lower levels of VEGF-A, IFN-γ, TNF-α and IL-4 than patients without pCR. This effect was not observed in the chemotherapy-only arm. Certain circulating cytokine profiles were found to correlate with different immune cell types at the tumor site. For the Bev arm patients, the serum cytokine levels correlated with higher levels of cytotoxic T cells at the end of the therapy regimen, which was indicative of treatment response. The higher response rate for Bev-treated patients and the stronger correlations between serum cytokine levels and infiltrating CD8+ T cells merit further investigation.
Protein-protein interactions play an essential role in nearly all biological processes, and it has become increasingly clear that in order to better understand the fundamental processes that underlie disease, we must develop a strong understanding of both their context specificity (e.g., tissue-specificity) and their dynamic nature (e.g., how they respond to environmental changes). While network-based approaches have found much initial success in the application of protein-protein interactions (PPIs) towards systems-level explorations of biology, they often overlook the fact that large numbers of proteins undergo alternative splicing. Alternative splicing has been shown not only to diversify protein function through the generation of multiple protein isoforms, but also to remodel PPIs and affect a wide range of diseases, including cancer. Isoform-specific interactions are not well characterized, so we develop a computational approach that uses domain-domain interactions in concert with differential exon usage data from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression project (GTEx). Using this approach, we can characterize PPIs likely disrupted or possibly even increased due to splicing events for individual TCGA cancer patient samples relative to a matched GTEx normal tissue background.
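One way to picture how domain-domain interactions and exon usage could be combined is the sketch below. It is not the authors' method, which the abstract does not detail. It assumes a simple rule: a domain counts as lost when any exon encoding it is skipped in a sample, and a PPI is flagged according to how many of its supporting domain pairs remain intact. The function names, the status labels, and the domain and exon identifiers are hypothetical.

```python
def lost_domains(domain_exons, skipped_exons):
    """Return domains with at least one encoding exon skipped.

    `domain_exons` maps a domain name to the set of exon ids encoding it.
    """
    return {
        domain
        for domain, exons in domain_exons.items()
        if exons & skipped_exons
    }


def classify_ppi(domains_a, domains_b, ddi_pairs, lost_a, lost_b):
    """Classify a PPI between proteins A and B from domain-level evidence.

    `ddi_pairs` is a set of (domain, domain) tuples treated as unordered.
    """
    supporting = [
        (da, db)
        for da in domains_a
        for db in domains_b
        if (da, db) in ddi_pairs or (db, da) in ddi_pairs
    ]
    if not supporting:
        return "no_domain_support"

    # A supporting pair is intact only if neither domain was lost.
    intact = [
        (da, db)
        for da, db in supporting
        if da not in lost_a and db not in lost_b
    ]
    if len(intact) == len(supporting):
        return "retained"
    if intact:
        return "partially_retained"
    return "likely_disrupted"


# Hypothetical protein A with two domains; exon 3 is skipped in the sample.
domain_exons_a = {"domA1": {2, 3}, "domA2": {5, 6}}
lost_a = lost_domains(domain_exons_a, skipped_exons={3})
status = classify_ppi(
    ["domA1", "domA2"], ["domB1"], {("domA1", "domB1")}, lost_a, set()
)
```

A real analysis would replace the binary skipped/not-skipped rule with a statistical test of differential exon usage between tumor and normal tissue, and the same structure could flag interactions gained through exon inclusion.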
Abstract
Motivation
Model organisms are widely used to better understand the molecular causes of human disease. While sequence similarity greatly aids this cross-species transfer, sequence similarity does not imply functional similarity, and thus, several current approaches incorporate protein–protein interactions to help map findings between species. Existing transfer methods either formulate the alignment problem as a matching problem which pits network features against known orthology, or more recently, as a joint embedding problem.
Results
We propose a novel state-of-the-art joint embedding solution: Embeddings to Network Alignment (ETNA). ETNA generates individual network embeddings based on network topological structure and then uses a Natural Language Processing-inspired cross-training approach to align the two embeddings using sequence-based orthologs. The final embedding preserves both within and between species gene functional relationships, and we demonstrate that it captures both pairwise and group functional relevance. In addition, ETNA’s embeddings can be used to transfer genetic interactions across species and identify phenotypic alignments, laying the groundwork for potential opportunities for drug repurposing and translational studies.
Availability and implementation
https://github.com/ylaboratory/ETNA
A major obstacle to treating Alzheimer’s disease (AD) is our lack of understanding of the molecular mechanisms underlying selective neuronal vulnerability, a key characteristic of the disease. Here, we present a framework integrating high-quality neuron-type-specific molecular profiles across the lifetime of the healthy mouse, which we generated using bacTRAP, with postmortem human functional genomics and quantitative genetics data. We demonstrate human-mouse conservation of cellular taxonomy at the molecular level for neurons vulnerable and resistant in AD, identify specific genes and pathways associated with AD neuropathology, and pinpoint a specific functional gene module underlying selective vulnerability, enriched in processes associated with axonal remodeling, and affected by amyloid accumulation and aging. We have made all cell-type-specific profiles and functional networks available at http://alz.princeton.edu. Overall, our study provides a molecular framework for understanding the complex interplay between Aβ, aging, and neurodegeneration within the most vulnerable neurons in AD.
•Ribosomal profiling of AD vulnerable/resistant neurons in 5-, 12-, 24-month old mice
•Using human neuron-type functional networks and GWASs to model vulnerability
•Identification of axon plasticity genes linking Aβ, aging, tau in vulnerable neurons
•PTB, regulator of tau exon 10 splicing, might contribute to selective vulnerability
Neurons display different levels of vulnerability to Alzheimer’s pathology. Roussarie et al. experimentally profile and computationally model several relevant neuron types. Using a mouse-human framework, they identify genes linking Aβ, aging, and tau in vulnerable neurons. Finally, they show experimentally that PTB, a regulator of tau splicing, contributes to vulnerability.
Coregulator proteins (CoRegs) are part of multi-protein complexes that transiently assemble with transcription factors and chromatin modifiers to regulate gene expression. In this study we analyzed data from 3,290 immuno-precipitations (IP) followed by mass spectrometry (MS) applied to human cell lines and aimed at identifying CoReg complexes. Using the semi-quantitative spectral counts, we scored binary protein-protein and domain-domain associations with several equations. Unlike previous applications, our methods scored prey-prey protein-protein interactions regardless of the baits used. We also predicted domain-domain interactions underlying predicted protein-protein interactions. The quality of predicted protein-protein and domain-domain interactions was evaluated using known binary interactions from the literature. In addition, one protein-protein interaction, between STRN and CTTNBP2NL, was validated experimentally, and one domain-domain interaction, between the HEAT domain of PPP2R1A and the Pkinase domain of STK25, was validated using molecular docking simulations. The scoring schemes presented here recovered known, and predicted many new, complexes, protein-protein interactions, and domain-domain interactions. The networks that resulted from the predictions are provided as a web-based interactive application at http://maayanlab.net/HT-IP-MS-2-PPI-DDI/.
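Scoring prey-prey associations from spectral counts, independently of the bait, can be illustrated with a simple stand-in measure. The abstract does not give the equations actually used, so this sketch substitutes a weighted Jaccard index: for two proteins, the sum over IPs of the smaller spectral count divided by the sum of the larger one. The function names, the input layout, and the toy counts are assumptions made for illustration only.

```python
from itertools import combinations


def weighted_jaccard(counts_a, counts_b):
    """Weighted Jaccard index of two {ip_id: spectral_count} profiles."""
    ips = set(counts_a) | set(counts_b)
    shared = sum(min(counts_a.get(ip, 0), counts_b.get(ip, 0)) for ip in ips)
    total = sum(max(counts_a.get(ip, 0), counts_b.get(ip, 0)) for ip in ips)
    return shared / total if total else 0.0


def score_prey_pairs(experiments):
    """Score every co-detected protein pair, ignoring which bait was used.

    `experiments` maps an IP id to {protein: spectral_count}.
    Returns {(protein_a, protein_b): score} for pairs with score > 0.
    """
    # Invert the input into one count profile per protein.
    profiles = {}
    for ip_id, preys in experiments.items():
        for protein, count in preys.items():
            profiles.setdefault(protein, {})[ip_id] = count

    scores = {}
    for a, b in combinations(sorted(profiles), 2):
        score = weighted_jaccard(profiles[a], profiles[b])
        if score > 0:
            scores[(a, b)] = score
    return scores


# Made-up spectral counts for three hypothetical IPs.
toy_experiments = {
    "ip1": {"P1": 4, "P2": 2},
    "ip2": {"P1": 2, "P2": 2, "P3": 1},
    "ip3": {"P3": 5},
}
pair_scores = score_prey_pairs(toy_experiments)
```

Proteins that are repeatedly detected together at similar abundance score close to 1, while pairs that overlap in only one low-count IP score near 0. Predicted pairs would then be checked against known binary interactions, as the abstract describes.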