Metagenomics is unearthing the previously hidden world of soil viruses. Many soil viral sequences in metagenomes contain putative auxiliary metabolic genes (AMGs) that are not associated with viral replication. Here, we establish that AMGs on soil viruses actually produce functional, active proteins. We focus on AMGs that potentially encode chitosanase enzymes that metabolize chitin, a common carbon polymer. We express and functionally screen several chitosanase genes identified from environmental metagenomes. One expressed protein showing endo-chitosanase activity (V-Csn) is crystallized and structurally characterized at ultra-high resolution, thus representing the structure of a soil viral AMG product. This structure provides details about the active site, and together with structure models determined using AlphaFold, facilitates understanding of substrate specificity and enzyme mechanism. Our findings support the hypothesis that soil viruses contribute auxiliary functions to their hosts.
The relationships between kinase levels and activity have been investigated using large, high-quality proteomic and phosphoproteomic data sets from tumors. Results show that the protein levels of some kinases correlate with their activity and that activation of kinases is a complex process. This study provides the first analysis of kinase activity in cancer integrating proteomic and phosphoproteomic data.
Highlights
•Integration of proteomics and phosphoproteomics data to understand kinase activity.
•The abundance of some kinases correlates with activity.
•Kinase activity does not necessarily reflect phosphorylation of regulatory sites.
•Correlation patterns can be used to extend kinase substrate repertoire.
Phosphorylation of proteins is a key way cells regulate function, both at the individual protein level and at the level of signaling pathways. Kinases are responsible for phosphorylation of substrates, generally on serine, threonine, or tyrosine residues. Though particular sequence patterns can be identified that dictate whether a residue will be phosphorylated by a specific kinase, these patterns are not highly predictive of phosphorylation. The availability of large scale proteomic and phosphoproteomic data sets generated using mass-spectrometry-based approaches provides an opportunity to study the important relationship between kinase activity, substrate specificity, and phosphorylation. In this study, we analyze relationships between protein abundance and phosphopeptide abundance across more than 150 tumor samples and show that phosphorylation at specific phosphosites is not well correlated with overall kinase abundance. However, individual kinases show a clear and statistically significant difference in correlation among known phosphosite targets for that kinase and randomly selected phosphosites. We further investigate relationships between phosphorylation of known activating or inhibitory sites on kinases and phosphorylation of their target phosphosites. Combined with motif-based analysis, this approach can predict novel kinase targets and show which subsets of a kinase's target repertoire are specifically active in one condition versus another.
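The comparison between known targets and randomly selected phosphosites described above can be sketched as follows. This is a minimal illustration, not the study's pipeline: the function names, the use of Pearson correlation, the resampling scheme, and the toy abundance values are all assumptions made for the example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_correlation(kinase_abundance, phosphosites, site_ids):
    """Mean correlation between kinase abundance and the given phosphosites."""
    values = [pearson(kinase_abundance, phosphosites[s]) for s in site_ids]
    return sum(values) / len(values)


def target_vs_random(kinase_abundance, phosphosites, known_targets,
                     n_draws=200, seed=0):
    """Return the observed mean correlation for known targets and an
    empirical p-value against random site sets of the same size."""
    rng = random.Random(seed)
    observed = mean_correlation(kinase_abundance, phosphosites, known_targets)
    background = [s for s in phosphosites if s not in known_targets]
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(background, len(known_targets))
        if mean_correlation(kinase_abundance, phosphosites, draw) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_draws + 1)


# Hypothetical toy data: one kinase and six phosphosites across six samples.
kinase = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
sites = {
    "siteA": [1.1, 2.0, 2.9, 4.2, 5.1, 5.8],    # assumed known target
    "siteB": [2.0, 4.1, 6.2, 7.9, 10.0, 12.1],  # assumed known target
    "siteC": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],
    "siteD": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    "siteE": [2.0, 2.5, 1.5, 2.2, 1.8, 2.1],
    "siteF": [5.0, 1.0, 3.0, 6.0, 2.0, 4.0],
}
observed, p_value = target_vs_random(kinase, sites, ["siteA", "siteB"])
```

With real data, `kinase` would hold one kinase's protein abundance across tumor samples and `sites` the phosphopeptide abundances; samples with missing values would need to be handled before computing correlations.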
Abstract
Bacteriophages are abundant in soils. However, the majority are uncharacterized, and their hosts are unknown. Here, we apply high-throughput chromosome conformation capture (Hi–C) to directly capture phage-host relationships. Some hosts have high centralities in bacterial community co-occurrence networks, suggesting phage infections have an important impact on the soil bacterial community interactions. We observe increased average viral copies per host (VPH) and decreased viral transcriptional activity following a two-week soil-drying incubation, indicating an increase in lysogenic infections. Soil drying also alters the observed phage host range. A significant negative correlation between VPH and host abundance prior to drying indicates more lytic infections result in more host death and inversely influence host abundance. This study provides empirical evidence of phage-mediated bacterial population dynamics in soil by directly capturing specific phage-host interactions.
High-throughput multi-omics studies and corresponding network analyses of multi-omic data have rapidly expanded their impact over the last 10 years. As biological features of different types (e.g., transcripts, proteins, metabolites) interact within cellular systems, the greatest amount of knowledge can be gained from networks that incorporate multiple types of -omic data. However, biological and technical sources of variation diminish the ability to detect cross-type associations, yielding networks dominated by communities comprised of nodes of the same type. We describe here network building methods that can maximize edges between nodes of different data types, leading to integrated networks, that is, networks with a large number of edges linking nodes of different -omic types (transcripts, proteins, lipids, etc.). We systematically rank several network inference methods and demonstrate that, in many cases, using a random forest method, GENIE3, produces the most integrated networks. This increase in integration does not come at the cost of accuracy, as GENIE3 produces networks of approximately the same quality as the other network inference methods tested here. Using GENIE3, we also infer networks representing antibody-mediated Dengue virus cell invasion and receptor-mediated Dengue virus invasion. A number of functional pathways showed centrality differences between the two networks, including genes responding to both GM-CSF and IL-4, which had a higher centrality value in an antibody-mediated vs. receptor-mediated Dengue network. Because a biological system involves the interplay of many different types of molecules, incorporating multiple data types into networks will improve their use as models of biological systems. The methods explored here are some of the first to specifically highlight and address the challenges associated with how such multi-omic networks can be assembled and how the greatest number of interactions can be inferred from different data types.
The resulting networks can lead to the discovery of new host response patterns and interactions during viral infection, generate new hypotheses of pathogenic mechanisms and confirm mechanisms of disease.
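The notion of an integrated network, one with many edges linking nodes of different -omic types, can be made concrete with a simple score. The sketch below computes the fraction of cross-type edges; this particular score, the function name, and the toy node labels are assumptions for illustration and may differ from the measure used in the study.

```python
def cross_type_fraction(edges, node_type):
    """Fraction of edges whose two endpoints have different data types."""
    if not edges:
        return 0.0
    cross = sum(1 for u, v in edges if node_type[u] != node_type[v])
    return cross / len(edges)


# Hypothetical toy network with three data types.
node_type = {
    "geneA": "transcript",
    "geneB": "transcript",
    "protA": "protein",
    "lipidX": "lipid",
}
edges = [
    ("geneA", "geneB"),   # same type
    ("geneA", "protA"),   # cross-type
    ("protA", "lipidX"),  # cross-type
    ("geneB", "lipidX"),  # cross-type
]
integration = cross_type_fraction(edges, node_type)
```

Computing this score on the edge lists produced by different inference methods would give one way to rank them by integration.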
In this review, we apply selected imputation strategies to label-free liquid chromatography–mass spectrometry (LC–MS) proteomics datasets to evaluate the accuracy with respect to metrics of variance and classification. We evaluate several commonly used imputation approaches for individual merits and discuss the caveats of each approach with respect to the example LC–MS proteomics data. In general, local similarity-based approaches, such as the regularized expectation maximization and least-squares adaptive algorithms, yield the best overall performances with respect to metrics of accuracy and robustness. However, no single algorithm consistently outperforms the remaining approaches, and in some cases, performing classification without imputation yielded the most accurate classification. Thus, because of the complex mechanisms of missing data in proteomics, which also vary from peptide to protein, no individual method is a single solution for imputation. On the basis of the observations in this review, the goal for imputation in the field of computational proteomics should be to develop new approaches that work generically for this data type and new strategies to guide users in the selection of the best imputation for their dataset and analysis objectives.
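One generic way to compare imputation methods on a given dataset is to hide a share of the observed values, impute them, and measure the reconstruction error. The sketch below does this with a row-mean imputer. It is not the evaluation used in the review, which relies on variance and classification metrics; the function names, the masking fraction, and the toy matrices are assumptions for illustration.

```python
import math
import random


def mean_impute(matrix):
    """Fill None in each row (peptide) with the mean of that row's observed
    values, falling back to the global mean for fully missing rows."""
    all_observed = [v for row in matrix for v in row if v is not None]
    global_mean = sum(all_observed) / len(all_observed)
    imputed = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        fill = sum(observed) / len(observed) if observed else global_mean
        imputed.append([fill if v is None else v for v in row])
    return imputed


def masked_rmse(matrix, impute, mask_fraction=0.2, seed=0):
    """Hide a fraction of observed cells, impute, and return the RMSE
    between imputed and true values at the hidden cells."""
    rng = random.Random(seed)
    observed_cells = [(i, j) for i, row in enumerate(matrix)
                      for j, v in enumerate(row) if v is not None]
    n_mask = max(1, int(len(observed_cells) * mask_fraction))
    hidden = rng.sample(observed_cells, n_mask)
    masked = [list(row) for row in matrix]
    for i, j in hidden:
        masked[i][j] = None
    imputed = impute(masked)
    squared = sum((imputed[i][j] - matrix[i][j]) ** 2 for i, j in hidden)
    return math.sqrt(squared / n_mask)


# Hypothetical log-intensity matrix: rows are peptides, columns are samples.
toy = [
    [20.1, 20.4, None, 19.8, 20.0],
    [15.2, None, 15.9, 15.5, 15.1],
    [22.0, 21.7, 22.3, 21.9, 22.1],
]
error = masked_rmse(toy, mean_impute)
```

Any imputer with the same call shape (matrix in, filled matrix out) can be passed as `impute`, so several methods can be scored on the same hidden cells by reusing the seed.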
Our study details the stepwise evolution of gilteritinib resistance in FLT3-mutated acute myeloid leukemia (AML). Early resistance is mediated by the bone marrow microenvironment, which protects residual leukemia cells. Over time, leukemia cells evolve intrinsic mechanisms of resistance, or late resistance. We mechanistically define both early and late resistance by integrating whole-exome sequencing, CRISPR-Cas9, metabolomics, proteomics, and pharmacologic approaches. Early resistant cells undergo metabolic reprogramming, grow more slowly, and are dependent upon Aurora kinase B (AURKB). Late resistant cells are characterized by expansion of pre-existing NRAS mutant subclones and continued metabolic reprogramming. Our model closely mirrors the timing and mutations of AML patients treated with gilteritinib. Pharmacological inhibition of AURKB resensitizes both early resistant cell cultures and primary leukemia cells from gilteritinib-treated AML patients. These findings support a combinatorial strategy to target early resistant AML cells with AURKB inhibitors and gilteritinib before the expansion of pre-existing resistance mutations occurs.
•Stepwise model of early to late gilteritinib resistance recapitulates human disease
•Early resistant cells in marrow microenvironment rely on AURKB to resume growth
•Pre-existing NRAS mutations expand in late resistance and drive relapse
•Metabolic reprogramming occurs during evolution of gilteritinib resistance
Gilteritinib is an effective FLT3 inhibitor for AML, but residual cells survive in the marrow microenvironment. Over time, these early resistant cells evolve intrinsic mechanisms of resistance leading to relapse. Joshi et al. use a comprehensive approach to interrogate the evolution of resistance, identifying AURKB as critical for early resistance.
The reconstruction of complete microbial metabolic pathways using 'omics data from environmental samples remains challenging. Computational pipelines for pathway reconstruction that utilize machine learning methods to predict the presence or absence of KEGG modules in incomplete genomes are lacking. Here, we present MetaPathPredict, a software tool that incorporates machine learning models to predict the presence of complete KEGG modules within bacterial genomic datasets. Using gene annotation data and information from the KEGG module database, MetaPathPredict employs deep learning models to predict the presence of KEGG modules in a genome. MetaPathPredict can be used as a command line tool or as a Python module, and both options are designed to be run locally or on a compute cluster. Benchmarks show that MetaPathPredict makes robust predictions of KEGG module presence within highly incomplete genomes.
Breast cancer (BC) is the most commonly diagnosed cancer and the leading cause of cancer death among women globally. Despite advances, there is considerable variation in clinical outcomes for patients with non-luminal A tumors, classified as difficult-to-treat breast cancers (DTBC). This study aims to delineate the proteogenomic landscape of DTBC tumors compared to luminal A (LumA) tumors.
We retrospectively collected a total of 117 untreated primary breast tumor specimens, focusing on DTBC subtypes. Breast tumors were processed by laser microdissection (LMD) to enrich tumor cells. DNA, RNA, and protein were simultaneously extracted from each tumor preparation, followed by whole genome sequencing, paired-end RNA sequencing, global proteomics and phosphoproteomics. Differential feature analysis, pathway analysis and survival analysis were performed to better understand DTBC and investigate biomarkers.
We observed distinct variations in gene mutations, structural variations, and chromosomal alterations between DTBC and LumA breast tumors. DTBC tumors predominantly had more mutations in TP53, PLXNB3, and zinc finger genes, and fewer mutations in SDC2, CDH1, PIK3CA, SVIL, and PTEN. Notably, cytoband 1q21, which contains numerous cell proliferation-related genes, was significantly amplified in the DTBC tumors. LMD successfully minimized stromal components and increased RNA-protein concordance, as evidenced by stromal score comparisons and proteomic analysis. Distinct DTBC and LumA-enriched clusters were observed by proteomic and phosphoproteomic clustering analysis, some with survival differences. Phosphoproteomics identified two distinct phosphoproteomic profiles for high relapse-risk and low relapse-risk basal-like tumors, involving several genes known to be associated with breast cancer oncogenesis and progression, including KIAA1522, DCK, FOXO3, MYO9B, ARID1A, EPRS, ZC3HAV1, and RBM14. Lastly, an integrated pathway analysis of multi-omics data highlighted a robust enrichment of proliferation pathways in DTBC tumors.
This study provides an integrated proteogenomic characterization of DTBC vs LumA with tumor cells enriched through laser microdissection. We identified many common features of DTBC tumors and the phosphopeptides that could serve as potential biomarkers for high/low relapse-risk basal-like BC and possibly guide treatment selections.
Various genetic mutations associated with cancer are known to alter cell signaling, but it is not clear whether they dysregulate signaling pathways by altering the abundance of pathway proteins. Using a combination of RNA sequencing and ultrasensitive targeted proteomics, we defined the primary components (16 core proteins and 10 feedback regulators) of the epidermal growth factor receptor (EGFR)-mitogen-activated protein kinase (MAPK) pathway in normal human mammary epithelial cells and then quantified their absolute abundance across a panel of normal and breast cancer cell lines as well as fibroblasts. We found that core pathway proteins were present at very similar concentrations across all cell types, with a variance similar to that of proteins previously shown to display conserved abundances across species. In contrast, EGFR and transcriptionally controlled feedback regulators were present at highly variable concentrations. The absolute abundance of most core proteins was between 50,000 and 70,000 copies per cell, but the adaptors SOS1, SOS2, and GAB1 were found at far lower amounts (2000 to 5000 copies per cell). MAPK signaling showed saturation in all cells between 3000 and 10,000 occupied EGFRs, consistent with the idea that adaptors limit signaling. Our results suggest that the relative stoichiometry of core MAPK pathway proteins is very similar across different cell types, with cell-specific differences mostly restricted to variable amounts of feedback regulators and receptors. The low abundance of adaptors relative to EGFR could be responsible for previous observations that only a fraction of total cell surface EGFR is capable of rapid endocytosis, high-affinity binding, and mitogenic signaling.
Soil viruses are abundant, but the influence of the environment and climate on soil viruses remains poorly understood. Here, we addressed this gap by comparing the diversity, abundance, lifestyle, and metabolic potential of DNA viruses in three grassland soils with historical differences in average annual precipitation, low in eastern Washington (WA), high in Iowa (IA), and intermediate in Kansas (KS). Bioinformatics analyses were applied to identify a total of 2,631 viral contigs, including 14 complete viral genomes from three deep metagenomes (1 terabase [Tb] each) that were sequenced from bulk soil DNA. An additional three replicate metagenomes (∼0.5 Tb each) were obtained from each location for statistical comparisons. Identified viruses were primarily bacteriophages targeting dominant bacterial taxa. Both viral and host diversity were higher in soil with lower precipitation. Viral abundance was also significantly higher in the arid WA location than in IA and KS. More lysogenic markers and fewer clustered regularly interspaced short palindromic repeats (CRISPR) spacer hits were found in WA, reflecting more lysogeny in historically drier soil. More putative auxiliary metabolic genes (AMGs) were also detected in WA than in the historically wetter locations. The AMGs occurring in 18 pathways could potentially contribute to carbon metabolism and energy acquisition in their hosts. Structural equation modeling (SEM) suggested that historical precipitation influenced viral life cycle and selection of AMGs. The observed and predicted relationships between soil viruses and various biotic and abiotic variables have value for predicting viral responses to environmental change.
Soil viruses are abundant but poorly understood. Because soil viruses regulate the dynamics of their hosts and potentially key processes in soil ecology, it is important to understand them better. Here, we leveraged massive DNA sequencing to unearth previously unknown soil viruses. We found that soil viruses differed across a historical gradient of precipitation. We compared soil viruses from Iowa, which is traditionally wetter, to those from Washington, which is traditionally drier, and from Kansas, which is intermediate. This study provides strong evidence that changes in historical precipitation impact not only the types of soil viruses but also their functional potential.