Antimicrobial resistance (AMR) poses a significant global health threat, and accurate prediction of bacterial resistance patterns is critical for effective treatment and control strategies. In recent years, machine learning (ML) approaches have emerged as powerful tools for analyzing large-scale bacterial AMR data. However, ML methods often ignore evolutionary relationships among bacterial strains, which can greatly affect their performance, especially when the goal is to detect resistance-associated features. Genome-wide association study (GWAS) methods such as linear mixed models account for the evolutionary relationships in bacteria, but they uncover only highly significant variants that have already been reported in the literature.
In this work, we introduce a novel phylogeny-related parallelism score (PRPS), which measures whether a certain feature is correlated with the population structure of a set of samples. We demonstrate that PRPS can be used, in combination with SVM- and random forest-based models, to reduce the number of features in the analysis while simultaneously increasing model performance. We applied our pipeline to publicly available AMR data from the PATRIC database for Mycobacterium tuberculosis against six common antibiotics.
Using our pipeline, we re-discovered known resistance-associated mutations as well as new candidate mutations that may be related to resistance and have not previously been reported in the literature. We demonstrated that taking phylogenetic relationships into account not only improves model performance, but also yields more biologically relevant top-contributing predicted resistance markers.
Fitness conferred by the same allele may differ between genotypes and environments, and these differences shape variation and evolution. Changes in amino acid propensities at protein sites over the course of evolution have been inferred from sequence alignments statistically, but the existing methods are data-intensive and aggregate multiple sites. Here, we develop an approach to detect individual amino acids that confer different fitness in different groups of species from combined sequence and phylogenetic data. Using the fact that the probability of a substitution to an amino acid depends on its fitness, our method looks for amino acids such that substitutions to them occur more frequently in one group of lineages than in another. We validate our method using simulated evolution of a protein site under different scenarios and show that it has high specificity for a wide range of assumptions regarding the underlying changes in selection, while its sensitivity differs between scenarios. We apply our method to the env gene of two HIV-1 subtypes, A and B, and to the HA gene of two influenza A subtypes, H1 and H3, and show that the inferred fitness changes are consistent with the fitness differences observed in deep mutational scanning experiments. We find that changes in relative fitness of different amino acid variants within a site do not always trigger episodes of positive selection and therefore may not result in an overall increase in the frequency of substitutions, but can still be detected from changes in relative frequencies of different substitutions.
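The core comparison described above, whether substitutions to a given amino acid are relatively more frequent in one group of lineages than in another, can be illustrated with a generic 2x2 contingency test. The sketch below uses Fisher's exact test as a stand-in; it is not the statistic used in the paper, and the function name and the counts in the usage lines are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Illustrative reading of the table (an assumption, not the paper's design):
      a = substitutions to the focal amino acid in lineage group 1
      b = substitutions to any other amino acid at the site in group 1
      c, d = the same two counts in lineage group 2
    """
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts for one amino acid at one site.
p = fisher_exact_two_sided(12, 30, 2, 40)
print(f"p-value: {p:.4f}")
```

A small p-value here would only indicate that the relative frequency of substitutions to the focal amino acid differs between the two groups; mapping substitutions onto the phylogeny to obtain the counts is outside the scope of this sketch.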
Abstract
Motivation
Bloom filters are a popular data structure that allows rapid searches in large sequence datasets. So far, all tools work with nucleotide sequences; however, protein sequences are conserved over longer evolutionary distances, and only mutations on the protein level may have any functional significance.
Results
We present MetaProFi, a Bloom filter-based tool that, for the first time, offers the functionality to build indexes of amino acid sequences and query them with both amino acid and nucleotide sequences, thus bringing sequence comparison to the biologically relevant protein level. MetaProFi implements additional efficient engineering solutions, such as a shared memory system, chunked data storage and efficient compression. In addition to its conceptual novelty, MetaProFi demonstrates state-of-the-art performance and excellent memory consumption-to-speed ratio when applied to various large datasets.
Availability and implementation
Source code in Python is available at https://github.com/kalininalab/metaprofi.
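To make the idea of a Bloom filter index over amino acid sequences concrete, here is a minimal, self-contained sketch. It is not MetaProFi's implementation: the class, the helper functions, the k-mer length, and the filter size are all illustrative assumptions, and it omits the shared memory, chunked storage, and compression mentioned above.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; size and hash count are arbitrary illustrative defaults."""

    def __init__(self, size_bits=1 << 16, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def kmers(seq, k):
    """Yield all overlapping k-mers of a sequence."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def index_protein(seq, k=5):
    """Build a Bloom filter over the amino acid k-mers of one sequence."""
    bloom = BloomFilter()
    for kmer in kmers(seq, k):
        bloom.add(kmer)
    return bloom


def query_fraction(bloom, query, k=5):
    """Fraction of the query's k-mers that are reported as present."""
    query_kmers = list(kmers(query, k))
    if not query_kmers:
        return 0.0
    return sum(kmer in bloom for kmer in query_kmers) / len(query_kmers)


index = index_protein("MKTAYIAKQRQISFVKSHFSRQ")
print(query_fraction(index, "AYIAKQRQ"))  # 1.0: Bloom filters have no false negatives
```

Querying with a nucleotide sequence would additionally require translating it into amino acids before extracting k-mers, which this sketch leaves out.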
Abstract
Alternative splicing plays a major role in regulating the functional repertoire of the proteome. However, isoform-specific effects on protein-protein interactions (PPIs) are usually overlooked, making it impossible to judge the functional role of individual exons on a systems biology level. We overcome this barrier by integrating protein-protein interaction, domain-domain interaction and residue-level interaction information to lift exon expression analysis to a network level. Our user-friendly database DIGGER is available at https://exbio.wzw.tum.de/digger and allows users to seamlessly switch between isoform and exon-centric views of the interactome and to extract sub-networks of relevant isoforms, making it an essential resource for studying mechanistic consequences of alternative splicing.
Abstract
Motivation
In proteins, solvent accessibility of individual residues is a factor contributing to their importance for protein function and stability. Hence one might wish to calculate solvent accessibility in order to predict the impact of mutations and their pathogenicity, and for other biomedical applications. A direct computation of solvent accessibility is only possible if all atoms of a protein three-dimensional structure are reliably resolved.
Results
We present SphereCon, a new precise measure that can estimate residue relative solvent accessibility (RSA) from limited data. The measure is based on calculating the volume of intersection of a sphere with a cone cut out in the direction opposite of the residue with surrounding atoms. We propose a method for estimating the position and volume of residue atoms in cases when they are not known from the structure, or when the structural data are unreliable or missing. We show that in cases of reliable input structures, SphereCon correlates almost perfectly with the directly computed RSA, and outperforms other previously suggested indirect methods. Moreover, SphereCon is the only measure that yields accurate results when the identities of amino acids are unknown. A significant novel feature of SphereCon is that it can estimate RSA from inter-residue distance and contact matrices, without any information about the actual atom coordinates.
Availability and implementation
https://github.com/kalininalab/spherecon.
Contact
alexander.gress@helmholtz-hips.de
Supplementary information
Supplementary data are available at Bioinformatics online.
Bottromycins are ribosomally synthesized and post-translationally modified peptide natural product antibiotics that are effective against high-priority human pathogens such as methicillin-resistant Staphylococcus aureus. The total synthesis of bottromycins involves at least 17 steps, with a poor overall yield. Here, we report the characterization of the cytochrome P450 enzyme BotCYP from a bottromycin biosynthetic gene cluster. We determined the structure of a close BotCYP homolog and used our data to conduct the first large-scale survey of P450 enzymes associated with RiPP biosynthetic gene clusters. We demonstrate that BotCYP converts a C-terminal thiazoline to a thiazole via an oxidative decarboxylation reaction and provides stereochemical resolution for the pathway. Our data enable the two-pot in vitro production of the bottromycin core scaffold and may allow the rapid generation of bottromycin analogues for compound development.
Tandem alternative splice sites (TASS) are a special class of alternative splicing events characterized by a close tandem arrangement of splice sites. Most TASS lack functional characterization and are believed to arise from splicing noise. Based on the RNA-seq data from the Genotype Tissue Expression project, we present an extended catalogue of TASS in healthy human tissues and analyze their tissue-specific expression. The expression of TASS is usually dominated by one major splice site (maSS), while the expression of minor splice sites (miSS) is at least an order of magnitude lower. Among 46k miSS with sufficient read support, 9k (20%) are significantly expressed above the expected noise level, and among them 2.5k are expressed tissue-specifically. We found significant correlations between tissue-specific expression of RNA-binding proteins (RBP), tissue-specific expression of miSS, and miSS response to RBP inactivation by shRNA. In combination with RBP profiling by eCLIP, this allowed prediction of novel cases of tissue-specific splicing regulation including a miSS in QKI mRNA that is likely regulated by PTBP1. The analysis of human primary cell transcriptomes suggested that both tissue-specific and cell-type-specific factors contribute to the regulation of miSS expression. More than 20% of tissue-specific miSS affect structured protein regions and may adjust protein-protein interactions or modify the stability of the protein core. The significantly expressed miSS evolve under the same selection pressure as maSS, while other miSS lack signatures of evolutionary selection and conservation. Using mixture models, we estimated that not more than 15% of maSS and not more than 54% of tissue-specific miSS are noisy, while the proportion of noisy splice sites among non-significantly expressed miSS is above 63%.
We report the synthesis and characterization of amorphous iron oxide nanoparticles from iron salts in aqueous extracts of monocotyledonous (Hordeum vulgare) and dicotyledonous (Rumex acetosa) plants. The nanoparticles were characterized by TEM, absorbance spectroscopy, SAED, EELS, XPS, and DLS methods and were shown to contain mainly iron oxide and iron oxohydroxide. H. vulgare extracts produced amorphous iron oxide nanoparticles with diameters of up to 30 nm. These iron nanoparticles are intrinsically unstable and prone to aggregation; however, we rendered them stable in the long term by addition of 40 mM citrate buffer pH 3.0. In contrast, amorphous iron oxide nanoparticles (diameters of 10–40 nm) produced using R. acetosa extracts are highly stable. The total protein content and antioxidant capacity are similar for both extracts, but pH values differ (H. vulgare pH 5.8 vs R. acetosa pH 3.7). We suggest that the presence of organic acids (such as oxalic or citric acid) plays an important role in the stabilization of iron nanoparticles, and that plants containing such constituents may be more efficacious for the green synthesis of iron nanoparticles.
Glycans are important polysaccharides on cellular surfaces that are bound to glycoproteins and glycolipids. They are among the most common post-translational modifications of proteins in eukaryotic cells. They play important roles in protein folding, cell-cell interactions, and other extracellular processes. Changes in glycan structures may influence the course of different diseases, such as infections or cancer. Glycans are commonly represented using the IUPAC-condensed notation. IUPAC-condensed is a textual representation of glycans operating on the same topological level as the Symbol Nomenclature for Glycans (SNFG) that assigns colored, geometrical shapes to the main monomers. These symbols are then connected in tree-like structures, visualizing the glycan structure on a topological level. Yet for a representation on the atomic level, notations such as SMILES should be used. To our knowledge, there is no easy-to-use, general, open-source, and offline tool to convert the IUPAC-condensed notation to SMILES. Here, we present the open-access Python package GlyLES for the generalizable generation of SMILES representations from IUPAC-condensed representations. GlyLES uses a grammar to read in the monomer tree from the IUPAC-condensed notation. From this tree, the tool can compute the atomic structures of each monomer based on their IUPAC-condensed descriptions. In the last step, it merges all monomers into the atomic structure of a glycan in the SMILES notation. GlyLES is the first package that allows conversion from the IUPAC-condensed notation of glycans to SMILES strings. This may have multiple applications, including straightforward visualization, substructure search, molecular modeling and docking, and a new featurization strategy for machine-learning algorithms. GlyLES is available at https://github.com/kalininalab/GlyLES.
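The first step described above, reading a monomer tree out of an IUPAC-condensed string, can be sketched with a small hand-written recursive parser. This is not GlyLES's grammar or API: the function name and the tuple-based tree layout are assumptions made for illustration, it only handles simple well-formed strings, and it stops at the tree without generating any SMILES.

```python
def parse_iupac_condensed(glycan):
    """Parse a simple IUPAC-condensed string into (monomer, children).

    Each child is a (linkage, subtree) pair. The root monomer is the
    rightmost one (the reducing end); side branches are in square brackets.
    """
    # The root monomer name is everything after the last ')' or ']'.
    split = len(glycan)
    while split > 0 and glycan[split - 1] not in ")]":
        split -= 1
    name, prefix = glycan[split:], glycan[:split]

    children = []
    while prefix:
        if prefix.endswith("]"):
            # Find the '[' that matches the trailing ']' to cut out one side branch.
            depth, start = 0, len(prefix) - 1
            while True:
                if prefix[start] == "]":
                    depth += 1
                elif prefix[start] == "[":
                    depth -= 1
                if depth == 0:
                    break
                start -= 1
            branch, prefix = prefix[start + 1:-1], prefix[:start]
        else:
            # No bracket left: the rest of the prefix is the main chain.
            branch, prefix = prefix, ""
        # Every branch ends with its linkage, e.g. "(a1-6)".
        open_paren = branch.rindex("(")
        linkage = branch[open_paren + 1:-1]
        children.append((linkage, parse_iupac_condensed(branch[:open_paren])))
    return (name, children)


tree = parse_iupac_condensed("Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc")
print(tree[0])  # GlcNAc, the root monomer at the reducing end
```

A tree in this form makes the later steps natural to express: each node can be mapped to the atomic structure of its monomer, and each linkage tells which atoms of parent and child to join.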
Recent studies identified signal peptidase complex subunit 1 (SPCS1) as a proviral host factor for Flaviviridae viruses, including HCV. One of SPCS1's roles in flavivirus propagation was attributed to its regulation of signal peptidase complex (SPC)-mediated processing of the flavivirus polyprotein, especially at the C-prM junction. However, whether SPCS1 also regulates any SPC-mediated processing sites within the HCV polyprotein remains unclear. In this study, we determined that loss of SPCS1 specifically impairs HCV E2-p7 processing by the SPC. We also determined that efficient separation of E2 and p7, regardless of its dependence on SPC-mediated processing, renders SPCS1 dispensable for HCV assembly. These results suggest that SPCS1 regulates HCV assembly by facilitating the SPC-mediated processing of the E2-p7 precursor. Structural modeling suggests that intrinsically delayed processing of E2-p7 is likely caused by the structural rigidity of p7 N-terminal transmembrane helix-1 (p7/TM1/helix-1), which mostly maintained membrane-embedded conformations during molecular dynamics (MD) simulations. E2-p7-processing-impairing p7 mutations narrowed the p7/TM1/helix-1 bending angle against the membrane, resulting in closer membrane embedment of the p7/TM1/helix-1 and less access of the E2-p7 junction substrate to the catalytic site of the SPC, located well above the membrane in the ER lumen. Based on these results, we propose that the key mechanism of action of SPCS1 in HCV assembly is to facilitate E2-p7 processing by enhancing the presentation of the E2-p7 junction site to the SPC active site.
By providing evidence that SPCS1 facilitates HCV assembly by regulating SPC-mediated cleavage of the E2-p7 junction, equivalent to the previously established role of this protein in C-prM junction processing in flaviviruses, this study establishes that the common role of SPCS1 in the propagation of Flaviviridae family viruses is to exquisitely regulate the SPC-mediated processing of specific, suboptimal target sites.