In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at an unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related, protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview of some biologically important questions, and of how statistical-mechanics-inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.
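For orientation, the Boltzmann distribution inferred from an alignment is often written as a pairwise Potts model over an aligned sequence $(a_1,\dots,a_L)$. The abstract does not specify a model, so the form below and the symbols $h_i$ (site fields), $J_{ij}$ (pairwise couplings) and $Z$ (partition function) are assumptions taken from common usage in the direct-coupling literature, given only as an illustration:

```latex
P(a_1,\dots,a_L) = \frac{1}{Z}
  \exp\!\Big( \sum_{i=1}^{L} h_i(a_i) + \sum_{1 \le i < j \le L} J_{ij}(a_i,a_j) \Big),
\qquad
Z = \sum_{a_1,\dots,a_L}
  \exp\!\Big( \sum_{i=1}^{L} h_i(a_i) + \sum_{1 \le i < j \le L} J_{ij}(a_i,a_j) \Big)
```

In this picture, the inverse problem is to choose $h_i$ and $J_{ij}$ so that the model reproduces the single-site and pairwise residue frequencies observed in the homologous sequences.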
•Fungal pathogens pose a significant threat to global crop yields, causing substantial economic losses. Identifying fungal effectors is essential for understanding plant-pathogen interactions and improving food safety and agricultural productivity.
•Leveraging global patterns learned from pretrained protein language models with refined information from known effectors, Fungtion significantly outperforms existing tools in predicting fungal effectors.
•With further interactive visualizations, Fungtion empowers researchers to analyse both the sequence- and high-level relationships between predicted and known fungal effectors, aiding in future protein function assignment.
•Fungtion stands as a valuable tool for the scientific community, enhancing prediction accuracy and facilitating downstream analyses of fungal effectors to formulate new hypotheses and drive biological knowledge discovery.
Fungal pathogens pose significant threats to plant health by secreting effectors that manipulate plant-host defences. However, identifying effector proteins remains challenging, in part because they lack common sequence motifs. Here, we introduce Fungtion (Fungal effector prediction), a toolkit leveraging a hybrid framework to accurately predict and visualize fungal effectors. By combining global patterns learned from pretrained protein language models with refined information from known effectors, Fungtion achieves state-of-the-art prediction performance. Additionally, the interactive visualizations we have developed enable researchers to explore both sequence- and high-level relationships between the predicted and known effectors, facilitating effector function discovery, annotation, and hypothesis formulation regarding plant-pathogen interactions. We anticipate Fungtion to be a valuable resource for biologists seeking deeper insights into fungal effector functions and for computational biologists aiming to develop future methodologies for fungal effector prediction: https://step3.erc.monash.edu/Fungtion/.
The Enzyme Function Initiative, an NIH/NIGMS-supported Large-Scale Collaborative Project (EFI; U54GM093342; http://enzymefunction.org/), is focused on devising and disseminating bioinformatics and computational tools as well as experimental strategies for the prediction and assignment of functions (in vitro activities and in vivo physiological/metabolic roles) to uncharacterized enzymes discovered in genome projects. Protein sequence similarity networks (SSNs) are visually powerful tools for analyzing sequence relationships in protein families (H.J. Atkinson, J.H. Morris, T.E. Ferrin, and P.C. Babbitt, PLoS One 2009, 4, e4345). However, the members of the biological/biomedical community have not had access to the capability to generate SSNs for their “favorite” protein families. In this article we announce the EFI-EST (Enzyme Function Initiative-Enzyme Similarity Tool) web tool (http://efi.igb.illinois.edu/efi-est/) that is available without cost for the automated generation of SSNs by the community. The tool can create SSNs for the “closest neighbors” of a user-supplied protein sequence from the UniProt database (Option A) or of members of any user-supplied Pfam and/or InterPro family (Option B). We provide an introduction to SSNs, a description of EFI-EST, and a demonstration of the use of EFI-EST to explore sequence–function space in the OMP decarboxylase superfamily (PF00215). This article is designed as a tutorial that will allow members of the community to use the EFI-EST web tool for exploring sequence/function space in protein families.
•Sequence–function space can be visualized using protein sequence similarity networks.
•The EFI-EST webtool is available for generating sequence similarity networks.
•A tutorial is provided that describes the use of EFI-EST.
•The community is encouraged to use EFI-EST without cost.
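The idea behind a sequence similarity network can be sketched in a few lines: sequences are nodes, an edge joins two nodes whose pairwise similarity passes a threshold, and connected components give candidate clusters. This is a toy sketch only and not the EFI-EST pipeline. The function names, the ungapped percent-identity score and the example sequences are all assumptions made for illustration.

```python
from itertools import combinations


def percent_identity(a, b):
    """Toy similarity: percent identical positions over the shorter sequence.

    Assumption: this ungapped comparison stands in for the alignment
    scores a real tool would compute.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / n


def build_ssn(seqs, threshold):
    """Return an adjacency map with an edge wherever similarity >= threshold."""
    graph = {name: set() for name in seqs}
    for u, v in combinations(seqs, 2):
        if percent_identity(seqs[u], seqs[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def clusters(graph):
    """Connected components of the network, found by depth-first search."""
    seen = set()
    result = []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        result.append(component)
    return result


# Hypothetical sequences, invented for this example.
toy = {
    "p1": "MKTAYIAKQR",
    "p2": "MKTAYIAKQK",
    "p3": "MKTAYLAKQR",
    "p4": "GSHMLEDPVV",
    "p5": "GSHMLEDPVA",
}
loose = clusters(build_ssn(toy, threshold=80.0))
strict = clusters(build_ssn(toy, threshold=95.0))
```

Raising the threshold removes edges, so the same set of sequences breaks into more and smaller clusters. This is the knob one typically turns when exploring sequence–function space with such a network.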
Amyloids are typically associated with neurodegenerative diseases, but recent research demonstrates that several bacteria utilize functional amyloid fibrils to fortify the biofilm extracellular matrix and thereby resist antibiotic treatments. In Pseudomonas aeruginosa, these fibrils are composed predominantly of FapC, a protein with high sequence conservation among the genera. Previous studies established FapC as the major amyloid subunit, but its mechanism of fibril formation in P. aeruginosa remained largely unexplored. Here, we examine the FapC sequence in greater detail through a combination of bioinformatics and protein engineering, and we identify specific motifs that are implicated in amyloid formation. Sequence regions of high evolutionary conservation tend to coincide with regions of high amyloid propensity, and mutation of amyloidogenic motifs to a designed, non-amyloidogenic motif suppresses fibril formation in a pH-dependent manner. We establish the particular significance of the third repeat motif in promoting fibril formation and also demonstrate emergence of soluble oligomer species early in the aggregation pathway. The insights reported here expand our understanding of the mechanism of amyloid polymerization in P. aeruginosa, laying the foundation for development of new amyloid inhibitors to combat recalcitrant biofilm infections.
•FapC protein from P. aeruginosa makes amyloid fibrils which strengthen biofilm.
•We identify hexapeptide repeat sequences implicated in FapC amyloid formation.
•Their removal slows aggregation and destabilizes the end-state fibrils.
•They also increase sensitivity to EGCG and restrict the fibrillation pH range.
•Removal of C-terminal Cys eliminates oligomers without affecting fibrillation.
•Tandem repeats (TRs) in protein sequences are frequent but difficult to detect.
•We implemented a web server that uses a profile-based method to detect TRs.
•Protein sequences can be analysed for 11 types of common TRs.
•Proteins can be analysed separately or in multiple sequence alignments.
•We provide precomputed results for 78 UniProt reference proteomes.
Ensembles of tandem repeats (TRs) in protein sequences expand rapidly to form domains well suited for interactions with proteins. For this reason, they are relatively frequent. Some TRs have known structures and therefore it is advantageous to predict their presence in a protein sequence. However, since most TRs diverge quickly, their detection by classical sequence comparison algorithms is not very accurate. Previously, we developed a method and a web server that used curated profiles and thresholds for the detection of 11 common TRs. Here we present a new web server (REP2) that allows the analysis of TRs in both individual and aligned sequences. We currently provide precomputed analyses for a selection of 78 UniProt reference proteomes. We illustrate how these data can be used to study the evolution of TRs using comparative genomics. REP2 can be accessed at http://cbdm-01.zdv.uni-mainz.de/~munoz/rep/.
Twenty years of improved technology and growing sequence data now render residue-residue contact constraints, inferred from correlated mutations in large protein families, accurate enough to drive de novo predictions of protein three-dimensional structure. The method EVfold broke new ground using mean-field Direct Coupling Analysis (EVfold-mfDCA); the method PSICOV applied a related concept by estimating a sparse inverse covariance matrix. Both methods (EVfold-mfDCA and PSICOV) are publicly available, but both require too much CPU time for interactive applications. In addition, EVfold-mfDCA depends on proprietary software.
Here, we present FreeContact, a fast, open source implementation of EVfold-mfDCA and PSICOV. On a test set of 140 proteins, FreeContact was almost eight times faster than PSICOV without decreasing prediction performance. The EVfold-mfDCA implementation of FreeContact was over 220 times faster than PSICOV with negligible performance decrease. EVfold-mfDCA was unavailable for testing due to its dependency on proprietary software. FreeContact is implemented as the free C++ library "libfreecontact", complete with command line tool "freecontact", as well as Perl and Python modules. All components are available as Debian packages. FreeContact supports the BioXSD format for interoperability.
FreeContact provides the opportunity to compute reliable contact predictions in any environment (desktop or cloud).
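To make "correlated mutations" concrete, the sketch below scores pairs of alignment columns by plain mutual information. This is a deliberately simpler stand-in. It is not the mean-field DCA or sparse inverse covariance estimation that FreeContact implements, and it does not use the freecontact library or its API. The function names and the toy alignment are assumptions made for illustration.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (in nats) between columns i and j of an alignment."""
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    joint = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # Compare the joint frequency with the product of the marginals.
        mi += p_ab * math.log(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi


def rank_pairs(alignment):
    """Return (score, i, j) for every column pair, highest score first."""
    length = len(alignment[0])
    scored = [
        (column_mutual_information(alignment, i, j), i, j)
        for i in range(length)
        for j in range(i + 1, length)
    ]
    return sorted(scored, reverse=True)


# Hypothetical alignment: columns 0 and 1 always mutate together,
# column 2 is fully conserved, column 3 varies more independently.
toy_alignment = ["AKLE", "AKLE", "GRLD", "GRLD", "AKLD", "GRLE"]
top = rank_pairs(toy_alignment)[0]
```

Mutual information cannot distinguish direct from indirect (transitive) correlations. That is the limitation that direct-coupling and inverse-covariance approaches were designed to address.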
Protein remote homology detection plays a vital role in studies of protein structures and functions. Almost all of the traditional machine learning methods require fixed-length features to represent the protein sequences. However, it is never an easy task to extract discriminative features with limited knowledge of proteins. On the other hand, deep learning techniques have demonstrated their advantage in automatically learning representations. It is therefore worthwhile to explore the application of deep learning techniques to protein remote homology detection.
In this study, we employ the Bidirectional Long Short-Term Memory (BLSTM) to learn effective features from pseudo proteins, and propose a predictor called ProDec-BLSTM. It consists of an input layer, a bidirectional LSTM layer, a time-distributed dense layer and an output layer. This neural network can automatically extract discriminative features by using the bidirectional LSTM and the time-distributed dense layer.
Experimental results on a widely-used benchmark dataset show that ProDec-BLSTM outperforms other related methods in terms of both the mean ROC and mean ROC50 scores. This promising result shows that ProDec-BLSTM is a useful tool for protein remote homology detection. Furthermore, the hidden patterns learnt by ProDec-BLSTM can be interpreted and visualized, and therefore, additional useful information can be obtained.
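For readers unfamiliar with the two scores, the sketch below computes a ROC area and a ROC50-style area for one ranked list of predictions. It assumes the common definition of ROCn as the area under the ROC curve up to the first n false positives, normalised to lie in [0, 1]. The function name, the handling of lists with fewer than n negatives and the example data are assumptions for illustration, not the evaluation code of the study.

```python
def roc_n(scores, labels, n=None):
    """Area under the ROC curve up to the first n false positives, in [0, 1].

    scores: higher means more likely to be a true homolog.
    labels: 1 for a true homolog, 0 otherwise.
    n=None gives the full ROC area; n=50 gives a ROC50-style score.
    Assumptions: scores are distinct (ties get no special treatment), and
    when fewer than n negatives exist, all of them are used.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_pos = sum(labels)
    total_neg = len(labels) - total_pos
    if total_pos == 0 or total_neg == 0:
        raise ValueError("need at least one positive and one negative")
    limit = total_neg if n is None else min(n, total_neg)
    true_pos = 0
    false_pos = 0
    area = 0
    for _, label in ranked:
        if label == 1:
            true_pos += 1
        else:
            false_pos += 1
            area += true_pos  # positives ranked above this false positive
            if false_pos == limit:
                break
    return area / (limit * total_pos)


# Hypothetical predictions for a single protein family.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
example_labels = [1, 0, 1, 1, 0]
full_roc = roc_n(example_scores, example_labels)
roc_1 = roc_n(example_scores, example_labels, n=1)
```

A mean ROC or mean ROC50 score would then be the average of such per-family values across all families in the benchmark.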
Mitotic catastrophe (MC) is a form of programmed cell death induced by mitotic process disorders, which is very important in tumor prevention, development, and drug resistance. Because the rapidly growing body of MC data is driving tumor-related biomedical and clinical research, a professional and comprehensive database to curate MC-related data is urgently needed. The Mitotic Catastrophe Database (MCDB) consists of 1214 genes/proteins and 5014 compounds collected and organized from more than 8000 research articles. MCDB also defines the confidence level, classification criteria, and uniform naming rules for MC-related data, which greatly improves data reliability and retrieval convenience. Moreover, MCDB provides protein sequence alignment and target prediction functions. The former can be used to predict new potential MC-related genes and proteins, and the latter can facilitate the identification of potential target proteins of unknown MC-related compounds. In short, MCDB is a proprietary, standard, and comprehensive database for MC-related data that will facilitate the exploration of MC by researchers ranging from chemists to biologists in fields such as medicinal chemistry, molecular biology, bioinformatics and oncology. MCDB is available at http://www.combio-lezhang.online/MCDB/index_html/.
MCDB is a proprietary, standard, and comprehensive database for mitotic catastrophe (MC)-related data that will facilitate the exploration of MC by researchers ranging from chemists to biologists in fields such as medicinal chemistry, molecular biology, bioinformatics and oncology.
•Classification of proteins by estimating their pseudo amino acid composition and spectral graph clustering method.
•Each protein sequence is associated with its corresponding PseAAC.
•Spectral graph clustering technique is used for classification.
The present work employs pseudo amino acid composition (PseAAC) to encode protein sequences in numeric form. These encodings are then arranged in a similarity matrix, which serves as input to a spectral graph clustering method. Spectral methods have been used previously for clustering protein sequences, but they use pairwise alignment scores of the sequences in the similarity matrix. Because the alignment score depends on sequence length, clustering short and long sequences together may not be a good idea. This motivated combining PseAAC with the spectral clustering algorithm. We extensively tested our method and compared its performance with other existing machine learning methods. We consistently observe that the number of clusters obtained for a given set of proteins is close to the number of superfamilies in that set, and that PseAAC combined with spectral graph clustering shows the best classification results.
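The first two steps described above, encoding each sequence as a fixed-length PseAAC vector and arranging the vectors in a similarity matrix, can be sketched as follows. This is a simplified illustration and not the authors' implementation. It uses a single physicochemical property (Kyte-Doolittle hydropathy) where PseAAC usually combines several, and the parameters `lam`, `weight` and `sigma`, the Gaussian kernel and the example sequences are all assumptions. The spectral clustering step itself is not shown.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values. Using one property is a simplification.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Standardise the property to zero mean and unit variance over the 20 residues.
_mean = sum(HYDROPATHY.values()) / 20
_sd = math.sqrt(sum((v - _mean) ** 2 for v in HYDROPATHY.values()) / 20)
NORMALISED = {aa: (v - _mean) / _sd for aa, v in HYDROPATHY.items()}


def pseaac(seq, lam=3, weight=0.05):
    """Return a (20 + lam)-dimensional vector, whatever the sequence length.

    Only the 20 standard residues are supported; others raise KeyError.
    """
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    counts = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
    thetas = []
    for k in range(1, lam + 1):
        # Sequence-order correlation between residues k positions apart.
        pairs = zip(seq, seq[k:])
        total = sum((NORMALISED[a] - NORMALISED[b]) ** 2 for a, b in pairs)
        thetas.append(total / (len(seq) - k))
    denom = sum(counts) + weight * sum(thetas)
    return [c / denom for c in counts] + [weight * t / denom for t in thetas]


def similarity_matrix(vectors, sigma=0.1):
    """Gaussian kernel on the Euclidean distance between PseAAC vectors."""
    n = len(vectors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist2 = sum((x - y) ** 2 for x, y in zip(vectors[i], vectors[j]))
            matrix[i][j] = math.exp(-dist2 / (2 * sigma ** 2))
    return matrix


# Hypothetical sequences of different lengths.
seqs = ["MKTAYIAKQRQISFVK", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", "GSHMLEDPVVGG"]
vectors = [pseaac(s) for s in seqs]
S = similarity_matrix(vectors)
```

Because every sequence maps to a vector of the same dimension, the similarity between a short and a long sequence does not depend on an alignment score that grows with length. This is the motivation given in the abstract.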