Post-translational modifications (PTMs) occur in the vast majority of proteins and are essential for their function. Predicting the sequence locations of PTMs enhances the functional characterisation of proteins. Glycosylation is one type of PTM and is implicated in protein folding, transport and function.
We use the random forest algorithm and pairwise patterns to predict glycosylation sites. We identify pairwise patterns surrounding glycosylation sites and use an odds ratio to weight their propensity to be associated with modified residues. Our prediction program, GPP (glycosylation prediction program), predicts glycosylation sites with an accuracy of 90.8% for Ser sites, 92.0% for Thr sites and 92.8% for Asn sites, which is significantly better than current glycosylation predictors. We use the TREPAN algorithm to extract a set of comprehensible rules from GPP, which provide biological insight into all three major glycosylation types.
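The abstract does not give GPP's exact pattern encoding or odds-ratio formula. The sketch below makes two assumptions. A pairwise pattern is taken to be two residues at two fixed positions in a sequence window around the candidate site. Each pattern is scored with the standard 2×2 odds ratio plus a pseudocount. The windows are invented for illustration, and the helper names are hypothetical. Scores like these could then serve as input features for a random forest classifier.

```python
from itertools import combinations


def pairwise_patterns(window):
    """Set of (pos_i, res_i, pos_j, res_j) residue pairs in a sequence window.

    This encoding is a hypothetical choice, not necessarily GPP's.
    """
    return {
        (i, window[i], j, window[j])
        for i, j in combinations(range(len(window)), 2)
    }


def odds_ratio(pattern, positives, negatives, pseudo=0.5):
    """Odds of `pattern` around modified vs unmodified sites (2x2 table).

    The pseudocount keeps rare or absent patterns from dividing by zero.
    """
    a = sum(pattern in pairwise_patterns(w) for w in positives)
    b = len(positives) - a
    c = sum(pattern in pairwise_patterns(w) for w in negatives)
    d = len(negatives) - c
    return ((a + pseudo) * (d + pseudo)) / ((b + pseudo) * (c + pseudo))


# Illustrative windows centred on a Thr at index 2; not real data.
positives = ["PSTPA", "PATPS", "GSTPV"]
negatives = ["LKTEA", "PKTEV", "AETLS"]
# Pattern: Thr at position 2 followed by Pro at position 3.
print(odds_ratio((2, "T", 3, "P"), positives, negatives))  # → 49.0
```

An odds ratio above 1 marks a pattern that is enriched around modified sites. A value below 1 marks a pattern that is depleted there.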
We have created an accurate predictor of glycosylation sites and used this to extract comprehensible rules about the glycosylation process. GPP is available online at http://comp.chem.nottingham.ac.uk/glyco/.
Memorial Viewpoint for Nicholas A. Besley
Hirst, Jonathan D
The Journal of Physical Chemistry A: Molecules, Spectroscopy, Kinetics, Environment, & General Theory, 09/2021, Volume 125, Issue 38
Journal Article
Vibrational spectroscopy is an essential tool in chemical analyses, biological assays, and studies of functional materials. Over the past decade, various coherent nonlinear vibrational spectroscopic techniques have been developed. They enable researchers to study time correlations of fluctuating vibrational frequencies. These correlations are directly related to solute–solvent dynamics, dynamical changes in molecular conformations and local electrostatic environments, chemical and biochemical reactions, protein structural dynamics and function, characteristic processes of functional materials, and so on. A variety of vibrational probes have been developed and site-specifically incorporated into molecular, biological, and material systems for time-resolved vibrational spectroscopic investigation. They provide incisive and quantitative information on the local electrostatic environment, molecular conformation, protein structure and interprotein contacts, ligand binding kinetics, and the electric and optical properties of functional materials. However, an all-encompassing theory that describes vibrational solvatochromism, electrochromism, and the dynamic fluctuation of vibrational frequencies has not yet been fully established, mainly because of the intrinsic complexity of intermolecular interactions in condensed phases. In particular, the amount of data obtained from linear and nonlinear vibrational spectroscopic experiments has been increasing rapidly, but the lack of a quantitative method for interpreting these measurements has been a major obstacle to broadening the applications of these methods. Among the various theoretical models, one of the most successful is a semiempirical model, generally referred to as the vibrational spectroscopic map, which is based on a rigorous theory of intermolecular interactions. Recently, genetic algorithms, neural networks, and other machine learning approaches have been applied to the development of vibrational solvatochromism theory.
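One common form of vibrational spectroscopic map writes the frequency of a probe mode as a gas-phase value plus a linear combination of the electrostatic potentials that the surroundings create at selected sites on the probe. That form is assumed in the sketch below. The reference frequency, the map coefficients, the site positions and the point charge are all hypothetical placeholders, not parameters of any published map.

```python
import math


def electrostatic_potential(site, charges):
    """Coulomb potential (atomic units) at `site` from point charges.

    `charges` holds (q, (x, y, z)) tuples, with positions in bohr.
    """
    return sum(q / math.dist(site, pos) for q, pos in charges)


def map_frequency(omega0, coeffs, sites, charges):
    """Assumed linear map: omega = omega0 + sum_a l_a * phi_a.

    omega0 (cm^-1) and coeffs (cm^-1 per au of potential) are
    hypothetical placeholders, not published map parameters.
    """
    return omega0 + sum(
        l_a * electrostatic_potential(site, charges)
        for l_a, site in zip(coeffs, sites)
    )


# Hypothetical C and O sites of a carbonyl probe plus one solvent charge.
sites = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.3)]
charges = [(-0.8, (0.0, 0.0, 5.3))]
print(map_frequency(1700.0, [-100.0, 200.0], sites, charges))
```

With no surrounding charges the map returns the reference frequency. Evaluating it frame by frame along a simulation gives a frequency trajectory.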
In this review, we provide a comprehensive description of the theoretical foundation of vibrational spectroscopic maps and various examples of their extraordinary success in interpreting experimental observations. In addition, we briefly introduce a newly created repository website (http://frequencymap.org) for vibrational spectroscopic maps. We anticipate that combining the vibrational frequency map approach with state-of-the-art multidimensional vibrational spectroscopy will be one of the most fruitful ways to study the structure and dynamics of chemical, biological, and functional molecular systems in the future.
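The time correlations of fluctuating frequencies mentioned above are usually summarised by the frequency–frequency correlation function, C(t) = ⟨δω(t)δω(0)⟩, where δω is the deviation from the mean frequency. The minimal sketch below assumes an evenly sampled frequency trajectory, for example one produced frame by frame with a map. It averages over all available time origins for each lag.

```python
def ffcf(freqs, max_lag):
    """Frequency-frequency correlation function C(lag) = <dw(lag) dw(0)>.

    `freqs` is a frequency trajectory sampled at equal time steps; each
    lag is averaged over every available time origin.
    """
    n = len(freqs)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must lie in [0, len(freqs) - 1]")
    mean = sum(freqs) / n
    dw = [w - mean for w in freqs]
    corr = []
    for lag in range(max_lag + 1):
        pairs = n - lag
        corr.append(sum(dw[i] * dw[i + lag] for i in range(pairs)) / pairs)
    return corr


# Toy alternating trajectory: perfectly anticorrelated at odd lags.
print(ffcf([1.0, -1.0, 1.0, -1.0], 2))  # → [1.0, -1.0, 1.0]
```

C(0) is the variance of the frequency distribution. The decay of C(t) reports on how quickly the probe's environment rearranges.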
Infrared (IR) absorption provides important chemical fingerprints of biomolecules. Determining protein secondary structure from IR spectra is tedious, because its theoretical interpretation requires repeated, expensive quantum-mechanical calculations in a fluctuating environment. Herein we present a novel machine learning protocol that uses a few key structural descriptors to rapidly predict the amide I IR spectra of various proteins, with predictions that agree well with experiment. Its transferability enabled us to distinguish protein secondary structures, probe atomic structure variations with temperature, and monitor protein folding. This approach offers a cost-effective tool for modelling the relationship between protein spectra and their biological and chemical properties.
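The abstract does not specify the model or the descriptors. The sketch below uses a deliberately simple stand-in for the machine learning step: an ordinary least-squares line from one hypothetical structural descriptor to an amide I site frequency. It then broadens the predicted frequencies into a band with Lorentzian line shapes. Protocols of this kind usually also account for couplings between amide groups, which this sketch ignores. The training data and the line width are invented.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def lorentzian_spectrum(centres, grid, hwhm=8.0):
    """Sum of unit-area Lorentzian lines (half-width at half-maximum, cm^-1)."""
    return [
        sum(hwhm / math.pi / ((w - c) ** 2 + hwhm ** 2) for c in centres)
        for w in grid
    ]


# Hypothetical training data: one descriptor per amide group (for example
# a hydrogen-bond distance) and a reference site frequency in cm^-1.
descriptor = [1.8, 1.9, 2.0, 2.1, 2.2]
frequency = [1630.0, 1636.0, 1642.0, 1648.0, 1654.0]
a, b = fit_line(descriptor, frequency)

# Predict site frequencies for a new structure, then broaden into a band.
predicted = [a + b * d for d in (1.85, 2.05, 2.15)]
grid = list(range(1600, 1701))
spectrum = lorentzian_spectrum(predicted, grid)
```

A linear fit keeps the sketch dependency-free. In practice it would be replaced by a more flexible regressor trained on quantum-mechanical reference data.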
• Calculated circular dichroism (CD) spectra in the far- and near-UV.
• Calculated infrared (IR) spectra in the amide I region.
• Based on experimental structures and computational models of SARS-CoV-2 proteins.
• Near-UV CD spectra offer the greatest sensitivity to conformation.
Treatment for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes Covid-19, may well depend on knowledge of the structures of its proteins. However, these structures often cannot be determined easily or quickly. Herein, we provide calculated circular dichroism (CD) spectra in the far- and near-UV, and infrared (IR) spectra in the amide I region, for experimental structures and computational models of SARS-CoV-2 proteins. The near-UV CD spectra offer the greatest sensitivity in assessing the accuracy of the models.
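The abstract does not describe how calculated transitions were turned into band shapes. A common, simple choice, assumed here, is to place a Gaussian on each transition weighted by its rotational strength. The sketch works in arbitrary units. It omits the unit-dependent prefactors needed to convert to Δε, and its band width is a hypothetical value.

```python
import math


def broaden_cd(transitions, grid, sigma=10.0):
    """Gaussian-broadened CD band shape in arbitrary units.

    transitions: (wavelength_nm, rotational_strength) pairs.
    sigma (nm) is a hypothetical width; prefactors converting the
    result to delta-epsilon are deliberately omitted.
    """
    return [
        sum(R * math.exp(-((lam - lam_k) / sigma) ** 2) for lam_k, R in transitions)
        for lam in grid
    ]


# Illustrative transitions: one negative and one positive band.
wavelengths = [200.0 + i for i in range(101)]
spectrum = broaden_cd([(220.0, -1.0), (260.0, 0.5)], wavelengths)
```

Because rotational strengths are signed, closely spaced transitions can partly cancel. This is one reason calculated CD spectra can be sensitive to small structural differences.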
We present an atomistic force field for the azo moiety of the photoswitchable FK-11-X peptide. We use the parameters to study the unfolding of the peptide through molecular dynamics simulations. The unfolded ensemble contains many different structures, ranging from partially unfolded to fully unfolded. The averaged far-ultraviolet circular dichroism (CD) spectrum computed for the set of structures simulated with the newly developed force field agrees well with experiment. The simulated unfolding process has an estimated time constant of 5.80 ± 0.03 ns. This estimate is obtained from the time evolution of the CD spectrum of the peptide, which is computed from the backbone conformations sampled over 40 simulated trajectories. This time constant is shorter than, but not inconsistent with, previous experimental estimates from time-resolved infrared and optical rotatory dispersion spectroscopy.
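The abstract does not state the fitting procedure behind the time constant. A minimal sketch follows. It assumes that some CD signal relaxes as a single exponential towards a known plateau, so that ln|signal − plateau| is linear in time with slope −1/τ. The trace below is synthetic and noise-free.

```python
import math


def fit_time_constant(times, signal, plateau):
    """Estimate tau for signal(t) = plateau + A * exp(-t / tau).

    Assumes a single exponential approaching a known plateau (A may be
    negative), so ln|signal - plateau| is linear in t with slope -1/tau;
    that line is fitted by ordinary least squares.
    """
    ys = [math.log(abs(s - plateau)) for s in signal]
    n = len(times)
    mt, my = sum(times) / n, sum(ys) / n
    slope = sum((t - mt) * (y - my) for t, y in zip(times, ys)) / sum(
        (t - mt) ** 2 for t in times
    )
    return -1.0 / slope


# Synthetic, noise-free trace with tau = 5.8 ns (illustrative only).
times = [0.5 * i for i in range(40)]
signal = [-2.0 + 3.0 * math.exp(-t / 5.8) for t in times]
tau = fit_time_constant(times, signal, plateau=-2.0)
```

For noisy traces or an unknown plateau, a direct nonlinear fit is more robust, for example with `scipy.optimize.curve_fit`, because the log transform amplifies noise near the plateau.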
Vibrational structure in the near-UV circular dichroism (CD) spectra of proteins is an important source of information on protein conformation and can be exploited to study structure and folding. A fully quantitative theory of the relationship between protein conformation and optical spectroscopy would enable deeper interpretation of, and insight into, biophysical and simulation studies of protein dynamics and folding. We have developed new models of the aromatic side chain chromophores toluene, p-cresol and 3-methylindole. These models incorporate calculations of the Franck–Condon effect into first-principles calculations of CD using an exciton approach. The near-UV CD spectra of 40 proteins are calculated with the new parameter set, and the correlation between the computed and experimental intensities from 270 to 290 nm is much improved. The contribution of individual chromophores to the CD spectra has been calculated for several mutants and in many cases helps rationalise changes in their experimental spectra. Considering conformational flexibility by using families of NMR structures leads to further improvements for some proteins and illustrates an informative level of sensitivity to side chain conformation. In several cases, the near-UV CD calculations can distinguish the native protein structure from a set of computer-generated misfolded decoy structures.
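The abstract does not say exactly how the 270–290 nm correlation was computed. One plausible reading, assumed here, is the following. For each protein, the CD intensity is averaged over the 270–290 nm window. The computed and experimental averages are then correlated across proteins with the Pearson coefficient. The helper names are hypothetical.

```python
import math


def window_mean(wavelengths, spectrum, lo=270.0, hi=290.0):
    """Mean CD intensity over lo <= wavelength <= hi (nm)."""
    vals = [v for w, v in zip(wavelengths, spectrum) if lo <= w <= hi]
    return sum(vals) / len(vals)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Usage: with `computed_spectra` and `experimental_spectra` as lists of spectra on a shared wavelength grid `wl`, compute `pearson([window_mean(wl, s) for s in computed_spectra], [window_mean(wl, s) for s in experimental_spectra])`.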
The prediction of the secondary structure of a protein is a critical step in the prediction of its tertiary structure and, potentially, its function. Moreover, the backbone dihedral angles, which are highly correlated with secondary structure, provide crucial information about the local three-dimensional structure.
We predict the secondary structure and the backbone dihedral angles independently and combine the results in a loop, so that each prediction enhances the other. Support vector machines, a state-of-the-art supervised classification technique, achieve a secondary structure prediction accuracy of 80% on a non-redundant set of 513 proteins, significantly higher than other methods on the same dataset. The dihedral angle space is divided into a number of regions using two unsupervised clustering techniques, so that we can predict the region to which a new residue belongs. The performance of our method is comparable to, and in some cases better than, that of other multi-class dihedral prediction methods.
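The abstract does not name the two clustering techniques. Purely as an illustrative assumption, the sketch below uses k-means to divide (φ, ψ) space into regions. Each angle pair is first embedded as (cos φ, sin φ, cos ψ, sin ψ), so that −180° and 180° coincide. A new residue is then assigned to the region with the nearest centroid. This is not DISSPred's code, and all function names are hypothetical.

```python
import math


def embed(phi, psi):
    """Map (phi, psi) in degrees to a 4-vector that respects periodicity."""
    p, s = math.radians(phi), math.radians(psi)
    return (math.cos(p), math.sin(p), math.cos(s), math.sin(s))


def to_angles(v):
    """Convert an embedded centroid back to (phi, psi) in degrees."""
    return (math.degrees(math.atan2(v[1], v[0])),
            math.degrees(math.atan2(v[3], v[2])))


def kmeans_dihedrals(angles, k, iters=50):
    """k-means on embedded dihedral pairs; returns (labels, centroids)."""
    pts = [embed(phi, psi) for phi, psi in angles]
    # Deterministic farthest-point initialisation.
    cents = [pts[0]]
    while len(cents) < k:
        cents.append(max(pts, key=lambda p: min(math.dist(p, c) for c in cents)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, cents[j])) for p in pts]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(pts, labels) if lab == j]
            new.append(tuple(sum(x) / len(members) for x in zip(*members))
                       if members else cents[j])
        if new == cents:
            break
        cents = new
    return labels, [to_angles(c) for c in cents]


def assign_region(phi, psi, centroids):
    """Index of the region whose centroid is nearest to a new residue."""
    v = embed(phi, psi)
    return min(range(len(centroids)), key=lambda j: math.dist(v, embed(*centroids[j])))


# Illustrative helix-like and strand-like residues (degrees).
angles = [(-60, -45), (-65, -40), (-57, -47), (-120, 130), (-130, 135), (-110, 125)]
labels, centroids = kmeans_dihedrals(angles, k=2)
```

The sin/cos embedding avoids the artefact in which residues at φ = 179° and φ = −179° appear 358° apart.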
We have created an accurate predictor of backbone dihedral angles and secondary structure. Our method, called DISSPred, is available online at http://comp.chem.nottingham.ac.uk/disspred/.
Selecting greener solvents during experimental design is imperative for greener chemistry. While many solvent selection guides are currently used in the pharmaceutical industry, these are often paper-based, which can make it difficult to identify and compare specific solvents. This work presents a stand-alone version of the solvent flashcards that were developed as part of the AI4Green electronic laboratory notebook. It provides an intuitive, interactive interface for visualising data from CHEM21, a pharmaceutical solvent selection guide that categorises solvents according to their “greenness”. This open-source software is written in Python, JavaScript, HTML and CSS and allows users to directly contrast and compare specific solvents by generating colour-coded flashcards. It can be installed locally using pip; alternatively, the source code is available on GitHub: https://github.com/AI4Green/solvent_flashcards. The documentation can be found on GitHub or on the corresponding Python Package Index webpage: https://pypi.org/project/solvent-guide/.
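The core idea of a colour-coded side-by-side comparison can be sketched independently of the package. The sketch assumes CHEM21-style categories (recommended, problematic, hazardous, highly hazardous) and a hypothetical colour scheme. Neither the colours nor the data structure is taken from the solvent_flashcards code, and the solvent entries are placeholders rather than CHEM21 data.

```python
# Assumed CHEM21-style categories with a hypothetical colour scheme,
# ordered from greenest to least green.
CATEGORY_COLOURS = {
    "recommended": "green",
    "problematic": "amber",
    "hazardous": "red",
    "highly hazardous": "dark red",
}
RANK = list(CATEGORY_COLOURS)


def compare(selected, guide):
    """Return (solvent, category, colour) rows for the chosen solvents,
    ordered from greenest to least green."""
    rows = [(name, guide[name], CATEGORY_COLOURS[guide[name]]) for name in selected]
    return sorted(rows, key=lambda row: RANK.index(row[1]))


# Hypothetical guide entries; not CHEM21 data.
guide = {"solvent A": "hazardous", "solvent B": "recommended", "solvent C": "problematic"}
print(compare(["solvent A", "solvent B"], guide))
# → [('solvent B', 'recommended', 'green'), ('solvent A', 'hazardous', 'red')]
```

Restricting the output to a user-selected list mirrors the customisation the tool offers over a fixed printed guide.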
Scientific Contribution
This simple, easy-to-use digital tool visualises solvent greenness data through a novel, intuitive interface and encourages green chemistry. It offers numerous advantages over traditional solvent selection guides, allowing users to customise the solvent list directly and to generate side-by-side comparisons of only the most important solvents. Its release as a standalone package will maximise the benefit of this software.
For benzene, toluene, aniline, fluorobenzene, and phenol, even sophisticated treatments of electron correlation, such as MRCI and XMS-CASPT2 calculations, give oscillator strengths that are typically lower than experiment. A simple pseudo-diabatization approach perturbs the S₁ state of each molecule with approximate vibronic coupling to the S₂ state. Including it yields more accurate oscillator strengths, whose absolute values agree better with experiment for all molecules except aniline. When the coupling between the S₁ and S₂ states is strong at the S₀ geometry, the simple diabatization scheme performs less well, relative to the adiabatic values, in predicting oscillator strengths. However, we expect the scheme to be useful in many cases where the coupling is weak to moderate (where the maximum component of the coupling has a magnitude of less than 1.5 au). Such calculations give insight into the effects of vibronic coupling between excited states on UV/vis spectra.
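A generic two-state model shows how mixing a weak state with a bright one redistributes intensity. This intensity-borrowing sketch is not the paper's pseudo-diabatization scheme. It assumes zeroth-order states with energies e1 and e2 (hartree), an effective coupling v, and transition dipoles mu1 and mu2 (atomic units). It diagonalises the 2×2 Hamiltonian, mixes the dipoles, and applies the standard atomic-unit relation f = (2/3)·E·|μ|².

```python
import math


def two_state_mixing(e1, e2, v, mu1, mu2):
    """Mix two excited states through an effective coupling v.

    Diagonalises H = [[e1, v], [v, e2]] (hartree), mixes the transition
    dipoles mu1, mu2 (3-vectors, au) and returns [(E, f), ...] for the
    lower then upper state, with f = (2/3) * E * |mu|^2 in atomic units.
    """
    mean, half = (e1 + e2) / 2, (e1 - e2) / 2
    root = math.hypot(half, v)
    theta = 0.5 * math.atan2(v, half)  # tan(2*theta) = 2v / (e1 - e2)
    c, s = math.cos(theta), math.sin(theta)
    states = [
        (mean - root, [-s * a + c * b for a, b in zip(mu1, mu2)]),  # lower
        (mean + root, [c * a + s * b for a, b in zip(mu1, mu2)]),   # upper
    ]
    return [(e, 2.0 / 3.0 * e * sum(x * x for x in mu)) for e, mu in states]


# Illustrative values: a dark lower state and a bright upper state.
uncoupled = two_state_mixing(0.18, 0.22, 0.0, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
coupled = two_state_mixing(0.18, 0.22, 0.005, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
```

Without coupling the lower state stays dark. A small v lends it some of the upper state's intensity, while the total squared transition dipole is conserved.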