Protein kinases catalyse the phosphorylation of target proteins, controlling most cellular processes. The specificity of serine/threonine kinases is partly determined by interactions with a few residues near the phospho-acceptor residue, forming the so-called kinase-substrate motif. Kinases have been extensively duplicated throughout evolution, but little is known about when in time new target motifs have arisen. Here, we show that sequence variation occurring early in the evolution of kinases is dominated by changes in specificity-determining residues. We then analysed kinase specificity models based on known target sites and observed that specificity has remained mostly unchanged for recent kinase duplications. Finally, analysis of phosphorylation data from a taxonomically broad set of 48 eukaryotic species indicates that most phosphorylation motifs are broadly distributed in eukaryotes but are not present in prokaryotes. Overall, our results suggest that the set of eukaryotic kinase motifs present today was acquired around the time of the eukaryotic last common ancestor and that early expansions of the protein kinase fold rapidly explored the space of possible target motifs.
Proteins are the key molecular machines that orchestrate all biological processes of the cell. Most proteins fold into three-dimensional shapes that are critical for their function. Studying the 3D shape of proteins can inform us of the mechanisms that underlie biological processes in living cells and can have practical applications in the study of disease mutations or the discovery of novel drug treatments. Here, we review the progress made in sequence-based prediction of protein structures with a focus on applications that go beyond the prediction of single monomer structures. This includes the application of deep learning methods for the prediction of structures of protein complexes, different conformations, the evolution of protein structures and the application of these methods to protein design. These developments create new opportunities for research that will have impact across many areas of biomedical research.
Predicting protein structure from sequence information has been a long-standing challenge. This Review discusses recent developments and applications of deep learning-based methods for protein structure prediction and design.
Protein phosphorylation is a key post-translational modification regulating protein function in almost all cellular processes. Although tens of thousands of phosphorylation sites have been identified in human cells, approaches to determine the functional importance of each phosphosite are lacking. Here, we manually curated 112 datasets of phospho-enriched proteins, generated from 104 different human cell types or tissues. We re-analyzed the 6,801 proteomics experiments that passed our quality control criteria, creating a reference phosphoproteome containing 119,809 human phosphosites. To prioritize functional sites, we used machine learning to identify 59 features indicative of proteomic, structural, regulatory or evolutionary relevance and integrated them into a single functional score. Our approach identifies regulatory phosphosites across different molecular mechanisms, processes and diseases, and reveals genetic susceptibilities at a genomic scale. Several regulatory phosphosites were experimentally validated, including identifying a role in neuronal differentiation for phosphosites in SMARCC2, a member of the SWI/SNF chromatin-remodeling complex.
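The idea of integrating many per-site features into a single functional score can be illustrated with a plain logistic regression trained on labelled sites. This is a minimal sketch, not the model used in this work: the two feature columns (hypothetical conservation and disorder values), the toy labels and the gradient-descent settings are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights: List[float], features: Sequence[float]) -> float:
    # weights[0] is the bias; the remaining weights pair up with the feature values
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sigmoid(z)


def train_logistic(X: List[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 2000) -> List[float]:
    # Batch gradient descent on the mean log-loss
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for features, label in zip(X, y):
            err = predict_proba(weights, features) - label
            grad[0] += err
            for j, x in enumerate(features):
                grad[j + 1] += err * x
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad)]
    return weights


# Hypothetical per-site features: [conservation, disorder]; label 1 = known regulatory site
X = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.7]]
y = [1, 1, 0, 0]
weights = train_logistic(X, y)
print(round(predict_proba(weights, [0.85, 0.15]), 2))
```

A real pipeline would use many more features and a library model; the point here is only that learned weights collapse a feature vector into one probability-like score per site.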
Amino acids fulfil a diverse range of roles in proteins, each utilising its chemical properties in different ways in different contexts to create required functions. For example, cysteines form disulphide or hydrogen bonds in different circumstances and charged amino acids do not always make use of their charge. The repertoire of amino acid functions and the frequency at which they occur in proteins remain understudied. Measuring large numbers of mutational consequences, which can elucidate the role an amino acid plays, was prohibitively time‐consuming until recent developments in deep mutational scanning. In this study, we gathered data from 28 deep mutational scanning studies, covering 6,291 positions in 30 proteins, and used the consequences of mutation at each position to define a mutational landscape. We demonstrated rich relationships between this landscape and biophysical or evolutionary properties. Finally, we identified 100 functional amino acid subtypes with a data‐driven clustering analysis and studied their features, including their frequencies and chemical properties such as tolerating polarity, hydrophobicity or being intolerant of charge or specific amino acids. The mutational landscape and amino acid subtypes provide a foundational catalogue of amino acid functional diversity, which will be refined as the number of studied protein positions increases.
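Grouping positions by their mutational profiles can be sketched with k-means on per-position fitness vectors. k-means is a stand-in chosen for brevity and is not necessarily the clustering method used in the study; the three-substitution toy vectors, the choice of k and the first-k initialisation are assumptions, and the scores are assumed to be already standardised across scans.

```python
import math
from typing import List, Sequence, Tuple


def kmeans(points: List[Sequence[float]], k: int,
           iters: int = 50) -> Tuple[List[int], List[List[float]]]:
    # Deterministic initialisation: use the first k positions as starting centroids
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each position joins its nearest centroid (Euclidean distance)
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its member positions
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy standardised fitness scores for three substitutions at four positions
positions = [[0.0, -0.1, 0.1], [-1.5, -1.4, -1.6], [0.1, 0.0, -0.1], [-1.6, -1.5, -1.3]]
labels, centroids = kmeans(positions, k=2)
print(labels)  # prints [0, 1, 0, 1]
```

Tolerant positions (scores near zero) and intolerant positions (strongly negative scores) end up in separate groups; with real data each vector would hold one score per possible substitution.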
SYNOPSIS
Thirty-three deep mutational scans are combined into a standardised landscape of the mutational properties of 6,291 positions, which is used to explore biophysical properties and to divide each amino acid into positional subtypes.
Fitness measurements from diverse deep mutational scans can be standardised, combined and compared.
The landscape of protein positions' fitness score vectors has rich relationships with biophysical properties.
Positions of each amino acid can be clustered into subtypes with similar mutational properties.
These subtypes contain positions fulfilling similar biological roles, e.g. cysteine positions forming disulphide bonds and ligand interactions are separated from those with hydrophobic roles.
Genome-wide association studies have discovered numerous genomic loci associated with Alzheimer's disease (AD); yet the causal genes and variants are incompletely identified. We performed an updated genome-wide AD meta-analysis, which identified 37 risk loci, including new associations near CCDC6, TSPAN14, NCK2 and SPRED2. Using three SNP-level fine-mapping methods, we identified 21 SNPs with >50% probability each of being causally involved in AD risk and others strongly suggested by functional annotation. We followed this with colocalization analyses across 109 gene expression quantitative trait loci datasets and prioritization of genes by using protein interaction networks and tissue-specific expression. Combining this information into a quantitative score, we found that evidence converged on likely causal genes, including the above four genes, and those at previously discovered AD loci, including BIN1, APH1B, PTK2B, PILRA and CASS4.
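A per-SNP causal probability of the kind thresholded above (>50%) can be illustrated with Wakefield's approximate Bayes factor under a single-causal-variant assumption, computed from GWAS effect sizes and standard errors. This is shown for illustration only and is not necessarily one of the three fine-mapping methods used in this work; the prior variance of 0.04 and the toy summary statistics are assumptions.

```python
import math
from typing import List, Sequence


def log_abf(beta: float, se: float, prior_var: float = 0.04) -> float:
    # Log Bayes factor for association vs. no association (Wakefield's approximation)
    v = se ** 2
    r = prior_var / (prior_var + v)
    z = beta / se
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def posterior_probs(betas: Sequence[float], ses: Sequence[float],
                    prior_var: float = 0.04) -> List[float]:
    # Assumes exactly one causal SNP in the locus and equal prior odds for every SNP
    labfs = [log_abf(b, s, prior_var) for b, s in zip(betas, ses)]
    top = max(labfs)
    weights = [math.exp(l - top) for l in labfs]  # subtract the max for numerical stability
    total = sum(weights)
    return [w / total for w in weights]


# Toy locus with three SNPs (hypothetical log-odds effects and standard errors)
pps = posterior_probs([0.05, 0.20, 0.06], [0.02, 0.02, 0.02])
likely_causal = [i for i, p in enumerate(pps) if p > 0.5]
print(likely_causal)  # prints [1]
```

Because the posteriors are normalised within the locus, at most one SNP can exceed 0.5 under this model; methods that allow several causal variants per locus relax that constraint.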
Most proteins fold into 3D structures that determine how they function and orchestrate the biological processes of the cell. Recent developments in computational methods for protein structure prediction have reached the accuracy of experimentally determined models. Although this has been independently verified, the implementation of these methods across structural-biology applications remains to be tested. Here, we evaluate the use of AlphaFold2 (AF2) predictions in the study of characteristic structural elements; the impact of missense variants; function and ligand binding site predictions; modeling of interactions; and modeling of experimental structural data. For 11 proteomes, an average of 25% additional residues can be confidently modeled compared with homology modeling, identifying structural features rarely seen in the Protein Data Bank. AF2-based predictions of protein disorder and complexes surpass dedicated tools, and AF2 models perform as well as experimentally determined structures across diverse applications when the confidence metrics are critically considered. In summary, we find that these advances are likely to have a transformative impact in structural biology and broader life-science research.
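AlphaFold2 writes its per-residue confidence (pLDDT, 0 to 100) into the B-factor column of the PDB files it produces, so the fraction of confidently modelled residues can be computed directly from a model file. This is a minimal sketch assuming a single-model PDB file and a cut-off of pLDDT ≥ 70; that threshold is an illustrative assumption, and the exact criterion used in this study may differ.

```python
from typing import Dict, Tuple


def ca_plddt(pdb_text: str) -> Dict[Tuple[str, int], float]:
    # Map (chain ID, residue number) -> pLDDT, read from the B-factor field of CA atoms
    scores: Dict[Tuple[str, int], float] = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            scores[(chain, resnum)] = float(line[60:66])  # columns 61-66 in the PDB format
    return scores


def fraction_confident(scores: Dict[Tuple[str, int], float],
                       threshold: float = 70.0) -> float:
    # Fraction of residues at or above the (assumed) confidence cut-off
    if not scores:
        return 0.0
    return sum(s >= threshold for s in scores.values()) / len(scores)


# Usage (the file name is hypothetical):
# with open("model.pdb") as fh:
#     print(fraction_confident(ca_plddt(fh.read())))
```

Reading only CA atoms gives one value per residue, since AF2 assigns the same pLDDT to every atom of a residue.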
Protein post‐translational modifications (PTMs) allow the cell to regulate protein activity and play a crucial role in the response to changes in external conditions or internal states. Advances in mass spectrometry now enable proteome‐wide characterization of PTMs and have revealed a broad functional role for a range of different types of modifications. Here we review advances in the study of the evolution and function of PTMs that were spurred by these technological improvements. We provide an overview of studies focusing on the origin and evolution of regulatory enzymes as well as the evolutionary dynamics of modification sites. Finally, we discuss different mechanisms of altering protein activity via post‐translational regulation and progress made in the large‐scale characterization of PTM function.
Advances in proteomics have opened new avenues for the analysis of the evolution of protein post‐translational modifications (PTMs) and have enabled the large‐scale functional characterization of a range of different modification types.
Understanding coding mutations is important for many applications in biology and medicine, but the vast mutation space makes comprehensive experimental characterisation impossible. Current predictors, including recent deep learning models, are often computationally intensive and difficult to scale. We introduce Sequence UNET, a highly scalable deep learning architecture that classifies and predicts variant frequency from sequence alone using multi-scale representations from a fully convolutional compression/expansion architecture. It achieves comparable pathogenicity prediction to recent methods. We demonstrate scalability by analysing 8.3B variants in 904,134 proteins detected through large-scale proteomics. Sequence UNET runs on modest hardware with a simple Python package.
Protein kinases are an important class of enzymes that phosphorylate their targets to regulate key cellular processes; target recognition is typically mediated by a specificity for certain residues around the phospho-acceptor residue. While efforts have been made to identify such specificities, only ∼30% of human kinases have a significant number of known binding sites. We describe a computational method that uses functional interaction data and phosphorylation data to predict kinase specificities. We applied this method to human kinases to predict substrate preferences for 57% of all known kinases and show that we are able to reconstruct well-known specificities. We used an in vitro mass spectrometry approach to validate the predictions for four understudied kinases and show that the predicted models closely resemble their true specificities. We show that this method can be applied to different organisms and can be extended to other phospho-recognition domains. Applying this approach to different types of posttranslational modifications (PTMs) and binding domains could uncover specificities of understudied PTM recognition domains and provide significant insight into the mechanisms of signaling networks.
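Kinase specificities of this kind are commonly represented as a position weight matrix (PWM) of log-odds scores over a window around the phospho-acceptor. The sketch below assumes a set of aligned target peptides is already available; how those peptides are selected from interaction and phosphoproteomic data is not shown, and the uniform background, pseudocount and toy basophilic sites are assumptions rather than details of the method described above.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(peptides: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    # Peptides must be aligned: equal length, phospho-acceptor at the same index
    length = len(peptides[0])
    background = 1.0 / len(AMINO_ACIDS)  # uniform background frequency (assumption)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for pos in range(length):
        counts = Counter(pep[pos] for pep in peptides)
        # log2 odds of the smoothed observed frequency over the background
        pwm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                    for aa in AMINO_ACIDS})
    return pwm


def score(pwm: List[Dict[str, float]], peptide: str) -> float:
    # Sum of per-position log-odds; higher means a better match to the motif
    return sum(column[aa] for column, aa in zip(pwm, peptide))


# Hypothetical basophilic target sites, window -3..+3 around the phosphoserine
sites = ["RRGSLPA", "RRASIKE", "KRLSVGD", "RRTSFQE"]
pwm = build_pwm(sites)
print(score(pwm, "RRASLAE") > score(pwm, "PEPSPEE"))  # prints True
```

The pseudocount keeps unseen residues from scoring negative infinity, which matters when only a handful of target sites are known for a kinase.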
Cross-talk between different types of post-translational modifications on the same protein molecule adds specificity and combinatorial logic to signal processing, but it has not been characterized on a large-scale basis. We developed two methods to identify protein isoforms that are both phosphorylated and ubiquitylated in the yeast Saccharomyces cerevisiae, identifying 466 proteins with 2,100 phosphorylation sites co-occurring with 2,189 ubiquitylation sites. We applied these methods quantitatively to identify phosphorylation sites that regulate protein degradation via the ubiquitin-proteasome system. Our results demonstrate that distinct phosphorylation sites are often used in conjunction with ubiquitylation and that these sites are more highly conserved than the entire set of phosphorylation sites. Finally, we investigated how the phosphorylation machinery can be regulated by ubiquitylation. We found evidence for novel regulatory mechanisms of kinases and 14-3-3 scaffold proteins via proteasome-independent ubiquitylation.