To accurately predict protein conformations in atomic detail, a computational method must be capable of sampling models sufficiently close to the native structure. All-atom sampling is difficult because of the vast number of possible conformations and extremely rugged energy landscapes. Here, we test three sampling strategies to address these difficulties: conformational diversification, intensification of torsion and omega-angle sampling and parameter annealing. We evaluate these strategies in the context of the robotics-based kinematic closure (KIC) method for local conformational sampling in Rosetta on an established benchmark set of 45 12-residue protein segments without regular secondary structure. We quantify performance as the fraction of sub-Angstrom models generated. While improvements with individual strategies are only modest, the combination of intensification and annealing strategies into a new "next-generation KIC" method yields a four-fold increase over standard KIC in the median percentage of sub-Angstrom models across the dataset. Such improvements enable progress on more difficult problems, as demonstrated on longer segments, several of which could not be accurately remodeled with previous methods. Given its improved sampling capability, next-generation KIC should allow advances in other applications such as local conformational remodeling of multiple segments simultaneously, flexible backbone sequence design, and development of more accurate energy functions.
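The performance measure described above (the fraction of sub-Angstrom models per target, summarized as a median percentage across the dataset) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names, the 1.0 Å cutoff expressed as a parameter, and the RMSD values are all hypothetical.

```python
from statistics import median


def sub_angstrom_fraction(rmsds, cutoff=1.0):
    """Fraction of models whose RMSD to the native structure is below cutoff (in Angstrom)."""
    if not rmsds:
        raise ValueError("need at least one model")
    return sum(r < cutoff for r in rmsds) / len(rmsds)


def median_percentage(rmsds_by_target, cutoff=1.0):
    """Median, across targets, of the percentage of sub-Angstrom models."""
    return median(
        100.0 * sub_angstrom_fraction(rmsds, cutoff)
        for rmsds in rmsds_by_target.values()
    )


# Hypothetical RMSD values (Angstrom) for three made-up targets.
example = {
    "target_a": [0.5, 0.8, 1.4, 2.1],  # 2 of 4 models below 1 Angstrom
    "target_b": [0.9, 1.2, 1.3, 3.0],  # 1 of 4 models below 1 Angstrom
    "target_c": [1.5, 2.2, 2.8, 4.0],  # no model below 1 Angstrom
}

print(median_percentage(example))
```

Using the median rather than the mean keeps a few easy targets with many sub-Angstrom models from dominating the summary.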
The database of 3D interacting domains (3did, available online for browsing and bulk download at http://3did.irbbarcelona.org) is a catalog of protein-protein interactions for which a high-resolution 3D structure is known. 3did collects and classifies all structural templates of domain-domain interactions in the Protein Data Bank, providing molecular details for such interactions. The current version also includes a pipeline for the discovery and annotation of novel domain-motif interactions. For every interaction, 3did identifies and groups different binding modes by clustering similar interfaces into 'interaction topologies'. By maintaining a constantly updated collection of domain-based structural interaction templates, 3did is a reference source of information for the structural characterization of protein interaction networks. 3did is updated every 6 months.
Most biological processes are regulated through complex networks of transient protein interactions where a globular domain in one protein recognizes a linear peptide from another, creating a relatively small contact interface. Although sufficient to ensure binding, these linear motifs alone are usually too short to achieve the high specificity observed, and additional contacts are often encoded in the residues surrounding the motif (i.e. the context). Here, we systematically identified all instances of peptide-mediated protein interactions of known three-dimensional structure and used them to investigate the individual contribution of motif and context to the global binding energy. We found that, on average, the context is responsible for roughly 20% of the binding and plays a crucial role in determining interaction specificity, by either improving the affinity with the native partner or impeding non-native interactions. We also studied and quantified the topological and energetic variability of interaction interfaces, finding a much higher heterogeneity in the context residues than in the consensus binding motifs. Our analysis partially reveals the molecular mechanisms responsible for the dynamic nature of peptide-mediated interactions, and suggests a global evolutionary mechanism to maximise the binding specificity. Finally, we investigated the viability of non-native interactions and highlight cases of potential cross-reaction that might compensate for individual protein failure and establish backup circuits to increase the robustness of cell networks.
Genome-wide association studies have established many links between genes and disease but do not reveal the effect of most of the many possible variants within each disease-related gene. ...while the explosion in sequencing of human genomes has revealed millions of missense variants that change protein sequences, we only understand the phenotypic and clinical consequences of a minute fraction of these. Some methods can provide detailed mechanistic understanding, yet they can be time-consuming since each variant is handled individually and further, they are most easily applied retrospectively. ...most current functional assays are challenging to scale to the almost 18,000 possible single amino acid substitutions in MSH2, making it difficult to assign pathogenicity to any new clinically discovered variant. ...each variant’s change in frequency is used to compute a score (normalised to wild-type fitness) that quantifies the effect of the variant on the property selected for. ...they measure mutation rates on a curated set of 185 variants from ClinVar and other clinical sources, which includes benign, pathogenic, and VUS, and find that the assay captures most of these pathogenicity classifications.
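The scoring step mentioned above (a variant's change in frequency, normalised to wild type) can be illustrated with a small sketch. The text does not give the exact formula, so the log2 enrichment ratio below is an assumption, chosen as one common way to express such a score; the function name, the pseudocount, and the read counts are hypothetical.

```python
import math


def enrichment_score(var_before, var_after, wt_before, wt_after, pseudocount=0.5):
    """log2 change in a variant's read count relative to the wild-type change.

    Assumed form (not taken from the source): a score of 0 means the variant
    behaves like wild type under selection; negative values mean depletion.
    The pseudocount avoids division by zero for variants with no reads.
    """
    var_ratio = (var_after + pseudocount) / (var_before + pseudocount)
    wt_ratio = (wt_after + pseudocount) / (wt_before + pseudocount)
    return math.log2(var_ratio / wt_ratio)


# Hypothetical read counts before and after selection.
print(enrichment_score(100, 100, 1000, 1000))  # behaves like wild type
print(enrichment_score(100, 25, 1000, 1000))   # depleted relative to wild type
```

Because both ratios are taken within the same pair of sequencing samples, the total library size cancels, so raw read counts can be used in place of frequencies.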
Many biological responses to intra- and extracellular stimuli are regulated through complex networks of transient protein interactions where a globular domain in one protein recognizes a linear peptide from another, creating a relatively small contact interface. These peptide stretches are often found in unstructured regions of proteins, and contain a consensus motif complementary to the interaction surface displayed by their binding partners. While most current methods for the de novo discovery of such motifs exploit their tendency to occur in disordered regions, our work here focuses on another observation: upon binding to their partner domain, motifs adopt a well-defined structure. Indeed, through the analysis of all peptide-mediated interactions of known high-resolution three-dimensional (3D) structure, we found that the structure of the peptide may be as characteristic as the consensus motif, and help identify target peptides even though they do not match the established patterns. Our analyses of the structural features of known motifs reveal that they tend to have a particular stretched and elongated structure, unlike most other peptides of the same length. Accordingly, we have implemented a strategy based on a Support Vector Machine that uses these features, along with other structure-encoded information about binding interfaces, to search the set of protein interactions of known 3D structure and to identify unnoticed peptide-mediated interactions among them. We have also derived consensus patterns for these interactions, whenever enough information was available, and compared our results with established linear motif patterns and their binding domains. Finally, to cross-validate our identification strategy, we scanned interactome networks from four model organisms with our newly derived patterns to see if any of them occurred more often than expected. Indeed, we found significant over-representations for 64 domain-motif interactions, 46 of which had not been described before, involving over 6,000 interactions in total for which we could suggest the molecular details determining the binding.
Amino acid substitutions can perturb protein activity in multiple ways. Understanding their mechanistic basis may pinpoint how residues contribute to protein function. Here, we characterize the mechanisms underlying variant effects in human glucokinase (GCK), building on our previous comprehensive study on GCK variant activity.
Using a yeast growth-based assay, we score the abundance of 95% of GCK missense and nonsense variants. When combining the abundance scores with our previously determined activity scores, we find that 43% of hypoactive variants also decrease cellular protein abundance. The low-abundance variants are enriched in the large domain, while residues in the small domain are tolerant to mutations with respect to abundance. Instead, many variants in the small domain perturb GCK conformational dynamics, which are essential for appropriate activity.
In this study, we identify residues important for GCK metabolic stability and conformational dynamics. These residues could be targeted to modulate GCK activity, and thereby affect glucose homeostasis.
Predicting the thermodynamic stability of proteins is a common and widely used step in protein engineering, and when elucidating the molecular mechanisms behind evolution and disease. Here, we present RaSP, a method for making rapid and accurate predictions of changes in protein stability by leveraging deep learning representations. RaSP performs on par with biophysics-based methods and enables saturation mutagenesis stability predictions in less than a second per residue. We use RaSP to calculate ∼230 million stability changes for nearly all single amino acid changes in the human proteome, and examine variants observed in the human population. We find that variants that are common in the population are substantially depleted for severe destabilization, and that there are substantial differences between benign and pathogenic variants, highlighting the role of protein stability in genetic diseases. RaSP is freely available, including via a Web interface, and enables large-scale analyses of stability in experimental and predicted protein structures.
Germline mutations in the folliculin (FLCN) tumor suppressor gene are linked to Birt-Hogg-Dubé (BHD) syndrome, a dominantly inherited genetic disease characterized by predisposition to fibrofolliculomas, lung cysts, and renal cancer. Most BHD-linked FLCN variants include large deletions and splice site aberrations predicted to cause loss of function. The mechanisms by which missense variants and short in-frame deletions in FLCN trigger disease are unknown. Here, we present an integrated computational and experimental study that reveals that the majority of such disease-causing FLCN variants cause loss of function due to proteasomal degradation of the encoded FLCN protein, rather than directly ablating FLCN function. Accordingly, several different single-site FLCN variants are present at strongly reduced levels in cells. In line with our finding that FLCN variants are protein quality control targets, several are also highly insoluble and fail to associate with the FLCN-binding partners FNIP1 and FNIP2. The lack of FLCN binding leads to rapid proteasomal degradation of FNIP1 and FNIP2. Half of the tested FLCN variants are mislocalized in cells, and one variant (ΔE510) forms perinuclear protein aggregates. A yeast-based stability screen revealed that the deubiquitylating enzyme Ubp15/USP7 and molecular chaperones regulate the turnover of the FLCN variants. Lowering the temperature led to a stabilization of two FLCN missense proteins, and for one (R362C), function was re-established at low temperature. In conclusion, we propose that most BHD-linked FLCN missense variants and small in-frame deletions operate by causing misfolding and degradation of the FLCN protein, and that stabilization and resulting restoration of function may hold therapeutic potential for certain disease-linked variants. Our computational saturation scan encompassing both missense variants and single site deletions in FLCN may allow classification of rare FLCN variants of uncertain clinical significance.
Unstable proteins are prone to form non-native interactions with other proteins and thereby may become toxic. To mitigate this, destabilized proteins are targeted by the protein quality control network. Here we present systematic studies of the cytosolic aspartoacylase, ASPA, where variants are linked to Canavan disease, a lethal neurological disorder. We determine the abundance of 6152 of the 6260 (~98%) possible single amino acid substitutions and nonsense ASPA variants in human cells. Most low abundance variants are degraded through the ubiquitin-proteasome pathway and become toxic upon prolonged expression. The data correlate with predicted changes in thermodynamic stability and evolutionary conservation, and separate disease-linked variants from benign variants. Mapping of degradation signals (degrons) shows that these are often buried and the C-terminal region functions as a degron. The data can be used to interpret Canavan disease variants and provide insight into the relationship between protein stability, degradation and cell fitness.
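The coverage figure quoted above can be checked with a few lines of arithmetic. The sketch below also tests one reading of where the total of 6260 comes from, namely 19 substitutions plus one nonsense variant at each position; that decomposition is an assumption for illustration and is not stated in the text.

```python
measured = 6152   # variants with an abundance measurement (from the text)
possible = 6260   # possible substitutions and nonsense variants (from the text)

coverage_percent = 100 * measured / possible

# Assumption: each position contributes 19 amino acid substitutions
# plus 1 nonsense (stop) variant.
variants_per_position = 19 + 1
positions, remainder = divmod(possible, variants_per_position)

print(f"coverage: {coverage_percent:.1f}%")
print(f"positions implied by the assumption: {positions} (remainder {remainder})")
```

Under this assumption the total divides evenly, which is consistent with a scan that covers every position of the protein.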
Accurate methods to assess the pathogenicity of mutations are needed to fully leverage the possibilities of genome sequencing in diagnosis. Current data-driven and bioinformatics approaches are, however, limited by the large number of new variations found in each newly sequenced genome, and often do not provide direct mechanistic insight. Here we demonstrate, for the first time, that saturation mutagenesis, biophysical modeling and co-variation analysis, performed in silico, can predict the abundance, metabolic stability, and function of proteins inside living cells. As a model system, we selected the human mismatch repair protein, MSH2, where missense variants are known to cause the hereditary cancer predisposition disease known as Lynch syndrome. We show that the majority of disease-causing MSH2 mutations give rise to folding defects and proteasome-dependent degradation rather than inherent loss of function, and accordingly our in silico modeling data accurately identify disease-causing mutations and outperform the traditionally used genetic disease predictors. In conclusion, in silico biophysical modeling should be considered for making genotype-phenotype predictions and for diagnosis of Lynch syndrome, and perhaps other hereditary diseases.