The assembly of individual proteins into functional complexes is fundamental to nearly all biological processes. In recent decades, many thousands of homomeric and heteromeric protein complex structures have been determined, greatly improving our understanding of the general principles that control symmetric and asymmetric quaternary structure organization. Furthermore, our conception of protein complexes has moved beyond static representations to include dynamic aspects of quaternary structure, including conformational changes upon binding, multistep ordered assembly pathways, and structural fluctuations occurring within fully assembled complexes. Finally, major advances have been made in our understanding of protein complex evolution, both in reconstructing evolutionary histories of specific complexes and in elucidating general mechanisms that explain how quaternary structure tends to evolve. The evolution of quaternary structure occurs via changes in self-assembly state or through the gain or loss of protein subunits, and these processes can be driven by both adaptive and nonadaptive influences.
To deal with the huge number of novel protein‐coding variants identified by genome and exome sequencing studies, many computational variant effect predictors (VEPs) have been developed. Such predictors are often trained and evaluated using different variant data sets, making a direct comparison between VEPs difficult. In this study, we use 31 previously published deep mutational scanning (DMS) experiments, which provide quantitative, independent phenotypic measurements for large numbers of single amino acid substitutions, to benchmark and compare 46 different VEPs. We also evaluate the ability of DMS measurements and VEPs to discriminate between pathogenic and benign missense variants. We find that DMS experiments tend to be superior to the top‐ranking predictors, demonstrating the tremendous potential of DMS for identifying novel human disease mutations. Among the VEPs, DeepSequence clearly stands out, showing both the strongest correlations with DMS data and the best ability to predict pathogenic mutations, which is especially remarkable given that it is an unsupervised method. We further recommend SNAP2, DEOGEN2, SNPs&GO, SuSPect and REVEL based upon their performance in these analyses.
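One common way to compare predictors against DMS data is a rank correlation between each predictor's scores and the DMS measurements for the same variants. The sketch below assumes Spearman's correlation and ranks predictors by its absolute value, because predictors differ in whether damaging variants get high or low scores. The predictor names and scores are hypothetical, and this is an illustration of the general approach rather than the study's exact pipeline.

```python
from statistics import mean


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(rank_with_ties(x), rank_with_ties(y))


def rank_predictors(dms_scores, vep_scores):
    """Sort predictors by |rho| against the DMS scores (sign conventions vary)."""
    rho = {name: spearman(s, dms_scores) for name, s in vep_scores.items()}
    return sorted(rho.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical DMS fitness scores and two hypothetical predictors.
dms = [0.1, 0.4, 0.35, 0.8, 0.9]
veps = {"vep_a": [0.9, 0.6, 0.7, 0.2, 0.1], "vep_b": [1, 3, 2, 5, 4]}
ranking = rank_predictors(dms, veps)
# vep_a ranks first (rho = -1, perfectly anti-correlated), then vep_b (rho = 0.9).
```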
Synopsis
Data from deep mutational scans is used to benchmark computational protein variant effect predictors using fully independent data. The performance of deep mutational scanning is also compared to computational predictors for identifying pathogenic variants.
DeepSequence is the method that correlates the best with deep mutational scanning data for human proteins.
Predictor performance depends heavily on the protein and fitness metric. For this reason, using results from multiple predictors is recommended. Other recommended predictors include SNAP2, DEOGEN2, SNPs&GO, SuSPect and REVEL.
Deep mutational scanning is generally superior to variant effect predictors for distinguishing pathogenic from benign variants.
Abstract
Most known pathogenic mutations occur in protein-coding regions of DNA and change the way proteins are made. Taking protein structure into account has therefore provided great insight into the molecular mechanisms underlying human genetic disease. While there has been much focus on how mutations can disrupt protein structure and thus cause a loss of function (LOF), alternative mechanisms, specifically dominant-negative (DN) and gain-of-function (GOF) effects, are less understood. Here, we investigate the protein-level effects of pathogenic missense mutations associated with different molecular mechanisms. We observe striking differences between recessive vs dominant, and LOF vs non-LOF mutations, with dominant, non-LOF disease mutations having much milder effects on protein structure, and DN mutations being highly enriched at protein interfaces. We also find that nearly all computational variant effect predictors, even those based solely on sequence conservation, underperform on non-LOF mutations. However, we do show that non-LOF mutations could potentially be identified by their tendency to cluster in three-dimensional space. Overall, our work suggests that many pathogenic mutations that act via DN and GOF mechanisms are likely being missed by current variant prioritisation strategies, but that there is considerable scope to improve computational predictions through consideration of molecular disease mechanisms.
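The idea that mutations "cluster in three-dimensional space" can be tested in several ways. The sketch below uses one generic approach, chosen here as an assumption and not necessarily the metric used in the study: compare the mean pairwise distance between mutated residues with that of random residue sets of the same size drawn from the same structure. The straight-line coordinates are a hypothetical toy structure.

```python
import itertools
import math
import random


def mean_pairwise_distance(coords, residues):
    """Mean Euclidean distance between all pairs of the given residues."""
    pairs = list(itertools.combinations(residues, 2))
    return sum(math.dist(coords[a], coords[b]) for a, b in pairs) / len(pairs)


def clustering_p_value(coords, mutated, n_perm=1000, seed=0):
    """Permutation test: are the mutated residues closer together than chance?

    coords maps residue number -> (x, y, z), e.g. C-alpha positions.
    Returns the fraction of random same-size residue sets whose mean
    pairwise distance is <= the observed one (with add-one correction).
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance(coords, mutated)
    all_residues = list(coords)
    hits = sum(
        mean_pairwise_distance(coords, rng.sample(all_residues, len(mutated))) <= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


# Hypothetical 50-residue "structure" laid out along a line, 3.8 angstroms apart.
coords = {i: (3.8 * i, 0.0, 0.0) for i in range(50)}
p_clustered = clustering_p_value(coords, [10, 11, 12, 13])  # expected to be small
p_dispersed = clustering_p_value(coords, [0, 16, 33, 49])   # expected to be large
```

A real analysis would use coordinates from an experimental or predicted structure and might weight residues or restrict the null to surface positions; those choices are left open here.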
The assembly of proteins into complexes and their interactions with other biomolecules are often vital for their biological function. While it is known that mutations at protein interfaces have a high potential to be damaging and cause human genetic disease, there has been relatively little consideration of how this varies between different types of interfaces. Here we investigate the properties of human pathogenic and putatively benign missense variants at homomeric (isologous and heterologous), heteromeric, DNA, RNA and other ligand interfaces, and at different regions in proteins with respect to those interfaces. We find that different types of interfaces vary greatly in their propensity to be associated with pathogenic mutations, with homomeric heterologous and DNA interfaces being particularly enriched in disease. We also find that residues that do not directly participate in an interface, but are close to it in three-dimensional space, show a significant disease enrichment. Finally, we observe that disease-associated amino acid substitutions at different types of interfaces tend to involve distinct changes in amino acid properties, and that this is linked to substantial variability in how well computational variant effect predictors identify them.
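Disease enrichment at a given interface type can be summarised as an odds ratio comparing pathogenic and putatively benign variant counts inside versus outside that region. The sketch below assumes this odds-ratio formulation with a 95% confidence interval from Woolf's log method; the counts are hypothetical, and all four must be non-zero for the formula to apply.

```python
import math


def odds_ratio(path_in, benign_in, path_out, benign_out):
    """Odds ratio of pathogenic vs benign variants inside a region vs outside it,
    with an approximate 95% confidence interval (Woolf's log method)."""
    or_ = (path_in * benign_out) / (benign_in * path_out)
    se = math.sqrt(1 / path_in + 1 / benign_in + 1 / path_out + 1 / benign_out)
    lo = math.exp(math.log(or_) - 1.96 * se)
    hi = math.exp(math.log(or_) + 1.96 * se)
    return or_, (lo, hi)


# Hypothetical counts: 60 pathogenic / 40 benign at one interface type,
# 400 pathogenic / 800 benign everywhere else in the same proteins.
or_, (lo, hi) = odds_ratio(60, 40, 400, 800)
# or_ == 3.0; an interval lying entirely above 1 indicates enrichment.
```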
The rapidly increasing amount of data on human genetic variation has resulted in a growing demand to identify pathogenic mutations computationally, as their experimental validation is currently beyond reach. Here we show that alpha helices and beta strands differ significantly in their ability to tolerate mutations: helices can accumulate more mutations than strands without structural change, due to the higher numbers of inter-residue contacts in helices. This results in two patterns: a) the same number of mutations causes less structural change in helices than in strands; b) helices diverge more rapidly in sequence than strands within the same domains. Additionally, both helices and strands are significantly more robust than coils. Based on this observation, we show that human missense mutations that change secondary structure are more likely to be pathogenic than those that do not. Moreover, incorporating predicted secondary structure changes can significantly improve state-of-the-art pathogenicity predictions.
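The explanation above rests on counting inter-residue contacts per secondary-structure element. A minimal sketch of such a count from residue coordinates follows; the 8 Å cutoff, the rule of ignoring only directly adjacent residues, and the toy coordinates are all assumptions for illustration, and the secondary-structure string is assumed to come from an assignment tool such as DSSP.

```python
import math
from collections import defaultdict


def contact_counts(coords, cutoff=8.0, min_separation=2):
    """Count residues within `cutoff` angstroms of each residue, skipping pairs
    fewer than `min_separation` positions apart in sequence."""
    n = len(coords)
    counts = [0] * n
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                counts[i] += 1
                counts[j] += 1
    return counts


def mean_contacts_by_ss(counts, ss):
    """Average contact count per secondary-structure class ('H', 'E', 'C', ...)."""
    totals, sizes = defaultdict(int), defaultdict(int)
    for c, s in zip(counts, ss):
        totals[s] += c
        sizes[s] += 1
    return {s: totals[s] / sizes[s] for s in totals}


# Hypothetical four-residue toy coordinates with a made-up assignment.
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (0.0, 5.0, 0.0)]
counts = contact_counts(toy)                   # [2, 1, 1, 2]
summary = mean_contacts_by_ss(counts, "HHHE")  # {'H': 1.333..., 'E': 2.0}
```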
Attempts at using protein structures to identify disease-causing mutations have been dominated by the idea that most pathogenic mutations are disruptive at a structural level. Consequently, computational stability predictors, which assess whether a mutation is likely to be stabilising or destabilising to protein structure, have been commonly used when evaluating new candidate disease variants, despite not having been developed specifically for this purpose. We therefore tested 13 different stability predictors for their ability to discriminate between pathogenic and putatively benign missense variants. We find that one method, FoldX, significantly outperforms all other predictors in the identification of disease variants. Moreover, we demonstrate that using absolute values of the predicted energy change improves the performance of nearly all predictors in distinguishing pathogenic from benign variants. Importantly, however, we observe that the utility of computational stability predictors is highly heterogeneous across different proteins, and that they are all inferior to the best-performing variant effect predictors for identifying pathogenic mutations. We suggest that this is largely because many pathogenic mutations act via molecular mechanisms other than protein destabilisation. Thus, better ways of incorporating protein structural information and molecular mechanisms into computational variant effect predictors will be required for improved disease variant prioritisation.
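Why absolute energy changes can help is easy to see with a ROC AUC, computed here as the Mann-Whitney probability that a pathogenic variant outscores a benign one. The sketch assumes a sign convention in which positive ΔΔG values are destabilising, and uses hypothetical values in which some pathogenic variants are strongly stabilising: raw scores then rank those variants below the benign ones, while absolute scores do not.

```python
def roc_auc(pathogenic, benign):
    """ROC AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen pathogenic variant scores higher than a randomly chosen benign one
    (ties count as half)."""
    wins = 0.0
    for p in pathogenic:
        for b in benign:
            if p > b:
                wins += 1
            elif p == b:
                wins += 0.5
    return wins / (len(pathogenic) * len(benign))


# Hypothetical predicted ddG values (kcal/mol; positive = destabilising, assumed).
pathogenic_ddg = [3.0, 2.5, -2.0, -1.5]
benign_ddg = [0.2, -0.3, 0.5, 0.1]

auc_raw = roc_auc(pathogenic_ddg, benign_ddg)  # 0.5
auc_abs = roc_auc([abs(x) for x in pathogenic_ddg],
                  [abs(x) for x in benign_ddg])  # 1.0
```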
Most known disease-causing mutations occur in protein-coding regions of DNA. While some of these involve a loss of protein function (e.g., through premature stop codons or missense changes that destabilize protein folding), many act via alternative molecular mechanisms and have dominant-negative or gain-of-function effects. In nearly all cases, these non-loss-of-function mutations can be understood by considering interactions of the wild-type and mutant protein with other molecules, such as proteins, nucleic acids, or small ligands and substrates. Here, we review the diverse molecular mechanisms by which pathogenic mutations can have non-loss-of-function effects, including by disrupting interactions, increasing binding affinity, changing binding specificity, causing assembly-mediated dominant-negative and dominant-positive effects, creating novel interactions, and promoting aggregation and phase separation. We believe that increased awareness of these diverse molecular disease mechanisms will lead to improved diagnosis (and ultimately treatment) of human genetic disorders.
Is the order in which proteins assemble into complexes important for biological function? Here, we seek to address this by searching for evidence of evolutionary selection for ordered protein complex assembly. First, we experimentally characterize the assembly pathways of several heteromeric complexes and show that they can be simply predicted from their three-dimensional structures. Then, by mapping gene fusion events identified from fully sequenced genomes onto protein complex assembly pathways, we demonstrate evolutionary selection for conservation of assembly order. Furthermore, using structural and high-throughput interaction data, we show that fusion tends to optimize assembly by simplifying protein complex topologies. Finally, we observe protein structural constraints on the gene order of fusion that impact the potential for fusion to affect assembly. Together, these results reveal the intimate relationships among protein assembly, quaternary structure, and evolution and demonstrate on a genome-wide scale the biological importance of ordered assembly pathways.
► 3D structures predict assembly pathways of protein complexes in solution
► Assembly pathways tend to be conserved in evolution, based on gene fusion data
► Gene fusion tends to optimize assembly by simplifying protein complex topologies
► Quaternary structure influences selection for fusion events with close N/C termini
The assembly pathway of most protein complexes can be predicted based on analysis of their subunit interfaces in 3D structures. Large-scale analyses of protein sequence, structure, and interaction data suggest that gene fusion events during evolution tend to conserve and optimize these ordered assembly pathways.
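A simplified way to turn subunit interfaces into a predicted assembly order is a greedy merge: repeatedly join the two subcomplexes that share the largest total interface area. This sketch assumes the heuristic that larger interfaces tend to form earlier; it is not presented as the study's exact procedure, and the subunit names and interface areas are hypothetical.

```python
def predict_assembly_order(subunits, interfaces):
    """Greedy sketch: repeatedly join the two subcomplexes sharing the largest
    total interface area until no interfacing subcomplexes remain.

    interfaces maps frozenset({a, b}) -> buried interface area (e.g. in A^2).
    Returns the merge steps as (subcomplex_1, subcomplex_2) pairs.
    """
    complexes = [frozenset([s]) for s in subunits]
    steps = []
    while len(complexes) > 1:
        best, best_area = None, 0.0
        for i in range(len(complexes)):
            for j in range(i + 1, len(complexes)):
                # Total interface area between the two subcomplexes.
                area = sum(
                    interfaces.get(frozenset({a, b}), 0.0)
                    for a in complexes[i]
                    for b in complexes[j]
                )
                if area > best_area:
                    best, best_area = (i, j), area
        if best is None:  # the remaining subcomplexes do not touch
            break
        i, j = best
        steps.append((complexes[i], complexes[j]))
        merged = complexes[i] | complexes[j]
        complexes = [c for k, c in enumerate(complexes) if k not in (i, j)] + [merged]
    return steps


# Hypothetical heterotrimer with three pairwise interfaces.
areas = {
    frozenset({"A", "B"}): 1500.0,
    frozenset({"B", "C"}): 600.0,
    frozenset({"A", "C"}): 300.0,
}
steps = predict_assembly_order(["A", "B", "C"], areas)
# A and B assemble first; C then joins the AB subcomplex.
```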
The assessment of variant effect predictor (VEP) performance is fraught with biases introduced by benchmarking against clinical observations. In this study, building on our previous work, we use independently generated measurements of protein function from deep mutational scanning (DMS) experiments for 26 human proteins to benchmark 55 different VEPs, while introducing minimal data circularity. Many top‐performing VEPs are unsupervised methods including EVE, DeepSequence and ESM‐1v, a protein language model that ranked first overall. However, the strong performance of recent supervised VEPs, in particular VARITY, shows that developers are taking data circularity and bias issues seriously. We also assess the performance of DMS and unsupervised VEPs for discriminating between known pathogenic and putatively benign missense variants. Our findings are mixed, demonstrating that some DMS datasets perform exceptionally well at variant classification, while others are poor. Notably, we observe a striking correlation between VEP agreement with DMS data and performance in identifying clinically relevant variants, strongly supporting the validity of our rankings and the utility of DMS for independent benchmarking.
Synopsis
Common sources of bias in variant effect predictor benchmarking are assessed using data from deep mutational scanning experiments. ESM‐1v, EVE and DeepSequence are among the top performers on both functionally validated and clinically observed variants.
Deep mutational scanning datasets from 26 human proteins are used to benchmark 55 computational predictors of missense variant effect.
The top‐performing methods include several very recent predictors and are based mostly on unsupervised machine learning methodologies.
There is a strong correlation between predictor performance when benchmarked against deep mutational scanning data and clinical variants.
Cells rely heavily on microtubules for several processes, including cell division and molecular trafficking. Mutations in the different tubulin-α and -β proteins that comprise microtubules have been associated with various diseases and are often dominant, sporadic and congenital. While the earliest reported tubulin mutations affect neurodevelopment, mutations are also associated with other disorders such as bleeding disorders and infertility. We performed a systematic survey of tubulin mutations across all isotypes in order to improve our understanding of how they cause disease, and to increase our ability to predict their phenotypic effects. Both protein structural analyses and computational variant effect predictors were very limited in their utility for differentiating between pathogenic and benign mutations. Performance was even worse for genes associated with non-neurodevelopmental disorders. We selected the tubulin-α and -β disease mutations that were most poorly predicted for experimental characterisation. These mutants co-localise to the mitotic spindle in HeLa cells, suggesting they may exert dominant-negative effects by altering microtubule properties. Our results show that tubulin mutations represent a blind spot for current computational approaches, being much more poorly predicted than mutations in most human disease genes. We suggest that this is likely due to their strong association with dominant-negative and gain-of-function mechanisms.