Amino acids fulfil a diverse range of roles in proteins, each utilising its chemical properties in different ways in different contexts to create required functions. For example, cysteines form disulphide or hydrogen bonds in different circumstances and charged amino acids do not always make use of their charge. The repertoire of amino acid functions and the frequency at which they occur in proteins remain understudied. Measuring large numbers of mutational consequences, which can elucidate the role an amino acid plays, was prohibitively time‐consuming until recent developments in deep mutational scanning. In this study, we gathered data from 28 deep mutational scanning studies, covering 6,291 positions in 30 proteins, and used the consequences of mutation at each position to define a mutational landscape. We demonstrated rich relationships between this landscape and biophysical or evolutionary properties. Finally, we identified 100 functional amino acid subtypes with a data‐driven clustering analysis and studied their features, including their frequencies and chemical properties such as tolerating polarity, hydrophobicity or being intolerant of charge or specific amino acids. The mutational landscape and amino acid subtypes provide a foundational catalogue of amino acid functional diversity, which will be refined as the number of studied protein positions increases.
SYNOPSIS
Thirty-three deep mutational scans are combined into a standardised landscape of 6,291 positions' mutational properties, used to explore biophysical properties and divide each amino acid into positional subtypes.
Fitness measurements from diverse deep mutational scans can be standardised, combined and compared.
The landscape of protein positions' fitness score vectors has rich relationships with biophysical properties.
Positions of each amino acid can be clustered into subtypes with similar mutational properties.
These subtypes contain positions fulfilling similar biological roles, e.g. cysteine positions forming disulphide bonds and ligand interactions are separated from those with hydrophobic roles.
Understanding coding mutations is important for many applications in biology and medicine but the vast mutation space makes comprehensive experimental characterisation impossible. Current predictors are often computationally intensive and difficult to scale, including recent deep learning models. We introduce Sequence UNET, a highly scalable deep learning architecture that classifies and predicts variant frequency from sequence alone using multi-scale representations from a fully convolutional compression/expansion architecture. It achieves comparable pathogenicity prediction to recent methods. We demonstrate scalability by analysing 8.3B variants in 904,134 proteins detected through large-scale proteomics. Sequence UNET runs on modest hardware with a simple Python package.
Loss‐of‐function (LoF) mutations associated with disease do not manifest equally in different individuals. The impact of the genetic background on the consequences of LoF mutations remains poorly characterized. Here, we systematically assessed the changes in gene deletion phenotypes for 3,786 gene knockouts in four Saccharomyces cerevisiae strains and 38 conditions. We observed 18.5% of deletion phenotypes changing between pairs of strains on average with a small fraction conserved in all four strains. Conditions causing higher wild‐type growth differences and the deletion of pleiotropic genes showed above‐average changes in phenotypes. In addition, we performed a genome‐wide association study (GWAS) for growth under the same conditions for a panel of 925 yeast isolates. Gene–condition associations derived from GWAS were not enriched for genes with deletion phenotypes under the same conditions. However, cases where the results were congruent indicate the most likely mechanism underlying the GWAS signal. Overall, these results show a high degree of genetic background dependencies for LoF phenotypes.
Synopsis
A systematic evaluation of conditional gene essentiality changes across four Saccharomyces cerevisiae strains shows that on average, 18.5% of gene deletion phenotypes change between any pair of strains, indicating widespread genetic background effects.
Gene deletion growth measurements are performed for 38 conditions in four S. cerevisiae genetic backgrounds.
On average 18.5% of LoF phenotypes change between pairs of strains.
Conditions with highest stress differences between strains and pleiotropic genes explain some of the differences.
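As a rough illustration of how an average pairwise change such as the 18.5% figure could be tallied, the sketch below compares sets of deletion phenotypes between every pair of strains. The representation (a set of gene and condition pairs per strain), the definition of "change" (phenotypes not shared by both strains, divided by phenotypes seen in either), the function names and the toy data are all assumptions made for illustration, not the study's actual procedure.

```python
from itertools import combinations


def pairwise_change_fraction(a, b):
    """Fraction of phenotypes seen in either strain that are not shared by both.

    Assumed definition for illustration only.
    """
    union = a | b
    if not union:
        return 0.0
    return len(a ^ b) / len(union)


def mean_pairwise_change(phenotypes):
    """Average the change fraction over every unordered pair of strains."""
    pairs = list(combinations(sorted(phenotypes), 2))
    fractions = [
        pairwise_change_fraction(phenotypes[x], phenotypes[y]) for x, y in pairs
    ]
    return sum(fractions) / len(fractions)


# Hypothetical toy data: each strain maps to its (gene, condition) phenotypes.
toy = {
    "strainA": {("gene1", "cond1"), ("gene2", "cond1")},
    "strainB": {("gene1", "cond1"), ("gene3", "cond2")},
    "strainC": {("gene1", "cond1"), ("gene2", "cond1")},
}

result = mean_pairwise_change(toy)
print(f"mean pairwise change: {result:.3f}")
```

With four strains the same function would average over six pairs; other definitions of change (for example, per-gene score differences above a threshold) would give different numbers.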
Cellular functions are governed by molecular machines that assemble through protein-protein interactions. Their atomic details are critical to studying their molecular mechanisms. However, fewer than 5% of hundreds of thousands of human protein interactions have been structurally characterized. Here we test the potential and limitations of recent progress in deep-learning methods using AlphaFold2 to predict structures for 65,484 human protein interactions. We show that experiments can orthogonally confirm higher-confidence models. We identify 3,137 high-confidence models, of which 1,371 have no homology to a known structure. We identify interface residues harboring disease mutations, suggesting potential mechanisms for pathogenic variants. Groups of interface phosphorylation sites show patterns of co-regulation across conditions, suggestive of coordinated tuning of multiple protein interactions as signaling responses. Finally, we provide examples of how the predicted binary complexes can be used to build larger assemblies helping to expand our understanding of human cell biology.
Structural variants contribute substantially to genetic diversity and are important evolutionarily and medically, but they are still understudied. Here we present a comprehensive analysis of structural variation in the Human Genome Diversity panel, a high-coverage dataset of 911 samples from 54 diverse worldwide populations. We identify, in total, 126,018 variants, 78% of which were not identified in previous global sequencing projects. Some reach high frequency and are private to continental groups or even individual populations, including regionally restricted runaway duplications and putatively introgressed variants from archaic hominins. By de novo assembly of 25 genomes using linked-read sequencing, we discover 1,643 breakpoint-resolved unique insertions, in aggregate accounting for 1.9 Mb of sequence absent from the GRCh38 reference. Our results illustrate the limitation of a single human reference and the need for high-quality genomes from diverse populations to fully discover and understand human genetic variation.
• Catalog of structural variants from a diverse set of human populations
• Identification of high-frequency population-specific variants
• Highly stratified variants putatively introgressed from Neanderthals and Denisovans
• De novo genome assemblies uncover and place sequences missing from the reference
Almarri et al. generate a structural variation atlas for a geographically diverse set of human genomes, including recovery of sequences missing from the human reference sequence.
SARS-CoV-2 variants of concern (VOCs) emerged during the COVID-19 pandemic. Here, we used unbiased systems approaches to study the host-selective forces driving VOC evolution. We discovered that VOCs evolved convergent strategies to remodel the host by modulating viral RNA and protein levels, altering viral and host protein phosphorylation, and rewiring virus-host protein-protein interactions. Integrative computational analyses revealed that although Alpha, Beta, Gamma, and Delta ultimately converged to suppress interferon-stimulated genes (ISGs), Omicron BA.1 did not. ISG suppression correlated with the expression of viral innate immune antagonist proteins, including Orf6, N, and Orf9b, which we mapped to specific mutations. Later Omicron subvariants BA.4 and BA.5 more potently suppressed innate immunity than early subvariant BA.1, which correlated with Orf6 levels, although muted in BA.4 by a mutation that disrupts the Orf6-nuclear pore interaction. Our findings suggest that SARS-CoV-2 convergent evolution overcame human adaptive and innate immune barriers, laying the groundwork to tackle future pandemics.
• Systems analyses reveal the host-selective forces that drive SARS-CoV-2 evolution
• Variants modulate viral protein levels, phosphorylation, and virus-host complexes
• Variants converge on innate immune suppression by modulating viral proteins
• Understanding innate/adaptive immune balance will aid future pandemic preparedness
Systems proteomic and genomic analyses reveal that SARS-CoV-2 variants of concern respond to the selective forces of the host immune response by modulating viral protein expression, phosphorylation, and virus-host protein-protein interactions. Variants have converged on similar strategies for innate and adaptive immune evasions, suggesting implications for predicting viral transmission and tackling future viral pandemics.
A common challenge arising from the observation of an evolutionary system over time is to infer the magnitude of selection acting upon a specific genetic variant, or variants, within the population. The inference of selection may be confounded by the effects of genetic drift in a system, leading to the development of inference procedures to account for these effects. However, recent work has suggested that deterministic models of evolution may be effective in capturing the effects of selection even under complex models of demography, suggesting the more general application of deterministic approaches to inference. Responding to this literature, we here note a case in which a deterministic model of evolution may give highly misleading inferences, resulting from the nondeterministic properties of mutation in a finite population. We propose an alternative approach that acts to correct for this error, and which we denote the delay-deterministic model. Applying our model to a simple evolutionary system, we demonstrate its performance in quantifying the extent of selection acting within that system. We further consider the application of our model to sequence data from an evolutionary experiment. We outline scenarios in which our model may produce improved results for the inference of selection, noting that such situations can be easily identified via the use of a regular deterministic model.
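For orientation, a "regular deterministic model" of the kind the abstract contrasts against can be sketched with the standard textbook logistic trajectory for a variant under constant haploid selection, q(t) = q0·exp(s·t) / (1 − q0 + q0·exp(s·t)). This is an assumed, generic form chosen for illustration; the abstract does not specify the paper's own equations, and the delay-deterministic correction is not reproduced here. The function name and parameter values are hypothetical.

```python
import math


def deterministic_frequency(q0, s, t):
    """Variant frequency at time t under constant haploid selection.

    Textbook logistic solution of dq/dt = s * q * (1 - q); assumed form,
    not taken from the paper.
    """
    growth = math.exp(s * t)
    return q0 * growth / (1.0 - q0 + q0 * growth)


# Hypothetical example: a variant starting at 1% with selection s = 0.1.
trajectory = [deterministic_frequency(0.01, 0.1, t) for t in range(0, 101, 25)]
print([round(q, 3) for q in trajectory])
```

Because this curve ignores the random timing with which a new mutation first arises and establishes in a finite population, fitting it directly to observed frequencies is the kind of inference the abstract warns can be misleading.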