Alternative splicing plays a key role in the expansion of proteomic and regulatory complexity, yet the functions of the vast majority of differentially spliced exons are not known. In this study, we observe that brain and other tissue-regulated exons are significantly enriched in flexible regions of proteins that likely form conserved interaction surfaces. These proteins participate in significantly more interactions in protein-protein interaction (PPI) networks than other proteins. Using LUMIER, an automated PPI assay, we observe that approximately one-third of analyzed neural-regulated exons affect PPIs. Inclusion of these exons stimulated and repressed different partner interactions at comparable frequencies. This assay further revealed functions of individual exons, including a role for a neural-specific exon in promoting an interaction between Bridging Integrator 1 (Bin1)/Amphiphysin II and Dynamin 2 (Dnm2) that facilitates endocytosis. Collectively, our results provide evidence that regulated alternative exons frequently remodel interactions to establish tissue-dependent PPI networks.
▸ Proteins containing tissue-regulated exons have high centrality in PPI networks
▸ Tissue-regulated exons are highly enriched in protein regions that mediate PPIs
▸ LUMIER reveals that one-third of analyzed neural-specific exons modulate PPIs
▸ Exon-resolution functional mapping reveals specific roles for neural AS events
Deep generative modeling for protein design
Strokach, Alexey; Kim, Philip M.
Current Opinion in Structural Biology, February 2022, Volume 72
Journal Article, Peer reviewed, Open access
Deep learning approaches have produced substantial breakthroughs in fields such as image classification and natural language processing and are making rapid inroads in the area of protein design. Many generative models of proteins have been developed that encompass all known protein sequences, model specific protein families, or extrapolate the dynamics of individual proteins. Those generative models can learn protein representations that are often more informative of protein structure and function than hand-engineered features. Furthermore, they can be used to quickly propose millions of novel proteins that resemble the native counterparts in terms of expression level, stability, or other attributes. The protein design process can further be guided by discriminative oracles to select candidates with the highest probability of having the desired properties. In this review, we discuss five classes of generative models that have been most successful at modeling proteins and provide a framework for model-guided protein design.
• Machine learning is becoming a key component of the protein design process.
• Deep generative models can produce novel protein sequences and structures.
• Conditioned generative models can produce proteins with specific properties.
• Discriminative oracles can be used to further fine-tune the design process.
Protein design is a technique to engineer proteins by permuting amino acids in the sequence to obtain novel functionalities. However, exploring all possible combinations of amino acids is generally impossible due to the exponential growth of possibilities with the number of designable sites. The present work introduces circuits implementing a pure quantum approach, Grover's algorithm, to solve protein design problems. Our algorithms can adjust to implement any custom pair-wise energy tables and protein structure models. Moreover, the algorithm's oracle is designed to consist of only adder functions. Quantum computer simulators validate the practicality of our circuits, containing up to 234 qubits. However, a smaller circuit is implemented on real quantum devices. Our results show that using $O(\sqrt{N})$ iterations, the circuits find the correct results among all $N$ possibilities, providing the expected quadratic speedup of Grover's algorithm over classical methods (i.e., $O(N)$).
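The quadratic speedup mentioned in this abstract can be illustrated with a small classical state-vector simulation. This is a minimal sketch, not the circuits from the paper: the oracle here is a plain phase flip on one marked index rather than the adder-based energy oracle the authors describe, and the function name `grover_search` and the marked index are assumptions made for illustration.

```python
import math
import numpy as np

def grover_search(n_qubits, marked):
    """Simulate Grover's algorithm classically; return (iterations, success probability)."""
    n_states = 2 ** n_qubits
    # Start in the uniform superposition over all N basis states.
    state = np.full(n_states, 1.0 / math.sqrt(n_states))
    # For a single marked item, about (pi/4) * sqrt(N) iterations are optimal.
    iterations = int(math.floor(math.pi / 4 * math.sqrt(n_states)))
    for _ in range(iterations):
        # Oracle (illustrative): flip the sign of the marked amplitude.
        state[marked] *= -1.0
        # Diffusion: reflect every amplitude about the mean amplitude.
        state = 2.0 * state.mean() - state
    return iterations, float(state[marked] ** 2)

# Hypothetical search space of N = 2**10 = 1024 candidates.
iterations, p_success = grover_search(n_qubits=10, marked=123)
print(iterations, p_success)
```

With 1024 candidates the loop runs 25 times, far fewer than the roughly 1024 evaluations an exhaustive classical scan would need in the worst case, while the probability of measuring the marked index stays close to one.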
Chronic obstructive pulmonary disease (COPD) is the third commonest cause of death globally, and manifests as a progressive inflammatory lung disease with no curative treatment. The lung microbiome contributes to COPD progression, but the function of the gut microbiome remains unclear. Here we examine the faecal microbiome and metabolome of COPD patients and healthy controls, finding 146 bacterial species differing between the two groups. Several species, including Streptococcus sp000187445, Streptococcus vestibularis and multiple members of the family Lachnospiraceae, also correlate with reduced lung function. Untargeted metabolomics identifies a COPD signature comprising 46% lipid, 20% xenobiotic and 20% amino acid related metabolites. Furthermore, we describe a disease-associated network connecting Streptococcus parasanguinis_B with COPD-associated metabolites, including N-acetylglutamate and its analogue N-carbamoylglutamate. While correlative, our results suggest that the faecal microbiome and metabolome of COPD patients are distinct from those of healthy individuals, and may thus aid in the search for biomarkers for COPD.
How species with similar repertoires of protein-coding genes differ so markedly at the phenotypic level is poorly understood. By comparing organ transcriptomes from vertebrate species spanning ∼350 million years of evolution, we observed significant differences in alternative splicing complexity between vertebrate lineages, with the highest complexity in primates. Within 6 million years, the splicing profiles of physiologically equivalent organs diverged such that they are more strongly related to the identity of a species than they are to organ type. Most vertebrate species-specific splicing patterns are cis-directed. However, a subset of pronounced splicing changes are predicted to remodel protein interactions involving trans-acting regulators. These events likely further contributed to the diversification of splicing and other transcriptomic changes that underlie phenotypic differences among vertebrate species.
Recent advances in computational/artificial intelligence (AI)-based methodologies for antibody engineering and discovery hold great promise for accelerating and improving the development of therapeutic antibodies. Databases hold large repertoires of antibody sequences, but only limited structural data; data on biophysical properties is also available. A large suite of predictors of different biophysical and other properties of antibodies has been developed. Deep learning approaches are improving the performance of structure prediction of antibodies, including CDRs, while de novo design remains a challenging problem. Protein language models are showing very promising results for the improvement of antibody activity and properties.
Due to their high target specificity and binding affinity, therapeutic antibodies are currently the largest class of biotherapeutics. The traditional largely empirical antibody development process is, while mature and robust, cumbersome and has significant limitations. Substantial recent advances in computational and artificial intelligence (AI) technologies are now starting to overcome many of these limitations and are increasingly integrated into development pipelines. Here, we provide an overview of AI methods relevant for antibody development, including databases, computational predictors of antibody properties and structure, and computational antibody design methods with an emphasis on machine learning (ML) models, and the design of complementarity-determining region (CDR) loops, antibody structural components critical for binding.
It has been a long-standing goal in systems biology to find relations between the topological properties and functional features of protein networks. However, most of the focus in network studies has been on highly connected proteins ("hubs"). As a complementary notion, it is possible to define bottlenecks as proteins with a high betweenness centrality (i.e., network nodes that have many "shortest paths" going through them, analogous to major bridges and tunnels on a highway map). Bottlenecks are, in fact, key connector proteins with surprising functional and dynamic properties. In particular, they are more likely to be essential proteins. In fact, in regulatory and other directed networks, betweenness (i.e., "bottleneck-ness") is a much more significant indicator of essentiality than degree (i.e., "hub-ness"). Furthermore, bottlenecks correspond to the dynamic components of the interaction network: they are significantly less well coexpressed with their neighbors than non-bottlenecks, implying that expression dynamics is wired into the network topology.
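The distinction between degree ("hub-ness") and betweenness ("bottleneck-ness") can be made concrete on a toy graph. The sketch below uses Brandes' algorithm for unweighted, undirected graphs; the seven-node network and its node names are invented for illustration and are not taken from the study.

```python
from collections import deque

def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted, undirected)."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search from source, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies in reverse breadth-first order.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: s / 2.0 for v, s in score.items()}

# Hypothetical network: two triangles joined only through node "x".
toy = {
    "a1": ["a2", "a3", "x"], "a2": ["a1", "a3"], "a3": ["a1", "a2"],
    "x": ["a1", "b1"],
    "b1": ["b2", "b3", "x"], "b2": ["b1", "b3"], "b3": ["b1", "b2"],
}
scores = betweenness(toy)
degrees = {v: len(nbrs) for v, nbrs in toy.items()}
print(scores["x"], degrees["x"], scores["a1"], degrees["a1"])
```

In this toy network node `x` has only two neighbors, yet every shortest path between the two triangles passes through it, so it has the highest betweenness: a bottleneck that is not a hub.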
Biologics are a rapidly growing class of therapeutics with many advantages over traditional small molecule drugs. A major obstacle to their development is that proteins and peptides are easily destroyed by proteases and, thus, typically have prohibitively short half-lives in human gut, plasma, and cells. One of the most effective ways to prevent degradation is to engineer analogs from dextrorotatory (D)-amino acids, with up to 10⁵-fold improvements in potency reported. We here propose a general peptide-engineering platform that overcomes limitations of previous methods. By creating a mirror image of every structure in the Protein Data Bank (PDB), we generate a database of ∼2.8 million D-peptides. To obtain a D-analog of a given peptide, we search the (D)-PDB for similar configurations of its critical ("hotspot") residues. As a proof of concept, we apply our method to two peptides that are Food and Drug Administration approved as therapeutics for diabetes and osteoporosis, respectively. We obtain D-analogs that activate the GLP1 and PTH1 receptors with the same efficacy as their natural counterparts and show greatly increased half-life.
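The mirror-image step can be pictured as a reflection of atomic coordinates, which preserves all interatomic distances while inverting handedness. The following is a minimal sketch under stated assumptions: the four coordinates are made-up values standing in for the CA, N, C, and CB atoms of one residue, and the helper names `mirror` and `signed_volume` are hypothetical, not part of the published platform.

```python
import numpy as np

def mirror(coords):
    """Reflect coordinates through the yz-plane (x -> -x)."""
    reflected = np.array(coords, dtype=float)
    reflected[:, 0] *= -1.0
    return reflected

def signed_volume(center, a, b, c):
    """Signed volume spanned by three substituents around a center atom.

    The sign encodes handedness; a reflection flips it.
    """
    return float(np.dot(a - center, np.cross(b - center, c - center)))

def pairwise_distances(coords):
    """All interatomic distances as a square matrix."""
    return np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

# Made-up coordinates in the order CA, N, C, CB (not real geometry).
atoms = np.array([
    [0.00, 0.00, 0.00],
    [1.45, 0.00, 0.00],
    [-0.50, 1.40, 0.00],
    [-0.50, -0.70, 1.20],
])
mirrored = mirror(atoms)

v_original = signed_volume(*atoms)
v_mirrored = signed_volume(*mirrored)
print(v_original, v_mirrored)
```

The signed volume changes sign under reflection while the distance matrix is unchanged, which is why a mirrored structure has the same shape in every internal measurement but opposite chirality.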
Protein structure and function are determined by the arrangement of the linear sequence of amino acids in 3D space. We show that a deep graph neural network, ProteinSolver, can precisely design sequences that fold into a predetermined shape by phrasing this challenge as a constraint satisfaction problem (CSP), akin to Sudoku puzzles. We trained ProteinSolver on over 70,000,000 real protein sequences corresponding to over 80,000 structures. We show that our method rapidly designs new protein sequences and benchmark them in silico using energy-based scores, molecular dynamics, and structure prediction methods. As a proof-of-principle validation, we use ProteinSolver to generate sequences that match the structure of serum albumin, then synthesize the top-scoring design and validate it in vitro using circular dichroism. ProteinSolver is freely available at http://design.proteinsolver.org and https://gitlab.com/ostrokach/proteinsolver. A record of this paper’s transparent peer review process is included in the Supplemental Information.
• Graph neural network generates new proteins with predetermined topologies
• Probabilities assigned to individual amino acids correlate with stability of mutants
• Probabilities assigned to amino acid sequences correlate with stability of designs
• Orders of magnitude faster than traditional approaches
Strokach et al. developed ProteinSolver, a graph convolutional neural network trained on the PDB and sequences in UniParc to reconstruct amino acid sequences that adhere to constraints imposed by protein topologies. It can generate new sequences that fold into predetermined shapes and predict effects of mutations on stability.