Neurons and glial cells in the developing brain arise from neural progenitor cells (NPCs). Nestin, an intermediate filament protein, is thought to be expressed exclusively by NPCs in the normal brain and to be replaced in differentiated cells by proteins specific to neurons or glia. In the adult brain, nestin-expressing NPCs are found in the subventricular zone (SVZ) of the lateral ventricle and the subgranular zone (SGZ) of the dentate gyrus. While NPCs in the adult SVZ and SGZ have received considerable attention, relatively little effort has gone into determining whether nestin-expressing neural cells (NECs) exist outside these two regions. We therefore immunocytochemically stained sections of adult rat and human brain for NECs, observed four distinct classes of these cells, and present here the first comprehensive report on them. Class I cells are among the smallest neural cells in the brain and are widely distributed. Class II cells are located in the walls of the aqueduct and third ventricle. Class IV cells are found throughout the forebrain and typically reside immediately adjacent to a neuron. Class III cells are observed only in the basal forebrain and closely related areas such as the hippocampus and corpus striatum. Class III cells resemble neurons structurally and co-express markers associated exclusively with neurons. Cell proliferation experiments demonstrate that Class III cells are not recently born. Instead, these cells appear to be mature neurons in the adult brain that express nestin. Nestin-expressing neurons are not thought to exist in the brain at any stage of development. That these unique neurons are found only in brain regions involved in higher-order cognitive function suggests that they may be remodeling their cytoskeleton to support the neural plasticity these functions require.
The advent of computational drug discovery holds the promise of significantly reducing experimental effort as well as monetary cost. More generally, predicting the binding of small organic molecules to biological macromolecules has far-reaching implications for a range of problems, including metabolomics. However, predicting the bound structure of a protein–ligand complex together with its binding affinity has proven to be an enormous challenge. In recent years, machine learning-based methods have proven more accurate than older methods, many of which are based on simple linear regression. Nonetheless, there remains room for improvement: these methods are often trained on a small set of features, with a single functional form for any given physical effect, and often with little justification for choosing one functional form over another. Moreover, it is not entirely clear why one machine learning method is favored over another. In this work, we undertake a comprehensive effort to develop high-accuracy machine-learned scoring functions, systematically investigating the effects of the machine learning method and the choice of features and, where possible, providing insight into the relevant physics using methods that assess feature importance. We show synergism among disparate features, yielding an adjusted $R^2$ with experimental binding affinities of up to 0.871 on an independent test set and enrichment for native bound structures of up to 0.913. When purely physical terms that model enthalpic and entropic effects are used in training, we use feature importance assessments to probe the relevant physics, in the hope of guiding future investigators working on this and other computational chemistry problems.
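Adjusted $R^2$ penalizes the ordinary coefficient of determination for the number of features, which matters when comparing scoring functions trained on feature sets of different sizes. The sketch below assumes the standard definition $R^2_{adj} = 1 - (1 - R^2)(n - 1)/(n - p - 1)$ for $n$ samples and $p$ features; the abstract does not state the exact convention used, and the function names are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_features):
    """Penalize R^2 for the number of features p (assumed standard definition)."""
    dof = n_samples - n_features - 1
    if dof <= 0:
        raise ValueError("need more samples than features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Hypothetical usage: 101 test complexes scored with a 10-feature model.
r2 = r_squared([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(r2, adjusted_r_squared(0.9, n_samples=101, n_features=10))
```

Adding a feature always leaves plain $R^2$ unchanged or higher on the training data, so the adjusted form is the fairer basis for comparing feature sets of different sizes.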
Many proteins have small-molecule binding pockets that are not easily detectable in their ligand-free structures. These cryptic sites require a conformational change to become apparent; a cryptic site can therefore be defined as a site that forms a pocket in a holo structure but not in the apo structure. Because many proteins appear to lack druggable pockets, understanding and accurately identifying cryptic sites could expand the set of drug targets. Previously, cryptic sites were identified experimentally by fragment-based ligand discovery and computationally by long molecular dynamics simulations and fragment docking. Here, we begin by constructing a set of structurally defined apo–holo pairs with cryptic sites. Next, we comprehensively characterize the cryptic sites in terms of their sequence, structure, and dynamics attributes. We find that cryptic sites tend to be as conserved in evolution as traditional binding pockets but are less hydrophobic and more flexible. Building on this characterization, we use machine learning to predict cryptic sites with relatively high accuracy (on our benchmark, the true positive and false positive rates are 73% and 29%, respectively). We then predict cryptic sites in the entire structurally characterized human proteome (11,201 structures, covering 23% of all residues in the proteome). Our method, CryptoSite, increases the size of the potentially “druggable” human proteome from ~40% to ~78% of disease-associated proteins. Finally, to demonstrate the utility of our approach in practice, we experimentally validate a cryptic site in protein tyrosine phosphatase 1B using a covalent ligand and NMR spectroscopy. The CryptoSite Web server is available at http://salilab.org/cryptosite.
• Bona fide cryptic sites identified by comparison of apo and holo protein structures.
• Features distinguishing cryptic sites and binding pockets identified.
• Efficient and accurate prediction of cryptic sites developed.
• Cryptic sites predicted for all human proteins of known structure.
• The “druggable” human proteome may be larger than previously estimated.
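The CryptoSite benchmark is summarized by a true positive rate of 73% and a false positive rate of 29%. As a reminder of how those two numbers are computed from per-residue binary predictions, here is a minimal sketch; the function name and the toy labels are hypothetical and are not taken from the CryptoSite data.

```python
def tpr_fpr(y_true, y_pred):
    """Return (true positive rate, false positive rate) for binary labels.

    TPR = TP / (TP + FN): fraction of real cryptic-site residues recovered.
    FPR = FP / (FP + TN): fraction of non-site residues wrongly flagged.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical toy labels: 1 = residue belongs to a cryptic site.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(tpr_fpr(truth, preds))  # TPR 0.75, FPR 1/6
```

The two rates are normalized by different denominators (real sites versus non-sites), so a method can trade one against the other by shifting its decision threshold.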
Advanced potential energy surfaces are defined as theoretical models that explicitly include many-body effects that transcend the standard fixed-charge, pairwise-additive paradigm typically used in molecular simulation. However, several factors have precluded their widespread use in condensed-phase simulations: the computational cost of the theoretical models; a paucity of approximate models and algorithmic improvements that can ameliorate this cost; underdeveloped interfaces and limited dissemination in the code bases widely used by the computational chemistry community; and software implementations that have not kept pace with modern high-performance computing (HPC) architectures, such as multicore CPUs and graphics processing units (GPUs). In this Feature Article we review recent progress in these areas, including well-defined polarization approximations and new multipole electrostatic formulations, novel methods for solving the mutual polarization equations and increasing the MD time step, the combination of linear-scaling electronic structure methods with new QM/MM methods that account for mutual polarization between the two regions, and greatly improved deployment of these models and methods on GPU and CPU hardware platforms. We are now approaching an era in which multipole-based polarizable force fields can be routinely used to obtain results comparable to state-of-the-art density functional theory while achieving sampling statistics comparable to those obtained with simpler fixed partial charge force fields.
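The mutual polarization equations couple every induced dipole to every other: $\mu_i = \alpha_i \left(E^0_i + \sum_{j \ne i} T_{ij}\mu_j\right)$, where $E^0_i$ is the permanent field at site $i$ and $T_{ij}$ is the dipole field tensor. The sketch below solves them with a plain Jacobi fixed-point iteration, assuming isotropic polarizabilities, undamped point dipoles, and Gaussian-style units. It is an illustration only; production polarizable force fields typically use damped (e.g. Thole-style) interactions and faster solvers such as preconditioned conjugate gradient, and the function names here are hypothetical.

```python
import numpy as np


def dipole_tensor(r_vec):
    """Undamped point-dipole field tensor T = (3 r_hat r_hat^T - I) / r^3."""
    r = np.linalg.norm(r_vec)
    r_hat = r_vec / r
    return (3.0 * np.outer(r_hat, r_hat) - np.eye(3)) / r**3


def solve_induced_dipoles(positions, alphas, e_perm, tol=1e-10, max_iter=500):
    """Jacobi iteration for mu_i = alpha_i (E0_i + sum_{j != i} T_ij mu_j)."""
    n = len(positions)
    mu = alphas[:, None] * e_perm  # direct (non-mutual) induction as the starting guess
    for _ in range(max_iter):
        mu_new = np.empty_like(mu)
        for i in range(n):
            field = e_perm[i].copy()
            for j in range(n):
                if j != i:
                    # Field at site i produced by the induced dipole at site j.
                    field += dipole_tensor(positions[i] - positions[j]) @ mu[j]
            mu_new[i] = alphas[i] * field
        if np.max(np.abs(mu_new - mu)) < tol:
            return mu_new
        mu = mu_new
    raise RuntimeError("induced dipoles did not converge")


# Hypothetical three-site example.
pos = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 3.5, 0.0]])
alpha = np.array([1.0, 0.8, 1.2])
e0 = np.array([[0.1, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.05]])
print(solve_induced_dipoles(pos, alpha, e0))
```

Each sweep costs $O(N^2)$ here; the iterative approach is shown because, for large systems, forming and directly solving the full $3N \times 3N$ linear system is impractical.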
In allostery, a binding event at one site in a protein modulates the behavior of a distant site. Identifying the residues that relay the signal between sites remains a challenge. We have developed predictive models using support-vector machines, a widely used machine-learning method. The training data set consisted of residues classified as either hotspots or non-hotspots based on experimental characterization of point mutations in a diverse set of allosteric proteins. Each residue had an associated set of calculated features. Two feature sets were used: one consisting of dynamical, structural, network, and informatic measures, and another consisting of the structural measures defined by Daily and Gray. The resulting models performed well on an independent data set of hotspots and non-hotspots from five allosteric proteins. On this independent data set, our top 10 models using Feature Set 1 recalled 68-81% of known hotspots, and 58-67% of their hotspot predictions were actual hotspots. Hence, these models have precision P = 58-67% and recall R = 68-81%. The corresponding models for Feature Set 2 had P = 55-59% and R = 81-92%. We then combined the features from each set that produced the models with the best predictive performance. The top 10 models using this hybrid feature set had R = 73-81% and P = 64-71%, the best overall performance of any of the sets of models. Our methods identified hotspots in structural regions of known allosteric significance. Moreover, our predicted hotspots form a network of contiguous residues in the interior of the structures, in agreement with previous work. In conclusion, we have developed models that discriminate between known allosteric hotspots and non-hotspots with high accuracy and sensitivity. Furthermore, the pattern of predicted hotspots corresponds to known functional motifs implicated in allostery and is consistent with previous work describing sparse networks of allosterically important residues.
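The abstract defines recall as the fraction of known hotspots recovered and precision as the fraction of hotspot predictions that are real hotspots. A minimal sketch of those two definitions over per-residue labels follows; the function name and toy labels are hypothetical, not data from the study.

```python
def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN).

    Labels are per residue: 1 = hotspot, 0 = non-hotspot.
    Returns 0.0 for a metric whose denominator is empty.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy labels for five residues.
print(precision_recall([1, 1, 1, 0, 0], [1, 1, 0, 1, 1]))  # (0.5, 0.666...)
```

Because precision is normalized by the number of predictions and recall by the number of true hotspots, a model that flags more residues (as the Feature Set 2 models appear to) tends to gain recall at the cost of precision.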
Human myeloid-derived growth factor (hMYDGF) is a 142-residue protein with a C-terminal endoplasmic reticulum (ER) retention sequence (ERS). Extracellular MYDGF mediates cardiac repair in mice after anoxic injury. Although homologs of hMYDGF are found in eukaryotes as distant as protozoans, the protein's structure and function are unknown. Here we present the NMR solution structure of hMYDGF, which consists of a short α-helix and ten β-strands distributed among three β-sheets. Conserved residues map to the unstructured ERS, to loops on the face opposite the ERS, and to the surface of a cavity underneath the conserved loops. The only protein or portion of a protein known to have a similar fold is the base domain of VNN1. By analogy to the tethering of the VNN1 nitrilase domain to the plasma membrane via its base domain, we suggest that MYDGF complexed to the KDEL receptor binds cargo via its conserved residues for transport to the ER.
Carbon dioxide (CO2) is a detrimental greenhouse gas and the main contributor to global warming. One promising approach to this environmental challenge is the use of deep eutectic solvents (DESs) as an ecofriendly and sustainable medium for effective CO2 capture. Chemically reactive DESs, which form chemical bonds with CO2, are superior to nonreactive, physically based DESs for CO2 absorption. However, no existing computational model accurately predicts CO2 solubility in chemically reactive DESs. Here, we develop machine learning (ML) models to predict the solubility of CO2 in chemically reactive DESs. As training data, we collected 214 data points from published work for CO2 solubility in 149 different chemically reactive DESs at various temperatures, pressures, and DES molar ratios. The physics-driven input features for the ML models include σ-profile descriptors, which quantify the relative probability that a molecular surface segment has a given screening charge density and were calculated with the first-principles quantum chemical method COSMO-RS. We show that, although COSMO-RS does not explicitly calculate chemical reaction profiles, the COSMO-RS-derived σ-profile features can be used to predict bond formation. Of the models trained, an artificial neural network (ANN) provides the most accurate CO2 solubility prediction, with an average absolute relative deviation (AARD) of 2.94% on the testing sets. Overall, this work provides ML models that predict CO2 solubility precisely and can thus accelerate the design and application of chemically reactive DESs.
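The ANN's accuracy is reported as an average absolute relative deviation of 2.94%. The sketch below assumes the commonly used definition $\mathrm{AARD}(\%) = \frac{100}{N}\sum_{k=1}^{N} \frac{|x_k^{\mathrm{pred}} - x_k^{\mathrm{exp}}|}{|x_k^{\mathrm{exp}}|}$; the abstract does not spell out the formula, and the function name and sample values are hypothetical.

```python
def aard_percent(experimental, predicted):
    """Average absolute relative deviation, in percent (assumed standard definition)."""
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - e) / abs(e) for e, p in zip(experimental, predicted))
    return 100.0 * total / len(experimental)


# Hypothetical CO2 solubilities (e.g. mole fraction) for two state points.
print(aard_percent([1.0, 2.0], [1.1, 1.8]))  # approximately 10.0
```

Because each deviation is divided by its experimental value, AARD weights errors at low solubility as heavily as those at high solubility, which suits data spanning a wide range of temperatures and pressures.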