QSAR without borders
Muratov, Eugene N; Bajorath, Jürgen; Sheridan, Robert P ...
Chemical Society Reviews, 06/2020, Volume 49, Issue 11
Journal Article
Peer reviewed
Open access
Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and, more recently, machine learning and artificial intelligence methods in chemical sciences. This field of research, broadly known as quantitative structure–activity relationships (QSAR) modeling, has developed many important algorithms and has found a broad range of applications in physical organic and medicinal chemistry in the past 55+ years. This Perspective summarizes recent technological advances in QSAR modeling, but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside of traditional QSAR boundaries, including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics. As modern research methods generate rapidly increasing amounts of data, the knowledge of robust data-driven modeling methods professed within the QSAR field can become essential for scientists working both within and outside of chemical research. We hope that this contribution highlighting the generalizable components of QSAR modeling will serve to address this challenge.
Neural networks were widely used for quantitative structure–activity relationships (QSAR) in the 1990s. Because of various practical issues (e.g., slow on large problems, difficult to train, prone to overfitting), they were superseded by more robust methods such as support vector machines (SVM) and random forest (RF), which arose in the early 2000s. The last 10 years have witnessed a revival of neural networks in the machine learning community thanks to new methods for preventing overfitting, more efficient training algorithms, and advancements in computer hardware. In particular, deep neural nets (DNNs), i.e., neural nets with more than one hidden layer, have found great success in many applications, such as computer vision and natural language processing. Here we show that DNNs can routinely make better prospective predictions than RF on a set of large, diverse QSAR data sets taken from Merck’s drug discovery effort. The number of adjustable parameters needed for DNNs is fairly large, but our results show that it is not necessary to optimize them for individual data sets, and a single set of recommended parameters can achieve better performance than RF for most of the data sets we studied. The usefulness of the parameters is demonstrated on additional data sets not used in the calibration. Although training DNNs is still computationally intensive, using graphics processing units (GPUs) can make this issue manageable.
In QSAR, a statistical model is generated from a training set of molecules (represented by chemical descriptors) and their biological activities. We will call this traditional type of QSAR model an “activity model”. The activity model can be used to predict the activities of molecules not in the training set. A relatively new subfield for QSAR is domain applicability. The aim is to estimate the reliability of prediction of a specific molecule on a specific activity model. A number of different metrics have been proposed in the literature for this purpose. It is desirable to build a quantitative model of reliability against one or more of these metrics. We can call this an “error model”. A previous publication from our laboratory (Sheridan, J. Chem. Inf. Model. 2012, 52, 814–823) suggested that the simultaneous use of three metrics would be more discriminating than any one metric. An error model could be built in the form of a three-dimensional set of bins. When the number of metrics exceeds three, however, the bin paradigm is not practical. An obvious solution for constructing an error model using multiple metrics is to use a QSAR method, in our case random forest. In this paper we demonstrate the usefulness of this paradigm, specifically for determining whether a useful error model can be built and which metrics are most useful for a given problem. For the ten data sets and the seven metrics we examine here, it appears that it is possible to construct a useful error model using only two metrics (TREE_SD and PREDICTED). These do not require calculating similarities/distances between the molecules being predicted and the molecules used to build the activity model, which can be rate-limiting.
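The two metrics named above can be illustrated with a minimal sketch. It assumes that PREDICTED is the mean of the per-tree predictions and that TREE_SD is their sample standard deviation; the abstract does not give the exact definitions, and the function name and the numbers below are hypothetical.

```python
from statistics import mean, stdev

def prediction_metrics(tree_predictions):
    """Return (PREDICTED, TREE_SD) for one molecule.

    tree_predictions: per-tree predicted activities from a random forest.
    Assumption: PREDICTED is the ensemble mean and TREE_SD the sample
    standard deviation; the source abstract does not spell these out.
    """
    predicted = mean(tree_predictions)  # ensemble prediction
    tree_sd = stdev(tree_predictions)   # disagreement among trees
    return predicted, tree_sd

# Made-up per-tree predictions for two hypothetical molecules.
agreeing = [6.1, 6.0, 6.2, 6.1, 6.0]
disagreeing = [4.0, 7.5, 5.2, 8.1, 6.0]

for name, preds in (("agreeing", agreeing), ("disagreeing", disagreeing)):
    predicted, tree_sd = prediction_metrics(preds)
    print(f"{name}: PREDICTED={predicted:.2f} TREE_SD={tree_sd:.2f}")
```

Both values come directly from the activity model's own output, which is consistent with the abstract's point that no similarity or distance calculation against the training set is needed. An error model would then be fitted with these values as inputs and the observed prediction error as the target.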
In the pharmaceutical industry it is common to generate many QSAR models from training sets containing a large number of molecules and a large number of descriptors. The best QSAR methods are those that can generate the most accurate predictions but that are not overly expensive computationally. In this paper we compare eXtreme Gradient Boosting (XGBoost) to random forest and single-task deep neural nets on 30 in-house data sets. While XGBoost has many adjustable parameters, we can define a set of standard parameters at which XGBoost makes predictions, on average, better than those of random forest and almost as good as those of deep neural nets. The biggest strength of XGBoost is its speed. Whereas efficient use of random forest requires generating each tree in parallel on a cluster, and deep neural nets are usually run on GPUs, XGBoost can be run on a single CPU in less than a third of the wall-clock time of either of the other methods.
One popular metric for estimating the accuracy of prospective quantitative structure–activity relationship (QSAR) predictions is based on the similarity of the compound being predicted to compounds in the training set from which the QSAR model was built. More recent work in the field has indicated that other parameters might be equally or more important than similarity. Here we make use of two additional parameters: the variation of prediction among random forest trees (less variation among trees indicates more accurate prediction) and the prediction itself (certain ranges of activity are intrinsically easier to predict than others). The accuracy of prediction for a QSAR model, as measured by the root-mean-square error, can be estimated by cross-validation on the training set at the time of model-building and stored as a three-dimensional array of bins. This is an obvious extension of the one-dimensional array of bins we previously proposed for similarity to the training set (Sheridan et al. J. Chem. Inf. Comput. Sci. 2004, 44, 1912–1928). We show that using these three parameters simultaneously adds much more discrimination in prediction accuracy than any single parameter. This approach can be applied to any QSAR method that produces an ensemble of models. We also show that the root-mean-square errors produced by cross-validation are predictive of root-mean-square errors of compounds tested after the model was built.
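The three-dimensional array of bins described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the bin edges, function names, and cross-validation records are all hypothetical, and a sparse dictionary stands in for the array.

```python
from collections import defaultdict
from math import sqrt

def bin_index(value, edges):
    """Index of the bin containing `value`, given ascending inner edges."""
    return sum(value >= edge for edge in edges)

def bin_key(sim, tree_sd, pred, edges):
    """3-D bin coordinates for (similarity, tree SD, prediction)."""
    sim_edges, sd_edges, pred_edges = edges
    return (bin_index(sim, sim_edges),
            bin_index(tree_sd, sd_edges),
            bin_index(pred, pred_edges))

def build_error_bins(records, edges):
    """Map each occupied 3-D bin to the RMSE of its cross-validated predictions.

    records: iterable of (similarity, tree_sd, predicted, observed) tuples
    obtained by cross-validation on the training set.
    """
    squared_errors = defaultdict(list)
    for sim, tree_sd, pred, obs in records:
        squared_errors[bin_key(sim, tree_sd, pred, edges)].append((pred - obs) ** 2)
    return {key: sqrt(sum(errs) / len(errs)) for key, errs in squared_errors.items()}

def estimated_rmse(bins, sim, tree_sd, pred, edges):
    """Look up the expected RMSE for a new prediction; None if the bin is empty."""
    return bins.get(bin_key(sim, tree_sd, pred, edges))

# Hypothetical edges: one cut per axis gives a 2 x 2 x 2 grid.
edges = ([0.5], [0.5], [5.5])
# Made-up cross-validation records: (similarity, tree_sd, predicted, observed).
records = [
    (0.9, 0.1, 6.0, 6.2),
    (0.9, 0.1, 6.0, 5.8),
    (0.3, 0.8, 5.0, 7.0),
]
bins = build_error_bins(records, edges)
print(estimated_rmse(bins, 0.8, 0.2, 6.5, edges))
```

A dictionary keyed by bin coordinates avoids allocating empty cells. That matters because the number of cells grows quickly with each added metric, which is the practical limit on the bin paradigm noted in the error-model abstract above.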
Safety and Efficacy of NVX-CoV2373 Covid-19 Vaccine
Heath, Paul T; Galiza, Eva P; Baxter, David N ...
The New England Journal of Medicine, 09/2021, Volume 385, Issue 13
Journal Article
Peer reviewed
Open access
Early clinical data from studies of the NVX-CoV2373 vaccine (Novavax), a recombinant nanoparticle vaccine against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) that contains the full-length spike glycoprotein of the prototype strain plus Matrix-M adjuvant, showed that the vaccine was safe and associated with a robust immune response in healthy adult participants. Additional data were needed regarding the efficacy, immunogenicity, and safety of this vaccine in a larger population.
In this phase 3, randomized, observer-blinded, placebo-controlled trial conducted at 33 sites in the United Kingdom, we assigned adults between the ages of 18 and 84 years in a 1:1 ratio to receive two intramuscular 5-μg doses of NVX-CoV2373 or placebo administered 21 days apart. The primary efficacy end point was virologically confirmed mild, moderate, or severe SARS-CoV-2 infection with an onset at least 7 days after the second injection in participants who were serologically negative at baseline.
A total of 15,187 participants underwent randomization, and 14,039 were included in the per-protocol efficacy population. Of the participants, 27.9% were 65 years of age or older, and 44.6% had coexisting illnesses. Infections were reported in 10 participants in the vaccine group and in 96 in the placebo group, with a symptom onset of at least 7 days after the second injection, for a vaccine efficacy of 89.7% (95% confidence interval [CI], 80.2 to 94.6). No hospitalizations or deaths were reported among the 10 cases in the vaccine group. Five cases of severe infection were reported, all of which were in the placebo group. A post hoc analysis showed an efficacy of 86.3% (95% CI, 71.3 to 93.5) against the B.1.1.7 (or alpha) variant and 96.4% (95% CI, 73.8 to 99.5) against non-B.1.1.7 variants. Reactogenicity was generally mild and transient. The incidence of serious adverse events was low and similar in the two groups.
A two-dose regimen of the NVX-CoV2373 vaccine administered to adult participants conferred 89.7% protection against SARS-CoV-2 infection and showed high efficacy against the B.1.1.7 variant. (Funded by Novavax; EudraCT number, 2020-004123-16.)
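As a rough check on the headline figure, vaccine efficacy is commonly defined as one minus the ratio of attack rates in the two arms. The sketch below applies that crude formula to the case counts in the abstract. It assumes the per-protocol population was split evenly between arms, which the abstract implies through the 1:1 assignment but does not state; the function name is hypothetical.

```python
def crude_vaccine_efficacy(cases_vaccine, n_vaccine, cases_placebo, n_placebo):
    """VE = 1 - (attack rate in vaccine arm / attack rate in placebo arm)."""
    risk_vaccine = cases_vaccine / n_vaccine
    risk_placebo = cases_placebo / n_placebo
    return 1.0 - risk_vaccine / risk_placebo

# Assumption: the 14,039 per-protocol participants are split evenly (1:1).
arm_size = 14_039 / 2
ve = crude_vaccine_efficacy(10, arm_size, 96, arm_size)
print(f"{ve:.1%}")  # prints 89.6%
```

The crude estimate of about 89.6% is close to the reported 89.7%. The small difference is expected, since the published estimate and its confidence interval come from the trial's own statistical analysis rather than from this simple ratio.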
Obesity is an independent risk factor for morbidity and mortality from pandemic influenza H1N1. Influenza is a significant public health threat, killing an estimated 250,000–500,000 people worldwide each year. More than one in ten of the world's adult population is obese and more than two-thirds of the US adult population is overweight or obese. No studies have compared humoral or cellular immune responses to influenza vaccination in healthy weight, overweight and obese populations despite clear public health importance.
The study employed a convenience sample to determine the antibody response to the 2009-2010 inactivated trivalent influenza vaccine (TIV) in healthy weight, overweight and obese participants at 1 and 12 months post vaccination. In addition, activation of CD8⁺ T cells and expression of interferon-γ and granzyme B were measured in influenza-stimulated peripheral blood mononuclear cell (PBMC) cultures.
Body mass index (BMI) correlated positively with a higher initial fold increase in IgG antibodies to TIV detected by enzyme-linked immunosorbent assay, confirmed by HAI antibody in a subset study. However, 12 months post vaccination, higher BMI was associated with a greater decline in influenza antibody titers. Ex vivo challenge of PBMCs with vaccine strain virus demonstrated that obese individuals had decreased CD8⁺ T-cell activation and decreased expression of functional proteins compared with healthy weight individuals.
These results suggest obesity may impair the ability to mount a protective immune response to influenza virus.
• Ethnic music (e.g., Chinese, Indian) increased the recall of menu items from the same country.
• Ethnic music increased the likelihood of choosing menu items from the same country.
• Classical music increased willingness to pay for products related to social identity.
• Country music increased willingness to pay for utilitarian products.
Music congruity effects on consumer behavior are conceptualized in terms of cognitive priming of semantic networks in memory, and operationalized as congruent with a product's country of origin (Experiment 1), or congruent with the utilitarian (Experiment 2) or social identity (Experiments 2 and 3) connotations of a product. Hearing a specific genre of music (e.g., classical) activates related concepts in memory (e.g., expensive, sophisticated, formal, educated), which influences the memory for, perception of, and choice of products. Consistent with this account of music congruity effects, three laboratory experiments show that playing music of a specific genre during initial product exposure improved subsequent recall of conceptually related (i.e., congruent) products compared to unrelated products (Experiment 1), affected product choice in favor of congruent products (Experiment 1), and affected how much participants were willing to pay for congruent products (Experiments 2 and 3).