•An SVM model was established for toxicity to P. subcapitata.
•Only 6 descriptors were adopted in the model for 334 toxicants.
•The test set is large, comprising 167 compounds.
•The model is accurate and satisfactory in predicting pEC10.
Predicting the toxicity of organic toxicants to aquatic life through a chemometric approach is a challenging area. In this paper, a six-descriptor quantitative structure–activity/toxicity relationship (QSAR/QSTR) model was successfully developed for the toxicity pEC10 of organic chemicals against Pseudokirchneriella subcapitata, by applying a support vector machine (SVM) together with a genetic algorithm. A sufficiently large data set consisting of 334 organic chemicals was randomly divided into a training set (167 compounds) and a test set (167 compounds) with a ratio of 1:1. The optimal SVM model has a coefficient of determination R2 of 0.76 and a mean absolute error (MAE) of 0.60 for the training set, and an R2 of 0.75 and an MAE of 0.61 for the test set. Compared with other models reported in the literature, our SVM model for the toxicity pEC10 shows significant statistical quality and satisfactory predictive ability, although it has fewer molecular descriptors and more samples in the test set. A QSTR model for pEC50 of organic chemicals against Pseudokirchneriella subcapitata was also developed with the same subsets and molecular descriptors.
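The random 1:1 split and the two reported statistics can be illustrated with a minimal sketch. It assumes that R2 is defined as 1 − SS_res/SS_tot (the abstract does not say which definition was used). The compound IDs, the seed, and the observed and predicted values are made up for illustration and are not data from the paper.

```python
import random

def split_half(items, seed=0):
    """Randomly split items into two halves (1:1 ratio)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

# Hypothetical compound IDs standing in for the 334 chemicals.
compound_ids = list(range(334))
train_ids, test_ids = split_half(compound_ids, seed=42)
print(len(train_ids), len(test_ids))

# Made-up observed and predicted pEC10 values, for illustration only.
observed = [3.1, 4.0, 5.2, 6.3, 4.8]
predicted = [3.5, 3.8, 5.0, 5.9, 5.4]
r2 = r_squared(observed, predicted)
mae = mean_absolute_error(observed, predicted)
print(round(r2, 3), round(mae, 3))
```

Fixing the seed keeps the split reproducible, so the same training and test subsets can be reused for a second endpoint such as pEC50.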
Developing broad-spectrum anti-coronavirus drugs is of great importance, since the novel SARS-CoV-2 has rapidly become a threat to public health and the economy worldwide. The SARS-CoV 3-chymotrypsin-like protease (3CLpro), which is highly conserved among betacoronaviruses, is a viable target for anti-SARS drugs. A quantitative structure–activity relationship (QSAR) for inhibitory constants (pKi) of 89 compounds against the SARS-CoV 3CLpro enzyme was developed by using a support vector machine (SVM) and a genetic algorithm. The optimal SVM model (C = 90.2339 and γ = 1.19826 × 10⁻⁵) based on six molecular descriptors has determination coefficients of 0.839 for the training set (65 compounds) and 0.747 for the test set (24 compounds), and rms errors of 0.435 and 0.525, respectively. These results are accurate and acceptable compared with those of other reported models, although our SVM model deals with more samples in the data set. The SVM model could be beneficial in the search for novel 3CLpro enzyme inhibitors against SARS-CoV.
The integral equation formalism polarizable continuum model (IEF-PCM) for solvent effects with the default solvent (water) and solvent parameters, together with the density functional theory method at the 6-31G(d) level, was used to optimize molecular structures for polychlorinated biphenyl (PCB) congeners. Four molecular descriptors were selected to develop quantitative structure–activity relationship (QSAR) models for the depuration rate constants (kd) of 63 PCB congeners in juvenile rainbow trout (Oncorhynchus mykiss). The optimal multiple linear regression (MLR) model has a correlation coefficient R of 0.933 and a root mean square (rms) error of 0.0681 for the total set of 63 PCB congeners. The support vector regression model has an R of 0.953 and an rms error of 0.0576 for the total set. Both the MLR and SVM QSAR models in this paper are accurate and acceptable compared with other QSAR models reported in the literature for the depuration rate of PCB congeners. Thus, applying IEF-PCM and B3LYP/6-31G(d) calculations to derive molecular descriptors of PCB congeners is successful.
[…] is widely used as the model species in toxicity and risk assessment. For the first time, a global classification model was proposed in this paper for a two-class problem (Class −1 with log1/IBC ≤ 4.2 and Class +1 with log1/IBC > 4.2; unit of IBC: mol/L) by utilizing a large data set of 601 log1/IBC toxicity values of organic compounds to this species. Dragon software was used to calculate 4885 molecular descriptors for each compound. Stepwise multiple linear regression (MLR) analysis was used to select the descriptor subset for the models. The ten molecular descriptors used in the classification model reflect structural information on the Michael-type addition of nucleophiles, molecular branching, molecular size, polarizability, hydrophobicity, and so on. Furthermore, these descriptors were interpreted from the point of view of toxicity mechanisms. The optimal support vector machine (SVM) model (C = 253.8 and γ = 0.009) was obtained with the genetic algorithm. The SVM classification model produced prediction accuracies of 89.1% for the training set (451 log1/IBC values), 80.0% for the test set (150 log1/IBC values), and 86.9% for the total data set (601 log1/IBC values), which are higher than those (80.5%, 76%, and 79.4%, respectively) from the binary logistic regression (BLR) model. The global SVM classification model is successful, although it deals with a large data set in relation to the toxicity of organics to this species.
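As a quick consistency check on the accuracies above, the total-set accuracy should be close to the sample-weighted average of the training-set and test-set accuracies. The sketch below uses only the percentages and subset sizes quoted in the abstract; the function name is a hypothetical helper, not something from the paper.

```python
def pooled_accuracy(subsets):
    """Sample-weighted accuracy (in %) from (accuracy_percent, n_samples) pairs."""
    total = sum(n for _, n in subsets)
    correct = sum(acc / 100.0 * n for acc, n in subsets)
    return 100.0 * correct / total

# Training set (451 values) and test set (150 values), as quoted in the abstract.
svm_total = pooled_accuracy([(89.1, 451), (80.0, 150)])
blr_total = pooled_accuracy([(80.5, 451), (76.0, 150)])
print(f"SVM: {svm_total:.1f}%  BLR: {blr_total:.1f}%")
```

This gives about 86.8% for the SVM model and 79.4% for the BLR model. The reported totals are 86.9% and 79.4%, so the figures agree to within the rounding of the quoted subset accuracies.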
A quantitative structure-toxicity relationship (QSTR) model based on four descriptors was successfully developed for 1163 chemical toxicants against Tetrahymena pyriformis by applying a general regression neural network (GRNN). The training set consisting of 600 organic compounds was used to train GRNN models, which were evaluated with the test set of 563 compounds. For the optimal GRNN model, the training set has a coefficient of determination R2 of 0.86 and a root mean square (rms) error of 0.41, and the test set has an R2 of 0.80 and an rms error of 0.41. The results indicate that the optimal GRNN model is accurate, although it has only four descriptors and more samples in the test set.
•MLR and GRNN models were established for toxicity to Tetrahymena pyriformis.
•Only four molecular descriptors were used in models of 1163 toxicants.
•The GRNN model is more accurate than the MLR one.
•The GRNN model shows satisfactory prediction performance for pIGC50.
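For readers unfamiliar with GRNN, its prediction is a Gaussian-kernel-weighted average of the training targets, controlled by a single smoothing parameter. The sketch below shows that standard formulation only. It is not the authors' implementation, and the four-descriptor vectors, target values, and sigma are made up for illustration.

```python
import math

def grnn_predict(x, train_x, train_y, sigma):
    """Predict one target as a Gaussian-kernel-weighted average of train_y."""
    weights = []
    for xi in train_x:
        dist_sq = sum((a - b) ** 2 for a, b in zip(x, xi))
        weights.append(math.exp(-dist_sq / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        # Every kernel underflowed to zero: fall back to the training mean.
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total

# Hypothetical four-descriptor vectors and toxicity values (not from the paper).
train_x = [[0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 1.0, 0.0], [2.0, 1.0, 0.0, 1.0]]
train_y = [1.0, 2.0, 3.5]

near_training_point = grnn_predict(train_x[1], train_x, train_y, sigma=0.1)
smoothed = grnn_predict([1.0, 0.5, 0.5, 0.5], train_x, train_y, sigma=1.0)
print(near_training_point, smoothed)
```

A small sigma makes the model reproduce nearby training values almost exactly, while a larger sigma averages over more neighbours. In practice sigma is the quantity tuned against a validation or test set.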
Molecular descriptors reflecting structural information on hydrophobicity, reactivity, polarizability, hydrogen bonds, and charged groups were used to predict the toxicity (pLC50) of chemicals towards Daphnia magna with global quantitative structure–activity/toxicity relationship (QSAR/QSTR) models. A sufficiently large dataset of 1517 chemical toxicity values for Daphnia magna was divided into a training set (758 pLC50) and a test set (759 pLC50). By applying the random forest (RF) algorithm, two classification models, Class Model A and Class Model B, were developed, with prediction accuracy, sensitivity, and specificity above 85% for Class 1 (pLC50 ≤ 4.48) and Class 2 (pLC50 > 4.48). Class Model A was based on nine molecular descriptors and the RF parameters nodesize = 1, ntree = 80, and mtry = 2, and yielded accuracies of 92.3% (training set), 85.6% (test set), and 88.9% (total data set). Class Model B was based on ten descriptors and the parameters nodesize = 1, ntree = 90, and mtry = 2, and produced accuracies of 88.3% (training set), 86.8% (test set), and 87.5% (total data set). The two classification models are satisfactory compared with other classification models reported in the literature, although the models in this work deal with more samples. Thus, the two classification models, with a larger applicability domain, provide efficient tools for assessing chemical aquatic toxicity towards Daphnia magna.
•Increasing molecular hydrophobicity leads to higher toxicity pLC50 to Daphnia magna.
•Compounds with charged groups have strong polarity and lower toxicity pLC50.
•Introducing phosphorus-sulfur double bonds increases molecular reactivity and toxicity.
•Molecules prone to forming hydrogen bonds have lower toxicity pLC50.
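The three statistics quoted for the classification models (accuracy, sensitivity, specificity) follow from a two-class confusion matrix. The sketch below uses the pLC50 threshold of 4.48 from the abstract. Treating Class 2 as the "positive" class is an assumption made here for illustration, and the pLC50 values and predicted classes are made up.

```python
def to_class(plc50, threshold=4.48):
    """Class 1 if pLC50 <= threshold, otherwise Class 2 (threshold from the text)."""
    return 1 if plc50 <= threshold else 2

def class_metrics(observed, predicted, positive=2):
    """Accuracy, sensitivity, and specificity; 'positive' is an assumed choice."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == positive and p == positive)
    fn = sum(1 for o, p in pairs if o == positive and p != positive)
    tn = sum(1 for o, p in pairs if o != positive and p != positive)
    fp = sum(1 for o, p in pairs if o != positive and p == positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # fraction of positives recovered
        "specificity": tn / (tn + fp),  # fraction of negatives recovered
    }

# Made-up observed pLC50 values and model-predicted classes.
observed_plc50 = [3.2, 4.48, 5.1, 6.0, 4.0, 4.9]
observed_classes = [to_class(v) for v in observed_plc50]
predicted_classes = [1, 2, 2, 2, 1, 1]

metrics = class_metrics(observed_classes, predicted_classes)
print(metrics)
```

Swapping the positive class exchanges sensitivity and specificity, so a requirement that both exceed 85% does not depend on that choice.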
A three-descriptor quantitative structure-activity/toxicity relationship (QSAR/QSTR) model was developed for the skin permeability of a sufficiently large data set consisting of 274 compounds, by applying a support vector machine (SVM) together with a genetic algorithm. The optimal SVM model has a coefficient of determination R2 of 0.946 and a root mean square (rms) error of 0.253 for the training set of 139 compounds, and an R2 of 0.872 and an rms error of 0.302 for the test set of 135 compounds. Compared with other models reported in the literature, our SVM model shows better statistical performance while dealing with more samples in the test set. Therefore, a nonlinear QSAR model for skin permeability was successfully developed by applying the SVM algorithm.
Predicting the toxicity of chemicals to various fish species through a chemometric approach is crucial for the ecotoxicological assessment of existing as well as not yet synthesized chemicals. This paper reports a quantitative structure–activity/toxicity relationship (QSAR/QSTR) model for the toxicity pLC50 of organic chemicals against various fish species. Only six descriptors were used to develop the QSTR model, by applying a support vector machine (SVM) together with a genetic algorithm. The QSTR model was trained on a sufficiently large data set of 840 organic compounds and evaluated with a test set of 281 compounds. Compared with other QSTRs reported in the literature, the optimal SVM model for fish toxicity produces better statistical results, with determination coefficients R2 above 0.70 for both the training set and the test set, although the QSTR model in this work has fewer molecular descriptors. Applying SVM and a genetic algorithm to develop the QSTR model for pLC50 of organic compounds against various fish species is successful.
•An SVM model was established for toxicity to various fish species.
•Only six descriptors were used in the models for 1121 toxicants.
•The optimal SVM model is more accurate than the MLR model.
•The optimal SVM model produces satisfactory prediction results.
A novel molecular device (trans-azobenzene embedded N-(11-pyrenyl methyl)aza-21-crown-7) with double functional devices was designed on the basis of theoretical calculations. The interaction of pyrenyl methyl covalently bonded to aza-21-crown-7 at the nitrogen position with a series of alkaline-earth metal cations (Mg2+, Ca2+, Sr2+, and Ba2+) was investigated. The fully optimized geometries and real frequencies were calculated using a computational strategy based on density functional theory at the B3LYP/6-31G(d) level. The free ligand (L) and its metal cation complexes (L/M2+) were studied using a mixed basis set (6-31G(d) for the atoms C, H, O, and N, and LANL2DZ for the alkaline-earth metal cations Mg2+, Ca2+, Sr2+, and Ba2+). Natural bond orbital analysis based on the optimized geometric structures was used to explore the interactions in the L/M2+ molecules. The absorption spectra of L and L/M2+, together with the excitation energies and absorption wavelengths of their excited states, were studied by time-dependent density functional theory with 6-31G(d) and LANL2DZ. A new type of molecular device was found that is selective for Ca2+ and shows fluorescence emission from L/Ca2+ under illumination. This molecular device could serve as an allosteric switch and a fluorescence chemosensor.
•A QSAR was built for toxicity data of pesticides to various fish species.
•Random forest was used to establish the classification model of LC50.
•Only 8 descriptors were used in the models for 1106 organic pesticides.
•The total data set has prediction accuracy above 96%.
Aquatic toxicity of pesticides can result in the poisoning of many non-target organisms, of which fishes are the most prominent. It is a challenge to predict the toxicity (LC50) classes of organic pesticides to various fish species from global QSAR models with a larger applicability domain. In this paper, by applying the random forest (RF) algorithm to a two-class problem, only eight molecular descriptors were used to develop a quantitative structure–activity relationship (QSAR) model for 1106 toxicity data (96 h, LC50) of organic pesticides to various fish species, including Oncorhynchus mykiss, Lepomis macrochirus, Pimephales promelas, Brachydanio rerio, Cyprinodon, Cyprinus carpio, etc. With the optimal RF Model I (ntree = 280, mtry = 3, and nodesize = 5), the training set (885 organic pesticides) has prediction accuracies of 99.6% for Class 1 (LC50 ≤ 10) and 96.7% for Class 2 (LC50 > 10); the test set (221 organic pesticides) has accuracies of 90.8% for Class 1 and 91.2% for Class 2. The optimal RF Model I is satisfactory compared with other QSAR models reported in the literature, although its descriptor subset is small.