Peer-reviewed, Open access
  • Explainable Supervised Mach...
    Ferraz-Caetano, José; Teixeira, Filipe; Cordeiro, M. Natália D. S.

    Journal of chemical information and modeling, 04/2024, Volume: 64, Issue: 7
    Journal Article

    Many challenges persist in developing accurate computational models for predicting solvation free energy (ΔG sol). Despite recent developments in Machine Learning (ML) methodologies that outperformed traditional quantum mechanical models, several issues remain concerning explanatory insights for broad chemical predictions with an acceptable speed–accuracy trade-off. To overcome this, we present a novel supervised ML model to predict the ΔG sol for an array of solvent–solute pairs. Using two different ensemble regressor algorithms, we made fast and accurate property predictions using open-source chemical features, encoding complex electronic, structural, and surface area descriptors for every solvent and solute. By integrating molecular properties and chemical interaction features, we have analyzed individual descriptor importance and optimized our model through explanatory information from feature groups. On aqueous and organic solvent databases, ML models revealed the predictive relevance of solutes with increasing polar surface area and decreasing polarizability, yielding better results than state-of-the-art benchmark Neural Network methods (without complex quantum mechanical or molecular dynamics simulations). Both algorithms successfully outperformed previous ΔG sol prediction methods, with a maximum absolute error of 0.22 ± 0.02 kcal mol⁻¹, further validated in an external benchmark database and with solvent hold-out tests. These explanatory and statistical insights allow a thoughtful application of this method for predicting other thermodynamic properties, stressing the relevance of ML modeling for further complex computational chemistry problems.