Abstract
One of the most exciting tools to have entered the materials science toolbox in recent years is machine learning. This collection of statistical methods has already proved capable of considerably speeding up both fundamental and applied research. At present, we are witnessing an explosion of works that develop and apply machine learning to solid-state systems. We provide a comprehensive overview and analysis of the most recent research on this topic. As a starting point, we introduce machine learning principles, algorithms, descriptors, and databases in materials science. We continue with the description of different machine learning approaches for the discovery of stable materials and the prediction of their crystal structure. Then we discuss research in numerous quantitative structure–property relationships and various approaches for the replacement of first-principles methods by machine learning. We review how active learning and surrogate-based optimization can be applied to improve the rational design process, and give related examples of applications. Two recurring questions are the interpretability of, and the physical understanding gained from, machine learning models. We therefore consider the different facets of interpretability and their importance in materials science. Finally, we propose solutions and future research paths for various challenges in computational materials science.
We developed new methods for parameter estimation in context and, with the help of 125 authors, built the AmP (Add-my-Pet) database of Dynamic Energy Budget (DEB) models, parameters, and referenced underlying data for animals, where each species constitutes one database entry. The combination of DEB parameters covers all aspects of energetics throughout the organism's full life cycle, from the start of embryo development to death by aging. The species-specific parameter values capture biodiversity and can now, for the first time, be compared among animal species. An important insight brought by the AmP project is the classification of animal energetics according to a family of related DEB models that is structured on the basis of the mode of metabolic acceleration, which links up with the development of larval stages. We discuss the evolution of metabolism in this context, among animals in general, and ray-finned fish, mollusks, and crustaceans in particular. New DEBtool code for estimating DEB parameters from data has been written. AmPtool code for analyzing patterns in parameter values has also been created. A new web interface supports multiple ways to visualize data, parameters, and implied properties, both for the entire collection and on an entry-by-entry basis. The DEB models proved to fit data well: the median relative error is only 0.07 for the 1035 animal species included as of 2018/03/12, some of them extinct, from all large phyla and all chordate orders, spanning a range of body masses of 16 orders of magnitude. This study is a first step toward including evolutionary aspects in parameter estimation, making it possible to infer properties of species for which very little is known.
We train a neural network as the universal exchange–correlation functional of density-functional theory that simultaneously reproduces both the exact exchange–correlation energy and the potential. This functional is extremely nonlocal, but retains the computational scaling of traditional local or semilocal approximations. It therefore holds the promise of solving some of the delocalization problems that plague density-functional theory, while maintaining the computational efficiency that characterizes the Kohn–Sham equations. Furthermore, by using automatic differentiation, a capability present in modern machine-learning frameworks, we impose the exact mathematical relation between the exchange–correlation energy and the potential, leading to a fully consistent method. We demonstrate the feasibility of our approach by looking at one-dimensional systems with two strongly correlated electrons, where density-functional methods are known to fail, and investigate the behavior and performance of our functional by varying the degree of nonlocality.
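The exact relation imposed here is that the potential is the functional derivative of the energy, v_xc(r) = δE_xc[n]/δn(r). As a minimal sketch of that relation on a grid (using a toy local functional of our own choosing, not the neural functional of the paper), one can check the analytic functional derivative against a finite difference; automatic differentiation makes this consistency exact by construction:

```python
import numpy as np

# Toy "functional": E_xc[n] = sum_i f(n_i) * dx with f(n) = -c * n**(4/3),
# an LDA-exchange-like form chosen only for illustration.
# Its functional derivative is v_xc(r) = f'(n(r)) = -(4/3) * c * n(r)**(1/3).
c, dx = 0.75, 0.1
n = np.linspace(0.1, 1.0, 10)          # toy density on a 10-point grid

def e_xc(n):
    return np.sum(-c * n**(4.0 / 3.0)) * dx

v_analytic = -(4.0 / 3.0) * c * n**(1.0 / 3.0)

# Finite-difference functional derivative: (dE/dn_i) / dx
eps = 1e-6
v_fd = np.empty_like(n)
for i in range(n.size):
    npert = n.copy()
    npert[i] += eps
    v_fd[i] = (e_xc(npert) - e_xc(n)) / (eps * dx)

print(np.max(np.abs(v_fd - v_analytic)))  # small discretization/FD error
```

For a neural E_xc the analytic derivative is not available in closed form, which is where reverse-mode automatic differentiation takes the place of the hand-derived `v_analytic` above.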
We present a practical procedure to obtain reliable and unbiased neural-network-based force fields for solids. Training and test sets are generated efficiently from global structural prediction runs, which at the same time ensures structural variety and the sampling of the relevant regions of phase space. The neural networks are trained to yield not only good formation energies, but also accurate forces and stresses, which are the quantities of interest for molecular dynamics simulations. Finally, we construct, as an example, several force fields for both semiconducting and metallic elements, and prove their accuracy for a variety of structural and dynamical properties. These are then used to study the melting of bulk copper and gold.
We examine various integration schemes for the time-dependent Kohn–Sham equations. Contrary to the time-dependent Schrödinger equation, this set of equations is nonlinear, due to the dependence of the Hamiltonian on the electronic density. We discuss some of their exact properties, and in particular their symplectic structure. Four different families of propagators are considered, specifically the linear multistep, Runge–Kutta, exponential Runge–Kutta, and commutator-free Magnus schemes. These have been chosen because they have been largely ignored in the past for time-dependent electronic-structure calculations. Their performance is analyzed in terms of cost versus accuracy. The clear winner, in terms of robustness, simplicity, and efficiency, is a simplified version of a fourth-order commutator-free Magnus integrator. However, in some specific cases, other propagators, such as some implicit versions of the multistep methods, may be useful.
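A fourth-order commutator-free Magnus step replaces the time-ordered exponential over one time step by a product of two plain exponentials, with the Hamiltonian sampled at the two Gauss–Legendre nodes. The following is a minimal sketch on a driven two-level toy Hamiltonian of our own choosing (a linear problem, so the nonlinear density dependence discussed in the abstract does not enter); the coefficients (3 ∓ 2√3)/12 are the standard fourth-order weights:

```python
import numpy as np

def expmih(a):
    """exp(-i * a) for a Hermitian matrix a, via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.exp(-1j * w)) @ v.conj().T

def cfm4_step(h_of_t, t, dt, psi):
    """One step of the fourth-order commutator-free Magnus propagator:
    two exponentials built from H at the Gauss-Legendre nodes, with
    weights a1 = (3 - 2*sqrt(3))/12 and a2 = (3 + 2*sqrt(3))/12."""
    s3 = np.sqrt(3.0)
    h1 = h_of_t(t + (0.5 - s3 / 6.0) * dt)   # earlier node
    h2 = h_of_t(t + (0.5 + s3 / 6.0) * dt)   # later node
    a1, a2 = (3.0 - 2.0 * s3) / 12.0, (3.0 + 2.0 * s3) / 12.0
    # The right factor (applied first) weights the earlier time more.
    u = expmih(dt * (a1 * h1 + a2 * h2)) @ expmih(dt * (a2 * h1 + a1 * h2))
    return u @ psi

# Toy driven two-level system (illustration only, not a Kohn-Sham system)
sx = np.array([[0, 1], [1, 0]], complex)
sz = np.array([[1, 0], [0, -1]], complex)
h = lambda t: sz + 0.5 * np.cos(t) * sx

def propagate(dt, nsteps):
    psi = np.array([1.0, 0.0], complex)
    for k in range(nsteps):
        psi = cfm4_step(h, k * dt, dt, psi)
    return psi

# The scheme is unitary (norm conserved to machine precision), and halving
# dt reduces the error by roughly 2**4, confirming fourth order.
ref = propagate(1.0 / 512, 512)
err = lambda dt, n: np.linalg.norm(propagate(dt, n) - ref)
print(err(1.0 / 8, 8) / err(1.0 / 16, 16))
```

Because each factor is the exponential of (-i times) a Hermitian matrix, every step is exactly unitary, which is one reason this family is robust in practice.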
The GW approximation is nowadays used to obtain accurate quasiparticle energies of atoms and molecules. In practice, the GW approximation is generally evaluated perturbatively, based on a prior self-consistent calculation within a simpler approximation. The final result thus depends on the choice of the self-consistent mean field used as a starting point. Using a recently developed GW code based on Gaussian basis functions, we benchmark a wide range of starting points for perturbative GW, including Hartree–Fock, LDA, PBE, PBE0, B3LYP, HSE06, BH&HLYP, CAM-B3LYP, and tuned CAM-B3LYP. In the evaluation of the ionization energy, the hybrid functionals are clearly superior starting points when compared to Hartree–Fock, LDA, or the semilocal approximations. Furthermore, among the hybrid functionals, the ones with the highest proportion of exact exchange usually perform best. Finally, the reliability of the frozen-core approximation, which allows for a considerable speed-up of the calculations, is demonstrated.
We compile a large data set designed for the efficient benchmarking of exchange–correlation functionals for the calculation of electronic band gaps. The data set comprises information on the experimental structure and band gap of 472 nonmagnetic materials, and includes a diverse group of covalent-, ionic-, and van der Waals-bonded solids. We used it to benchmark 12 functionals, ranging from standard local and semilocal functionals, through meta-generalized-gradient approximations, to several hybrids. We included both general-purpose functionals, like the Perdew–Burke–Ernzerhof approximation, and functionals specifically crafted for the determination of band gaps. The comparison of experimental and theoretical band gaps shows that the modified Becke–Johnson potential is at the moment the best available density functional, closely followed by the Heyd–Scuseria–Ernzerhof screened hybrid from 2006 and the high-local-exchange generalized-gradient approximation.
We perform a large-scale benchmark of machine learning methods for the prediction of the thermodynamic stability of solids. We start by constructing a data set that comprises density functional theory calculations of around 250000 cubic perovskite systems. This includes all possible perovskite and antiperovskite crystals that can be generated with elements from hydrogen to bismuth, excluding rare gases and lanthanides. Incidentally, these calculations already reveal a large number of systems (around 500) that are thermodynamically stable but absent from crystal structure databases. Moreover, some of these phases have unconventional compositions and define completely new families of perovskites. This data set is then used to train and test a series of machine learning algorithms to predict the energy distance to the convex hull of stability. In particular, we study the performance of ridge regression, random forests, extremely randomized trees (including adaptive boosting), and neural networks. We find that extremely randomized trees give the smallest mean absolute error of the distance to the convex hull (121 meV/atom) in the test set of 230000 perovskites, after being trained on 20000 samples. Surprisingly, the model already works if we give it as sole input features the group and row in the periodic table of the three elements composing the perovskite. Moreover, we find that the prediction accuracy is not uniform across the periodic table, being worse for first-row elements and for elements forming magnetic compounds. Our results suggest that machine learning can be used to speed up high-throughput DFT calculations considerably (by at least a factor of 5), by restricting the space of relevant chemical compositions without degradation of the accuracy.
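The simplest baseline in this kind of comparison, ridge regression on one-hot-encoded group/row features, can be sketched in closed form. The data below are synthetic (random compositions and energies standing in for the actual perovskite data set), and the featurization is our own illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_groups, n_rows = 2000, 18, 6

def featurize(groups, rows):
    """One-hot encode the (group, row) of the 3 elements of a composition."""
    x = np.zeros((groups.shape[0], 3 * (n_groups + n_rows)))
    for e in range(3):
        off = e * (n_groups + n_rows)
        x[np.arange(groups.shape[0]), off + groups[:, e]] = 1.0
        x[np.arange(groups.shape[0]), off + n_groups + rows[:, e]] = 1.0
    return x

groups = rng.integers(0, n_groups, size=(n_samples, 3))
rows = rng.integers(0, n_rows, size=(n_samples, 3))
x = featurize(groups, rows)

# Synthetic stand-in for the "energy above the hull": linear signal + noise.
w_true = rng.normal(size=x.shape[1])
y = x @ w_true + 0.05 * rng.normal(size=n_samples)

# Closed-form ridge regression: w = (X^T X + lam * I)^(-1) X^T y.
# The regularizer also handles the collinearity of the one-hot blocks.
lam = 1.0
w = np.linalg.solve(x.T @ x + lam * np.eye(x.shape[1]), x.T @ y)
mae = np.mean(np.abs(x @ w - y))
print(mae)  # training mean absolute error, on the noise scale
```

The tree-based methods that win the benchmark would replace the linear solve with an ensemble regressor, but the featurization step stays the same.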
Real-space grids are a powerful alternative for the simulation of electronic systems. One of the main advantages of the approach is the flexibility and simplicity of working directly in real space, where the different fields are discretized on a grid, combined with competitive numerical performance and great potential for parallelization. These properties constitute a great advantage when implementing and testing new physical models. Based on our experience with the Octopus code, in this article we discuss how the real-space approach has allowed for the recent development of new ideas for the simulation of electronic systems. Among these applications are approaches to calculate response properties, the modeling of photoemission, the optimal control of quantum systems, the simulation of plasmonic systems, and the exact solution of the Schrödinger equation for low-dimensionality systems.
We explore how strategies to simulate various phenomena of electronic systems have been implemented in the Octopus code, using the versatility and performance of real-space grids.
Some of the most spectacular failures of density-functional and Hartree–Fock theories are related to an incorrect description of the so-called static electron correlation. Motivated by recent progress in the N-representability problem of the one-body density matrix for pure states, we propose a method to quantify the static contribution to the electronic correlation. By studying several molecular systems, we show that our proposal correlates well with our intuition of static and dynamic electron correlation. Our results bring out the paramount importance of the occupancy of the highest occupied natural spin-orbital in such a quantification.