We present a first-principles-based formalism to provide a quantitative measure of the thermodynamic instability and propensity for electrochemical stabilization, passivation, or corrosion of metastable materials in aqueous media. We demonstrate that this formalism can assess the relative Gibbs free energy of candidate materials in aqueous media as well as their decomposition products, combining solid and aqueous phases, as a function of pH and potential. On the basis of benchmarking against 20 stable as well as metastable materials reported in the literature and also our experimental characterization of metastable triclinic-FeVO4, we present quantitative estimates for the relative Gibbs free energy and corresponding aqueous regimes where these materials are most likely to be stable, form inert passivating films, or steadily corrode to aqueous species. Furthermore, we show that the structure and composition of the passivating films formed on triclinic-FeVO4 are also in excellent agreement with the Point Defect Model, as proposed by the corrosion community. An open-source web application based on the formalism is made available at https://materialsproject.org.
High-throughput experimentation provides efficient mapping of composition–property relationships, and its implementation for the discovery of optical materials enables advancements in solar energy and other technologies. In a high-throughput pipeline, automated data processing algorithms are often required to match experimental throughput, and we present an automated Tauc analysis algorithm for estimating band gap energies from optical spectroscopy data. The algorithm mimics the judgment of an expert scientist, which is demonstrated through its application to a variety of high-throughput spectroscopy data, including the identification of indirect or direct band gaps in Fe2O3, Cu2V2O7, and BiVO4. The applicability of the algorithm to estimate a range of band gap energies for various materials is demonstrated by a comparison of direct-allowed band gaps estimated by expert scientists and by the automated algorithm for 60 optical spectra.
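To make the idea of a Tauc analysis concrete, the following is a minimal sketch, not the algorithm from the abstract above. It assumes a direct-allowed transition, so that (αhν)^2 is linear in photon energy above the gap, and it picks the linear segment with a simple steepest-window rule. The function names, the window size, and the synthetic spectrum are all illustrative assumptions.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def tauc_band_gap(energies, absorption, exponent=2.0, window=8):
    """Estimate a band gap as the x-intercept of the steepest Tauc segment.

    exponent=2.0 is the direct-allowed Tauc transform and 0.5 the
    indirect-allowed one. The steepest-window rule is an assumption made
    for illustration; it is not the expert-mimicking logic of the paper.
    """
    tauc = [(a * e) ** exponent for a, e in zip(absorption, energies)]
    best = None
    for start in range(len(energies) - window + 1):
        xs = energies[start:start + window]
        ys = tauc[start:start + window]
        slope, intercept = linear_fit(xs, ys)
        if slope > 0 and (best is None or slope > best[0]):
            best = (slope, intercept)
    if best is None:
        raise ValueError("no rising linear segment found")
    slope, intercept = best
    return -intercept / slope


# Synthetic direct-gap spectrum (hypothetical data): alpha ~ sqrt(E - Eg) / E.
TRUE_GAP = 2.4
energies = [1.5 + 0.05 * i for i in range(41)]
absorption = [math.sqrt(max(e - TRUE_GAP, 0.0)) / e for e in energies]
estimated_gap = tauc_band_gap(energies, absorption)
print(f"estimated direct-allowed band gap: {estimated_gap:.3f} eV")
```

Real spectra carry noise, sub-gap absorption, and several candidate linear regions, which is where a more careful segment-selection strategy than this one becomes necessary.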
Sequential learning (SL) strategies, i.e. iteratively updating a machine learning model to guide experiments, have been proposed to significantly accelerate materials discovery and research. Applications on computational datasets and a handful of optimization experiments have demonstrated the promise of SL, motivating a quantitative evaluation of its ability to accelerate materials discovery, specifically in the case of physical experiments. The benchmarking effort in the present work quantifies the performance of SL algorithms with respect to a breadth of research goals: discovery of any "good" material, discovery of all "good" materials, and discovery of a model that accurately predicts the performance of new materials. To benchmark the effectiveness of different machine learning models against these goals, we use datasets in which the performance of all materials in the search space is known from high-throughput synthesis and electrochemistry experiments. Each dataset contains all pseudo-quaternary metal oxide combinations from a set of six elements (chemical space); the performance metric chosen is the electrocatalytic activity (overpotential) for the oxygen evolution reaction (OER). A diverse set of SL schemes is tested on four chemical spaces, each containing 2121 catalysts. The presented work suggests that research can be accelerated by up to a factor of 20 compared to random acquisition in specific scenarios. The results also show that certain choices of SL models are ill-suited for a given research goal, resulting in substantial deceleration compared to random acquisition methods. The results provide quantitative guidance on how to tune an SL strategy for a given research goal and demonstrate the need for a new generation of materials-aware SL algorithms to further accelerate materials discovery.
Benchmarking metrics for materials discovery via sequential learning are presented, to assess the efficacy of existing algorithms and to be scientific in our assessment of accelerated science.
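The benchmarking idea described above, comparing a model-guided acquisition against random acquisition on a search space whose performance is fully known, can be sketched as follows. This is a toy illustration under stated assumptions: the one-dimensional search space, the `performance` function, the nearest-neighbor "model", and the definition of "good" as the top three candidates are all hypothetical and are not taken from the paper.

```python
import random

N_CANDIDATES = 60


def performance(i):
    """Hypothetical figure of merit for candidate i (higher is better)."""
    x = i / (N_CANDIDATES - 1)
    return -(x - 0.73) ** 2


def choose_random(unmeasured, measured, rng):
    """Baseline: random acquisition."""
    return rng.choice(unmeasured)


def choose_greedy(unmeasured, measured, rng):
    """Toy SL step: pick the candidate whose nearest measured neighbor scored best."""
    def predicted(i):
        nearest = min(measured, key=lambda j: abs(j - i))
        return measured[nearest]
    return max(unmeasured, key=predicted)


def experiments_until_good(good, choose_next, rng, n_seed=3):
    """Count experiments until any 'good' candidate has been measured."""
    unmeasured = list(range(N_CANDIDATES))
    measured = {}
    count = 0
    while unmeasured:
        if count < n_seed:
            i = rng.choice(unmeasured)  # random seed experiments
        else:
            i = choose_next(unmeasured, measured, rng)
        unmeasured.remove(i)
        measured[i] = performance(i)
        count += 1
        if i in good:
            return count
    raise RuntimeError("search space contains no good material")


def mean_count(good, choose_next, n_trials=20):
    """Average experiment count over independently seeded campaigns."""
    counts = [
        experiments_until_good(good, choose_next, random.Random(seed))
        for seed in range(n_trials)
    ]
    return sum(counts) / len(counts)


# "Good" materials: the three best candidates (an arbitrary threshold).
ranked = sorted(range(N_CANDIDATES), key=performance, reverse=True)
good = set(ranked[:3])

random_mean = mean_count(good, choose_random)
greedy_mean = mean_count(good, choose_greedy)
acceleration = random_mean / greedy_mean
print(f"random: {random_mean:.1f}, greedy: {greedy_mean:.1f}, "
      f"acceleration factor: {acceleration:.2f}")
```

An acceleration factor above 1 means the guided strategy needed fewer experiments than random acquisition, and a value below 1 corresponds to the deceleration the abstract warns about. The other two research goals (finding all good materials, or learning an accurate model) would need different stopping criteria.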
The limited number of known low-band-gap photoelectrocatalytic materials poses a significant challenge for the generation of chemical fuels from sunlight. Using high-throughput ab initio theory with experiments in an integrated workflow, we find eight ternary vanadate oxide photoanodes in the target band-gap range (1.2–2.8 eV). Detailed analysis of these vanadate compounds reveals the key role of VO₄ structural motifs and electronic band-edge character in efficient photoanodes, initiating a genome for such materials and paving the way for a broadly applicable high-throughput-discovery and materials-by-design feedback loop. Considerably expanding the number of known photoelectrocatalysts for water oxidation, our study establishes ternary metal vanadates as a prolific class of photoanode materials for generation of chemical fuels from sunlight and demonstrates our high-throughput theory–experiment pipeline as a prolific approach to materials discovery.
In this perspective, we highlight results of a research consortium devoted to advancing understanding of oxygen reduction reaction (ORR) catalysis as a means to inform fuel cell science. We demonstrate how targeted collaborations between different institutions from academic, national lab, and industry backgrounds and different scientific disciplines like theory, experiment, and characterization can yield unique insights into fuel cell catalysts. We comment on such insights into material designs for platinum-group-metal alloys, transition metal oxides, and non-traditional materials including metal-organic frameworks, systems that have served as the foundational building blocks for our consortium. We also motivate a renewed focus on catalyst durability in light of emerging technological requirements and paths forward in understanding in situ and operando electrochemical stability. Finally, we describe new frontiers that ORR research can explore and how emerging artificial intelligence tools can assist researchers in capturing data, selecting new experiments, and guiding characterization to accelerate the design and discovery of fuel cell catalysts. A main goal of sharing this perspective is to discuss the rationale for our future research plans based on our consortium work. We also hope to illustrate the potential impact of a collaborative strategy, to inspire a higher degree of Industry-Academia-National Laboratory collaboration, and to encourage other centers and consortiums to distill and share their findings in a similar perspective-type article. Together, we hope to enable the fuel cell research community to engage in a discussion of strategies for research and accelerated development of catalysts with improved activity and stability.
X-ray absorption spectroscopy (XAS) produces a wealth of information about the local structure of materials, but interpretation of spectra often relies on easily accessible trends and prior assumptions about the structure. Recently, researchers have demonstrated that machine learning models can automate this process to predict the coordinating environments of absorbing atoms from their XAS spectra. However, machine learning models are often difficult to interpret, making it challenging to determine when they are valid and whether they are consistent with physical theories. In this work, we present three main advances to the data-driven analysis of XAS spectra: we demonstrate the efficacy of random forests in solving two new property determination tasks (predicting Bader charge and mean nearest neighbor distance), we address how choices in data representation affect model interpretability and accuracy, and we show that multiscale featurization can elucidate the regions and trends in spectra that encode various local properties. The multiscale featurization transforms the spectrum into a vector of polynomial-fit features, and is contrasted with the commonly used “pointwise” featurization that directly uses the entire spectrum as input. We find that across thousands of transition metal oxide spectra, the relative importance of features describing the curvature of the spectrum can be localized to individual energy ranges, and we can separate the importance of constant, linear, quadratic, and cubic trends, as well as the white line energy. This work has the potential to assist rigorous theoretical interpretations, expedite experimental data collection, and automate analysis of XAS spectra, thus accelerating the discovery of new functional materials.
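As an illustration of what a polynomial-fit, multiscale feature vector can look like, the sketch below fits cubic polynomials to progressively smaller windows of a spectrum and concatenates the coefficients (constant, linear, quadratic, cubic). The window scheme (1, 2, and 4 equal splits), the rescaling of each window to [-1, 1], and all function names are assumptions made for this example; the abstract does not specify them.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ..., c_degree], constant term first.
    """
    n = degree + 1
    # Normal equations (V^T V) c = V^T y, with V[i][k] = xs[i] ** k.
    a = [[sum(x ** (r + c) for x in xs) for c in range(n)] for r in range(n)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs


def multiscale_features(energies, intensities, n_splits=(1, 2, 4), degree=3):
    """Concatenate polynomial coefficients fitted on nested energy windows."""
    features = []
    n = len(energies)
    for parts in n_splits:
        for p in range(parts):
            lo = p * n // parts
            hi = (p + 1) * n // parts
            xs = energies[lo:hi]
            ys = intensities[lo:hi]
            # Rescale each window to [-1, 1] so coefficients are comparable
            # across windows and the fit stays well conditioned.
            mid = (xs[0] + xs[-1]) / 2
            half = (xs[-1] - xs[0]) / 2
            scaled = [(x - mid) / half for x in xs]
            features.extend(polyfit(scaled, ys, degree))
    return features


# Hypothetical "spectrum": an exact cubic in the rescaled energy variable.
energies = [float(i) for i in range(41)]
t = [(e - 20.0) / 20.0 for e in energies]
intensities = [1 + 2 * u + 3 * u ** 2 + 4 * u ** 3 for u in t]
features = multiscale_features(energies, intensities)
print(len(features), [round(c, 6) for c in features[:4]])
```

Because each feature is tied to one window and one polynomial order, a feature-importance score from a model such as a random forest can be read as "the curvature in this energy range matters", which is harder to say with a pointwise representation.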
Sequential learning for materials discovery is a paradigm where a computational agent solicits new data to simultaneously update a model in service of exploration (finding the largest number of materials that meet some criteria) or exploitation (finding materials with an ideal figure of merit). In real-world discovery campaigns, new data acquisition may be costly and an optimal strategy may involve using and acquiring data with different levels of fidelity, such as first-principles calculation to supplement an experiment. In this work, we introduce agents that can operate on multiple data fidelities, and benchmark their performance on an emulated discovery campaign to find materials with desired band gap values. The fidelities of data come from the results of DFT calculations as low fidelity and experimental results as high fidelity. We demonstrate performance gains of agents that incorporate multi-fidelity data in two contexts: either using a large body of low-fidelity data as a prior knowledge base or acquiring low-fidelity data in tandem with experimental data. This advance provides a tool that enables materials scientists to test various acquisition and model hyperparameters to maximize the discovery rate of their own multi-fidelity sequential learning campaigns for materials discovery. This may also serve as a reference point for those who are interested in practical strategies that can be used when multiple data sources are available for active or sequential learning campaigns.
In materials discovery efforts, synthetic capabilities far outpace the ability to extract meaningful data from them. To bridge this gap, machine learning methods are necessary to reduce the search space for identifying desired materials. Here, we present a machine learning–driven, closed-loop experimental process to guide the synthesis of polyelemental nanomaterials with targeted structural properties. By leveraging data from an eight-dimensional chemical space (Au-Ag-Cu-Co-Ni-Pd-Sn-Pt) as inputs, a Bayesian optimization algorithm is used to suggest previously unidentified nanoparticle compositions that target specific interfacial motifs for synthesis, results of which are iteratively shared back with the algorithm. This feedback loop resulted in successful syntheses of 18 heterojunction nanomaterials that are too complex to discover by chemical intuition alone, including biphasic nanoparticles that are among the most chemically complex reported to date. Platforms like the one developed here are poised to transform materials discovery across a wide swath of applications and industries.
Rapid construction of phase diagrams is a central tenet of combinatorial materials science, yet accelerated materials discovery efforts are often hampered by challenges in interpreting combinatorial X-ray diffraction data sets. We address this by developing AgileFD, an artificial intelligence algorithm that enables rapid phase mapping from a combinatorial library of X-ray diffraction patterns. AgileFD models alloying-based peak shifting through a novel expansion of convolutional nonnegative matrix factorization, which not only improves the identification of constituent phases but also maps their concentration and lattice parameter as a function of composition. By incorporating Gibbs’ phase rule into the algorithm, physically meaningful phase maps are obtained with unsupervised operation, and more refined solutions are attained by injecting expert knowledge of the system. The algorithm is demonstrated through investigation of the V–Mn–Nb oxide system where decomposition of eight oxide phases, including two with substantial alloying, provides the first phase map for this pseudoternary system. This phase map enables interpretation of high-throughput band gap data, leading to the discovery of new solar light absorbers and the alloying-based tuning of the direct-allowed band gap energy of MnV2O6. The open-source family of AgileFD algorithms can be implemented into a broad range of high-throughput workflows to accelerate materials discovery.
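For readers unfamiliar with the building block named above, the sketch below shows plain nonnegative matrix factorization with the standard multiplicative updates, applied to synthetic mixtures of two "diffraction patterns". It is deliberately not AgileFD: it has no convolutional peak-shift modeling and no Gibbs phase rule constraint. The data, the rank, and all function names are assumptions for illustration.

```python
import math
import random


def matmul(a, b):
    """Dense matrix product for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(x, w, h):
    """Frobenius norm of the residual x - w @ h."""
    wh = matmul(w, h)
    return math.sqrt(sum((x[i][j] - wh[i][j]) ** 2
                         for i in range(len(x)) for j in range(len(x[0]))))


def nmf(x, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor nonnegative x (samples x channels) as w @ h.

    w holds per-sample phase weights and h holds the basis patterns.
    Uses multiplicative updates for the squared-error objective.
    """
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[r][j] * num[r][j] / (den[r][j] + eps) for j in range(m)]
             for r in range(rank)]
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][r] * num[i][r] / (den[i][r] + eps) for r in range(rank)]
             for i in range(n)]
    return w, h


def gaussian_peak(center, width, n_channels=30):
    """Hypothetical single-peak pattern on a channel grid."""
    return [math.exp(-((c - center) / width) ** 2) for c in range(n_channels)]


# Eight synthetic samples mixing two patterns in varying fractions.
patterns = [gaussian_peak(8, 2.0), gaussian_peak(20, 3.0)]
fractions = [k / 7 for k in range(8)]
x = [[f * a + (1 - f) * b for a, b in zip(*patterns)] for f in fractions]

w0, h0 = nmf(x, rank=2, n_iter=0)   # random initialization only
w, h = nmf(x, rank=2, n_iter=200)   # fitted factors
initial_error = frobenius_error(x, w0, h0)
final_error = frobenius_error(x, w, h)
print(f"residual: {initial_error:.3f} -> {final_error:.3f}")
```

Plain NMF treats a shifted peak as a different pattern, so alloying-induced lattice changes tend to be split across spurious components; handling that shift is the motivation the abstract gives for the convolutional extension.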
High-throughput experimental methodologies are capable of synthesizing, screening and characterizing vast arrays of combinatorial material libraries at a very rapid rate. These methodologies strategically employ tiered screening wherein the number of compositions screened decreases as the complexity, and very often the scientific information obtained from a screening experiment, increases. The algorithm used for down-selection of samples from a higher-throughput screening experiment to a lower-throughput screening experiment is vital in achieving information-rich experimental materials genomes. The fundamental science of material discovery lies in the establishment of composition–structure–property relationships, motivating the development of advanced down-selection algorithms which consider the information value of the selected compositions, as opposed to simply selecting the best performing compositions from a high-throughput experiment. Identification of property fields (composition regions with distinct composition–property relationships) in high-throughput data enables down-selection algorithms to employ advanced selection strategies, such as the selection of representative compositions from each field or selection of compositions that span the composition space of the highest performing field. Such strategies would greatly enhance the generation of data-driven discoveries. We introduce an informatics-based clustering of composition–property functional relationships using a combination of information theory and multitree genetic programming concepts for identification of property fields in a composition library. We demonstrate our approach using a complex synthetic composition–property map for a 5 at.% step ternary library consisting of four distinct property fields and finally explore the application of this methodology for capturing relationships between composition and catalytic activity for the oxygen evolution reaction for 5429 catalyst compositions in a (Ni–Fe–Co–Ce)Ox library.