Selecting a “best” model among several competing candidate models is a frequently encountered problem in water resources modeling (and other disciplines which employ models). For a modeler, the best model is the one that best fulfills a certain purpose (e.g., flood prediction), which is typically assessed by comparing model simulations to data (e.g., stream flow). Model selection methods find the “best” trade‐off between good fit with data and model complexity. In this context, the interpretations of model complexity implied by different model selection methods are crucial, because they represent different underlying goals of modeling. Over the last decades, numerous model selection criteria have been proposed, but modelers who primarily want to apply a model selection criterion often face a lack of guidance for choosing the criterion that matches their goal. We propose a classification scheme for model selection criteria that helps to find the right criterion for a specific goal, i.e., one that employs the matching interpretation of complexity. We identify four model selection classes which seek to achieve high predictive density, low predictive error, high model probability, or the shortest compression of data. These goals can be achieved by following either nonconsistent or consistent model selection and by either incorporating a Bayesian parameter prior or not. We allocate commonly used criteria to these four classes, analyze how they represent model complexity, and discuss what this means for the model selection task. Finally, we provide guidance on choosing the right type of criterion for specific model selection tasks. (A quick guide through all key points is given at the end of the introduction.)
Key Points
Model selection criteria are often chosen arbitrarily; we offer a guiding classification system for commonly used criteria centered around their representation of model complexity
The classification considers underlying definitions of model complexity which encompass different foci on identifying versus approaching an underlying truth, conducted in either a Bayesian or a non‐Bayesian way
Each model selection class pursues a specific goal; we outline which one is most suitable for a specific modeling task
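As an illustration of the two families of criteria discussed above, the following minimal Python sketch contrasts AIC (a nonconsistent, prediction-oriented criterion) with BIC (a consistent criterion that approximates log model evidence). The model names, log-likelihood values, parameter counts, and data size are assumed purely for illustration.

```python
import numpy as np

def aic(max_log_likelihood, n_params):
    """Akaike information criterion: nonconsistent, prediction-oriented;
    complexity enters as the number of parameters."""
    return -2.0 * max_log_likelihood + 2.0 * n_params

def bic(max_log_likelihood, n_params, n_data):
    """Bayesian information criterion: consistent, approximates log model
    evidence; the complexity penalty grows with the data size."""
    return -2.0 * max_log_likelihood + n_params * np.log(n_data)

# Toy comparison of two candidate models fitted to the same data (values assumed).
models = {"linear": (-210.4, 2), "nonlinear": (-204.9, 5)}
n_data = 120
for name, (logL, k) in models.items():
    print(f"{name:10s} AIC = {aic(logL, k):7.1f}  BIC = {bic(logL, k, n_data):7.1f}")
```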
We show a link between Bayesian inference and information theory that is useful for model selection, assessment of information entropy, and experimental design. We align Bayesian model evidence (BME) with relative entropy and cross entropy in order to simplify computations using prior-based (Monte Carlo) or posterior-based (Markov chain Monte Carlo) BME estimates. On the one hand, we demonstrate how Bayesian model selection can profit from information theory to estimate BME values via posterior-based techniques. To do so, we use various assumptions, including relations to several information criteria. On the other hand, we demonstrate how relative entropy can profit from BME to assess information entropy during Bayesian updating and to assess utility in Bayesian experimental design. Specifically, we emphasize that relative entropy can be computed from both prior-based and posterior-based sampling techniques while avoiding unnecessary multidimensional integration. Prior-based computation does not require any assumptions, whereas posterior-based estimates require at least one assumption. We illustrate the performance of the discussed estimates of BME, information entropy, and experiment utility using a transparent, non-linear example. The multivariate Gaussian posterior estimate requires the fewest assumptions and shows the best performance among the posterior-based estimates of BME, information entropy, and experiment utility.
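The identity underlying this alignment, ln BME = E_posterior[ln L] − D_KL(posterior ‖ prior), can be exercised with a minimal Monte Carlo sketch. The quadratic forward model, Gaussian prior, observation value, and noise level below are assumed for illustration only; the posterior expectation is obtained here by importance reweighting of the prior sample rather than by MCMC.

```python
import numpy as np

def log_likelihood(theta, data, sigma=0.1):
    """Gaussian log-likelihood of the data under a simple nonlinear
    forward model y = theta**2 (a stand-in for any simulator)."""
    resid = data - theta**2
    n = len(np.atleast_1d(data))
    return -0.5 * np.sum(resid**2) / sigma**2 - 0.5 * n * np.log(2 * np.pi * sigma**2)

rng = np.random.default_rng(0)
data = np.array([0.25])

# Prior-based (Monte Carlo) BME estimate: average likelihood over prior draws.
prior_draws = rng.normal(0.0, 1.0, size=10_000)              # theta ~ N(0, 1) prior (assumed)
logL = np.array([log_likelihood(t, data) for t in prior_draws])
log_bme = np.log(np.mean(np.exp(logL - logL.max()))) + logL.max()

# Posterior expectation of ln L via self-normalized importance weights w ~ L(theta).
w = np.exp(logL - logL.max())
w /= w.sum()
exp_logL_post = np.sum(w * logL)

# Relative entropy (information gain) from ln BME = E_post[ln L] - D_KL(post || prior).
d_kl = exp_logL_post - log_bme
print(f"ln BME ~ {log_bme:.3f},  D_KL(posterior || prior) ~ {d_kl:.3f} nats")
```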
Choosing between competing models lies at the heart of scientific work, and is a frequent motivation for experimentation. Optimal experimental design (OD) methods maximize the benefit of experiments towards a specified goal. We advance and demonstrate an OD approach to maximize the information gained towards model selection. We make use of so-called model choice indicators, which are random variables with an expected value equal to Bayesian model weights. Their uncertainty can be measured with Shannon entropy. Since the experimental data are still random variables in the planning phase of an experiment, we use mutual information (the expected reduction in Shannon entropy) to quantify the information gained from a proposed experimental design. For implementation, we use the Preposterior Data Impact Assessor framework (PreDIA), because it is free of the lower-order approximations of mutual information often found in the geosciences. In comparison to other studies in statistics, our framework is not restricted to sequential design or to discrete-valued data, and it can handle measurement errors. As an application example, we optimize an experiment about the transport of contaminants in clay, featuring the problem of choosing between competing isotherms to describe sorption. We compare the results of optimizing towards maximum model discrimination with an alternative OD approach that minimizes the overall predictive uncertainty under model choice uncertainty.
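A minimal sketch of the design criterion itself (not of the PreDIA implementation): the mutual information between the model choice indicator and a candidate design is estimated as the prior Shannon entropy of the model weights minus the expected posterior entropy over data simulated under the competing models. The callables simulate_data and update_weights are hypothetical placeholders for the user's forward models and Bayesian weight update.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy (in nats) of a discrete vector of model weights."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def expected_information_gain(prior_weights, simulate_data, update_weights,
                              n_mc=2000, rng=None):
    """Mutual information between model choice and a candidate design:
    prior entropy minus expected posterior entropy of the model weights.
    simulate_data(model_index, rng) -> synthetic data for one model (placeholder)
    update_weights(prior_weights, data) -> posterior model weights (placeholder)
    """
    rng = rng if rng is not None else np.random.default_rng()
    prior_weights = np.asarray(prior_weights, dtype=float)
    h_prior = shannon_entropy(prior_weights)
    h_post = 0.0
    for _ in range(n_mc):
        m = rng.choice(len(prior_weights), p=prior_weights)  # draw a "true" model
        d = simulate_data(m, rng)                            # draw data under that model
        h_post += shannon_entropy(update_weights(prior_weights, d))
    return h_prior - h_post / n_mc                           # expected entropy reduction
```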
Gaussian process emulators (GPE) are a machine learning approach that replicates computationally demanding models using training runs of that model. Constructing such a surrogate is very challenging and, in the context of Bayesian inference, the training runs should be well invested. The current paper offers a fully Bayesian view on GPEs for Bayesian inference accompanied by Bayesian active learning (BAL). We introduce three BAL strategies that adaptively identify training sets for the GPE using information-theoretic arguments. The first strategy relies on Bayesian model evidence, which indicates the GPE’s quality of matching the measurement data; the second strategy is based on relative entropy, which indicates the relative information gain for the GPE; and the third is founded on information entropy, which indicates the missing information in the GPE. We illustrate the performance of our three strategies using analytical and carbon-dioxide benchmarks. The paper shows evidence of convergence against a reference solution and demonstrates quantification of post-calibration uncertainty by comparing the three introduced strategies. We conclude that the Bayesian model evidence-based and relative entropy-based strategies outperform the entropy-based strategy because the latter can be misleading during BAL. The relative entropy-based strategy demonstrates superior performance to the Bayesian model evidence-based strategy.
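A hedged sketch of the BME-based acquisition step, assuming a generic scikit-learn Gaussian process as the emulator: candidate parameter sets are scored by the Monte Carlo average of the observation likelihood under the GPE's predictive distribution, and the best-scoring candidate is run with the full model next. This is an illustration of the idea, not the authors' implementation; the toy model, prior ranges, and observation values are assumed.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def bal_select_next(gpe, candidates, obs, obs_err, n_draws=200, rng=None):
    """Score each candidate by a BME-type criterion: sample model outputs from
    the GPE's predictive distribution and average the (unnormalized) Gaussian
    likelihood of the observation; return the highest-scoring candidate."""
    rng = rng if rng is not None else np.random.default_rng()
    mean, std = gpe.predict(candidates, return_std=True)
    scores = []
    for mu, s in zip(mean, std):
        draws = rng.normal(mu, s, size=n_draws)                # GPE predictive samples
        scores.append(np.mean(np.exp(-0.5 * (obs - draws) ** 2 / obs_err**2)))
    return candidates[int(np.argmax(scores))]

# Toy usage with a 1-D parameter and a scalar observation (all values assumed).
rng = np.random.default_rng(1)
X_train = rng.uniform(0, 1, size=(5, 1))
y_train = np.sin(6 * X_train[:, 0])                            # stand-in for the expensive model
gpe = GaussianProcessRegressor(kernel=RBF(0.2)).fit(X_train, y_train)
next_x = bal_select_next(gpe, rng.uniform(0, 1, (100, 1)), obs=0.5, obs_err=0.1, rng=rng)
```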
Thermochemical Energy Storage (TCES), specifically the calcium oxide (CaO)/calcium hydroxide (Ca(OH)2) system, is a promising energy storage technology with relatively high energy density and low cost. However, the existing models available to predict the system’s internal states are computationally expensive. An accurate and real-time capable model is therefore still required to improve its operational control. In this work, we implement a Physics-Informed Neural Network (PINN) to predict the dynamics of the TCES internal state. Our proposed framework addresses three physical aspects to build the PINN: (1) we choose a Nonlinear Autoregressive Network with Exogenous Inputs (NARX) with deeper recurrence to address the nonlinear latency; (2) we train the network in closed-loop to capture the long-term dynamics; and (3) we incorporate physical regularisation during its training, calculated based on discretized mole and energy balance equations. To train the network, we perform numerical simulations on an ensemble of system parameters to obtain synthetic data. Even though the suggested approach yields an error of 3.96 × 10⁻⁴, which is in the same range as the result without physical regularisation, it is superior to conventional Artificial Neural Network (ANN) strategies because it ensures physical plausibility of the predictions, even in a highly dynamic and nonlinear problem. Consequently, the suggested PINN can be further developed for more complicated analyses of the TCES system.
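A minimal sketch of how the physical regularisation can enter the training objective, assuming PyTorch; mole_balance_residual and energy_balance_residual are placeholder callables for the problem-specific discretized balance equations, and the weighting factor is assumed.

```python
import torch

def physics_regularised_loss(pred, target, mole_balance_residual,
                             energy_balance_residual, lam=1e-2):
    """Composite training loss: data misfit plus penalties on the discretized
    mole- and energy-balance residuals evaluated on the network predictions."""
    data_loss = torch.mean((pred - target) ** 2)
    phys_loss = (torch.mean(mole_balance_residual(pred) ** 2)
                 + torch.mean(energy_balance_residual(pred) ** 2))
    return data_loss + lam * phys_loss                         # lam weights the physics term
```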
Anthropogenic Trace Compounds (ATCs), which continuously grow in number and concentration, are an emerging issue for water quality in both natural and technical environments. The complex web of exposure pathways, as well as the variety in chemical structure and potency of ATCs, presents immense challenges for future research and policy initiatives. This review summarizes current trends and identifies knowledge gaps in innovative, effective monitoring and management strategies while addressing research questions concerning ATC occurrence, fate, detection and toxicity.
We highlight the increasing sensitivity of chemical analytics and the challenges in harmonizing sampling protocols and methods, as well as the need for ATC indicator substances to enable valid cross-national monitoring routines. Second, the status quo in ecotoxicology is described to advocate for a better implementation of long-term tests, to address toxicity at the community and environmental as well as the human-health level, and to adapt various test levels and endpoints. Moreover, we discuss potential sources of ATCs and the current removal efficiency of wastewater treatment plants (WWTPs) to indicate the most effective places and elimination strategies. Knowledge gaps in the transport and/or detainment of ATCs during their passage through surface waters and groundwaters are further emphasized in relation to their physico-chemical properties, abiotic conditions, and biological interactions in order to highlight fundamental research needs. Finally, we demonstrate the importance and remaining challenges of an appropriate ATC risk assessment, since this will greatly assist in identifying the most urgent calls for action, in selecting the most promising measures, and in evaluating the success of implemented management strategies.
• ATCs in aquatic systems call for new multidisciplinary large-scale strategies.
• Indicator substances are needed to characterise ATC exposure pathways and sinks.
• Bioassays must provide information on various toxicity levels and chronic effects.
• Biofilms play an essential role in ATC dynamics and attenuation in the environment.
• Risk assessment assists in the search for new avoidance and elimination measures.
The finite volume neural network (FINN) is an exception among recent physics-aware neural network models as it allows the specification of arbitrary boundary conditions (BCs). FINN can generalize and adapt to various prescribed BC values not provided during training, where other models fail. However, FINN depends explicitly on given BC values and cannot deal with unobserved parts within the physical domain. To overcome these limitations, we extend FINN in two ways. First, we integrate the capability to infer BC values on-the-fly from just a few data points. This allows us to apply FINN in situations where the BC values, such as the inflow rate of fluid into a simulated medium, are unknown. Second, we extend FINN to plausibly reconstruct missing data within the physical domain via a gradient-driven spin-up phase. Our experiments validate that FINN not only reliably infers correct BCs, but also generates smooth and plausible full-domain reconstructions that are consistent with the observable data. Moreover, FINN can generate predictions that are orders of magnitude more accurate than those of competitive pure ML and physics-aware ML models, even when the physical domain is only partially visible and the BCs are applied at a point that is spatially distant from the observable volumes.
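A hedged sketch of the on-the-fly BC inference, assuming PyTorch and a placeholder finn_step for one forward step of the trained (frozen) model: the unknown boundary value is treated as the only trainable parameter and fitted by gradient descent against a few observed points.

```python
import torch

def infer_boundary_condition(finn_step, init_state, observations, n_iters=200, lr=1e-2):
    """Fit an unknown scalar BC so that rolling the frozen model forward
    reproduces a few observations. finn_step(state, bc) -> next state
    is a placeholder for one time step of the trained model."""
    bc = torch.zeros(1, requires_grad=True)        # unknown inflow BC, initial guess
    opt = torch.optim.Adam([bc], lr=lr)
    for _ in range(n_iters):
        state = init_state
        preds = []
        for _ in range(len(observations)):
            state = finn_step(state, bc)           # roll out with the candidate BC
            preds.append(state)
        loss = torch.mean((torch.stack(preds) - observations) ** 2)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return bc.detach()
```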
Within-season crop yield forecasting at national and regional levels is crucial to ensure food security. Yet, forecasting is a challenge because of incomplete knowledge about the heterogeneity of factors determining crop growth, above all management and cultivars. This motivates us to propose a method for early forecasting of winter wheat yields in systems with little information about crop management and cultivars, and under uncertain weather conditions. The study was performed in two contrasting regions in southwest Germany, Kraichgau and Swabian Jura. We used in-season green leaf area index (LAI) as a proxy for end-of-season grain yield. We applied PILOTE, a simple and computationally inexpensive semi-empirical radiative transfer model, to produce yield forecasts and assimilated LAI data measured in-situ and sensed by satellites (Landsat and Sentinel-2). To assimilate the LAI data into the PILOTE model, we used the particle filtering method. Both weather and sowing data were treated as random variables, acknowledging principal sources of uncertainty in yield forecasting. As such, we used the stochastic weather generator MarkSim® GCM to produce an ensemble of uncertain meteorological boundary conditions until the end of the season. Sowing dates were assumed normally distributed. To evaluate the performance of the data assimilation scheme, we set up the PILOTE model without data assimilation, treating weather data and sowing dates as random variables (baseline Monte Carlo simulation). Data assimilation increased the accuracy and precision of LAI simulation. Increasing the number of assimilation times decreased the mean absolute error (MAE) of LAI prediction from satellite data from ~1 to 0.2 m2/m2. Yield prediction was improved by data assimilation as compared to the baseline Monte Carlo simulation in both regions. Yield prediction by assimilating satellite-derived LAI showed similar statistics to assimilating the LAI data measured in-situ. The error in yield prediction by assimilating satellite-derived LAI was 7% in Kraichgau and 4% in Swabian Jura, whereas the yield prediction error of the Monte Carlo simulation was 10% in both regions. Overall, we conclude that assimilating even noisy LAI data before anthesis substantially improves forecasting of winter wheat grain yield by reducing prediction errors caused by uncertainties in weather data, incomplete knowledge about management, and model calibration uncertainty.
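A minimal sketch of one particle-filter assimilation step under a Gaussian observation error, with multinomial resampling; the array names and the error model are assumptions for illustration, not the exact scheme used with PILOTE.

```python
import numpy as np

def particle_filter_update(particles, simulated_lai, observed_lai, obs_sigma, rng=None):
    """Reweight an ensemble of model states/parameters ("particles") by the
    Gaussian likelihood of the observed LAI, then resample with replacement.
    simulated_lai[i] is particle i's modelled LAI at the observation time."""
    rng = rng if rng is not None else np.random.default_rng()
    log_w = -0.5 * ((observed_lai - simulated_lai) / obs_sigma) ** 2
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)  # multinomial resampling
    return particles[idx]
```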
Scalar mixing plays a significant role in transport in geophysical flows because it controls dilution and is a main driver for many chemical reactions. Here we study the local-scale flow mechanisms that lead to enhanced scalar mixing, and how they impact the global mixing behavior. Mixing is quantified in terms of the entropy of the scalar distribution. It is shown that the evolution of entropy is directly linked to the flow topology in terms of the Okubo‐Weiss parameter Θ. Dominant shear and stretching deformation (Θ > 0) leads to a strong increase of local mixing strength, while dominant vorticity (Θ < 0) has only a minor impact. This makes it possible to delineate regions of increased scalar mixing potential by mapping out the spatial distribution of Θ(x), and to relate global scalar mixing to an areally averaged effective Okubo‐Weiss measure.
Key Points
Explicit quantification of the local scale mixing mechanisms
Identification of the relation between entropy growth and flow metric
Quantification of global mixing using a novel global flow metric
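A minimal sketch of the two quantities used above, assuming a 2-D velocity field on a regular grid with axis 0 as x and axis 1 as y: the Okubo-Weiss parameter is computed from normal strain, shear strain, and vorticity, and the global mixing state is summarised by the Shannon entropy of the binned scalar distribution.

```python
import numpy as np

def okubo_weiss(u, v, dx, dy):
    """Okubo-Weiss parameter: (normal strain)^2 + (shear strain)^2 - (vorticity)^2,
    from velocity components u, v on a regular grid (axis 0 = x, axis 1 = y)."""
    du_dx, du_dy = np.gradient(u, dx, dy, axis=(0, 1))
    dv_dx, dv_dy = np.gradient(v, dx, dy, axis=(0, 1))
    normal_strain = du_dx - dv_dy
    shear_strain = dv_dx + du_dy
    vorticity = dv_dx - du_dy
    return normal_strain**2 + shear_strain**2 - vorticity**2

def scalar_entropy(concentration, n_bins=64):
    """Shannon entropy (in nats) of the binned scalar concentration distribution,
    used as a global measure of the mixing state."""
    hist, _ = np.histogram(concentration, bins=n_bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log(p))
```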
Model averaging makes it possible to use multiple models for one modelling task, like predicting a certain quantity of interest. Several Bayesian approaches exist that all yield a weighted average of predictive distributions. However, they are often not properly applied, which can lead to false conclusions. In this study, we focus on Bayesian Model Selection (BMS) and Averaging (BMA), Pseudo-BMS/BMA and Bayesian Stacking. We want to foster their proper use by, first, clarifying their theoretical background and, second, contrasting their behaviours in an applied groundwater modelling task. We show that only Bayesian Stacking has the goal of model averaging for improved predictions by model combination. The other approaches pursue the quest of finding a single best model as the ultimate goal, and use model averaging only as a preliminary stage to prevent rash model choice. Improved predictions are thereby not guaranteed. In accordance with so-called M-settings that clarify the alleged relations between models and truth, we elicit which method is most promising.
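A minimal sketch contrasting evidence-based BMS/BMA weights with Pseudo-BMS/BMA weights based on cross-validated predictive densities; the numeric values in the toy usage are assumed. Bayesian Stacking weights would instead be found by optimizing the weighted leave-one-out predictive density over the simplex, which is omitted here.

```python
import numpy as np

def bma_weights(log_bme, log_prior=None):
    """Posterior model probabilities from log Bayesian model evidence,
    optionally combined with log prior model probabilities."""
    log_bme = np.asarray(log_bme, dtype=float)
    if log_prior is not None:
        log_bme = log_bme + np.asarray(log_prior, dtype=float)
    w = np.exp(log_bme - log_bme.max())
    return w / w.sum()

def pseudo_bma_weights(elpd):
    """Pseudo-BMS/BMA weights from estimated expected log predictive densities
    (e.g. from leave-one-out cross-validation) instead of model evidence."""
    elpd = np.asarray(elpd, dtype=float)
    w = np.exp(elpd - elpd.max())
    return w / w.sum()

# Toy usage with three competing models (numbers assumed for illustration).
print(bma_weights([-120.3, -118.7, -125.0]))
print(pseudo_bma_weights([-95.2, -94.8, -101.3]))
```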