Selecting a “best” model among several competing candidate models poses an often encountered problem in water resources modeling (and other disciplines which employ models). For a modeler, the best model fulfills a certain purpose best (e.g., flood prediction), which is typically assessed by comparing model simulations to data (e.g., stream flow). Model selection methods find the “best” trade‐off between good fit with data and model complexity. In this context, the interpretations of model complexity implied by different model selection methods are crucial, because they represent different underlying goals of modeling. Over the last decades, numerous model selection criteria have been proposed, but modelers who primarily want to apply a model selection criterion often face a lack of guidance for choosing the right criterion that matches their goal. We propose a classification scheme for model selection criteria that helps to find the right criterion for a specific goal, i.e., which employs the correct complexity interpretation. We identify four model selection classes which seek to achieve high predictive density, low predictive error, high model probability, or shortest compression of data. These goals can be achieved by following either nonconsistent or consistent model selection and by either incorporating a Bayesian parameter prior or not. We allocate commonly used criteria to these four classes, analyze how they represent model complexity and what this means for the model selection task. Finally, we provide guidance on choosing the right type of criteria for specific model selection tasks. (A quick guide through all key points is given at the end of the introduction.)
Key Points
Model selection criteria are often chosen arbitrarily; we offer a guiding classification system for commonly used criteria centered around their representation of model complexity
The classification considers underlying definitions of model complexity which encompass different foci on identifying versus approaching an underlying truth, conducted in an either Bayesian or non‐Bayesian way
Each model selection class pursues a specific goal; we outline which one is most suitable for a specific modeling task
Most studies in vadose zone hydrology use a single conceptual model for predictive inference and analysis. Focusing on the outcome of a single model is prone to statistical bias and underestimation of uncertainty. In this study, we combine multiobjective optimization and Bayesian model averaging (BMA) to generate forecast ensembles of soil hydraulic models. To illustrate our method, we use observed tensiometric pressure head data at three different depths in a layered vadose zone of volcanic origin in New Zealand. A set of seven different soil hydraulic models is calibrated using a multiobjective formulation with three different objective functions that each measure the mismatch between observed and predicted soil water pressure head at one specific depth. The Pareto solution space corresponding to these three objectives is estimated with AMALGAM and used to generate four different model ensembles. These ensembles are postprocessed with BMA and used for predictive analysis and uncertainty estimation. Our most important conclusions for the vadose zone under consideration are (1) the mean BMA forecast exhibits predictive capabilities similar to those of the best-performing individual soil hydraulic model, (2) the size of the BMA uncertainty ranges increases with increasing depth and dryness in the soil profile, (3) the best performing ensemble corresponds to the compromise (or balanced) solution of the three-objective Pareto surface, and (4) the combined multiobjective optimization and BMA framework proposed in this paper is very useful to generate forecast ensembles of soil hydraulic models.
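To make the BMA postprocessing step more concrete, the following is a minimal sketch of how a BMA mean forecast and its total predictive variance can be computed once the model weights are known. It assumes the common Gaussian-mixture form of BMA, in which each ensemble member contributes a normal predictive density centered on its forecast; the function name, the weights, and the member variances are hypothetical and are not taken from the study.

```python
import numpy as np

def bma_mean_and_variance(forecasts, weights, member_variances):
    """BMA mean and variance for a Gaussian mixture (illustrative assumption).

    forecasts        : array (K, T), forecast of each of K models at T times
    weights          : array (K,), BMA weights that sum to one
    member_variances : array (K,), variance of each member's predictive density
    """
    forecasts = np.asarray(forecasts, dtype=float)
    weights = np.asarray(weights, dtype=float)
    member_variances = np.asarray(member_variances, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("BMA weights must sum to one")

    # Weighted average of the member forecasts at every time step.
    mean = weights @ forecasts
    # Between-model spread: weighted squared deviation from the BMA mean.
    between = weights @ (forecasts - mean) ** 2
    # Within-model spread: weighted average of the member variances.
    within = weights @ member_variances
    return mean, between + within

# Hypothetical two-model ensemble with two forecast times.
mean, variance = bma_mean_and_variance(
    forecasts=[[1.0, 2.0], [3.0, 4.0]],
    weights=[0.5, 0.5],
    member_variances=[0.1, 0.3],
)
```

The variance splits into a between-model and a within-model term, which is why a BMA ensemble can report wider uncertainty ranges than any single calibrated model.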
Performance criteria are used in the automated calibration of hydrological models to determine and minimise the misfit between observations and model simulations. In this study, a multiobjective model calibration framework is used to analyse the trade-offs between Nash–Sutcliffe efficiency of flows (NSE), the NSE of log-transformed flows (NSEₗₒgQ), and the sum-squared error of monthly discharge sums (SSEMQ). These criteria are known to put different emphasis on average and high flows, low flows, and average volume-balance components. Twenty-two upper Neckar subbasins whose catchment area ranges from 56 to 3,976 km² were modelled with the distributed mesoscale hydrological model (mHM) to investigate these trade-offs. The 53 global parameters required for each instance of the mHM model were estimated with the global search algorithm AMALGAM. Equally weighted compromise solutions based on the selected criteria and extreme ends of all bi-criterion Pareto fronts were used after each calibration run to analyse the trade-off between different performance criteria. Calibration results were further analysed with ten additional criteria commonly used for evaluating hydrological model performance. Results showed that the trade-off patterns were similar for all subbasins irrespective of catchment size and that the largest trade-offs were consistently observed between the NSE and NSEₗₒgQ criteria. Simulations with the compromise solution provided a well-balanced fit to individual characteristics of the streamflow hydrographs and exhibited improved volume balance. Other performance criteria such as bias, the Pearson correlation coefficient, and the relative variability remained largely unchanged between compromise solutions and Pareto extremes. Parameter sets of the best NSE fit and the compromise solution of the largest basin (gauge at Plochingen) were used to simulate streamflow at the other 21 internal subbasins for a 10-year evaluation period without re-calibration.
Both parameter sets performed well in the individual basins with median NSE values of 0.74 and 0.72, respectively. The compromise solution resulted in similar NSEₗₒgQ-ranges and a 14.6 % lower median volume-balance error, which indicates an overall better model performance. The results demonstrate that the performance criteria for hydrological model calibration should be selected in accordance with the anticipated model predictions. The compromise solution is an improvement over the use of single criteria in model calibration.
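The different emphasis of the NSE and the NSE of log-transformed flows can be illustrated with a short sketch. The definitions below follow the standard Nash–Sutcliffe formula; the optional offset `eps` for zero flows and the discharge values are illustrative assumptions, not data or settings from the study.

```python
import numpy as np

def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variance."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def nse_log(observed, simulated, eps=0.0):
    """NSE of log-transformed flows; eps is a hypothetical offset for zero flows."""
    obs = np.asarray(observed, dtype=float) + eps
    sim = np.asarray(simulated, dtype=float) + eps
    return nse(np.log(obs), np.log(sim))

# Invented discharge series: one low, one medium, one high flow.
observed = [1.0, 10.0, 100.0]
low_flow_error = [2.0, 10.0, 100.0]    # error of 1 unit on the low flow
high_flow_error = [1.0, 10.0, 110.0]   # error of 10 units on the high flow

# NSE penalises the high-flow error more, the log variant the low-flow error.
print(nse(observed, low_flow_error), nse(observed, high_flow_error))
print(nse_log(observed, low_flow_error), nse_log(observed, high_flow_error))
```

Because the two criteria rank the same pair of simulations in opposite order, a parameter set that is optimal for one is generally not optimal for the other, which is the kind of trade-off a Pareto front makes visible.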
• Wairau aquifer storage is highly dynamic and driven by river recharge.
• Natural and anthropogenic factors contribute to decreasing availability of storage.
• Climate variability impedes the detection of trends and hydrological regime changes.
• Aquifer recharge decreases as low-flow periods become more frequent.
• River erosion causes declining recharge and a possibly permanent loss of storage.
The unconfined Wairau Aquifer in the Marlborough District of New Zealand is almost exclusively recharged by the Wairau River and serves as the major resource for drinking water and irrigation in the region. A declining trend in aquifer levels and low-land spring flows has been observed for the past decades.
The aim of this study is to identify and analyse natural and anthropogenic factors controlling the hydrological regime of the Wairau Aquifer. Concurrent trends in the long-term water balance components for the Wairau catchment and in low-flow statistics as well as the correlation between hydro-meteorological drivers and the Interdecadal Pacific Oscillation (IPO) index were investigated. The impact of river morphology changes on river recharge rates was studied using a previously developed groundwater flow model.
Our study found that long-term trends in declining catchment-scale precipitation are superimposed on climate oscillation and a strong annual variability. Jointly, these processes have resulted in lower than average river flows, increased low-flow periods, and consequently in lower rates of aquifer recharge. River engineering caused erosion of the braided river morphology, leading to a possibly permanent loss of aquifer storage. Groundwater abstraction is not accurately known, which is a limitation of this study. This additional information, together with adaptation strategies, is required for sustainable management of the groundwater resources.
Lateral subsurface flow (LSF) is a phenomenon frequently occurring in the field, induced by local water saturation along horizon boundaries under nonequilibrium conditions. However, observations of LSF in undisturbed soils under controlled irrigation in the laboratory are limited but needed for model improvement, prediction, and quantification of LSF. We present a method for extracting an undisturbed soil monolith along a soil horizon boundary and introduce an experimental setup for the measurement of LSF and an irrigation device for simulating rainfall. An experimental test run was simulated using HYDRUS 2D. Water infiltrating into the monolith and flowing either laterally along the horizon boundary or vertically through the bottom horizon could be separately captured by suction discs at the side and the bottom. Thus, a clear distinction between lateral and vertical flow was possible. Pressure heads and water contents were recorded by tensiometers and frequency domain reflectometry (FDR) sensors distributed across the monolith in a regular two‐dimensional, vertical, cross‐sectional pattern. Sensor readings indicated the presence of nonequilibrium conditions within the monolith. Modeling results could reproduce the lateral and vertical outflow of the monolith under constant irrigation, thus showing that water flow within the monolith under steady‐state conditions can be explained by the Richards equation and the van Genuchten–Mualem model. The presented method can be used to improve and verify models designed for the prediction of the onset of LSF including that induced by local nonequilibrium conditions.
Core Ideas
A laboratory method to induce and quantify lateral subsurface flow (LSF) is presented.
The experimental setup is verified by modeling with HYDRUS 2D.
Sampling of rectangular soil monoliths for 2D flow experiments is improved.
Lateral subsurface flow and hydraulic nonequilibrium conditions are observed.
The experimental data allow for improving models on the onset of LSF.
Inverse modeling has become increasingly popular for estimating effective hydraulic properties across a range of spatial scales. In recent years, many different algorithms have been developed to solve complex multiobjective optimization problems. In this study, we compared the efficiency of the Nondominated Sorting Genetic Algorithm (NSGA-II), the Multiobjective Shuffled Complex Evolution Metropolis algorithm (MOSCEM-UA), and AMALGAM, a multialgorithm genetically adaptive search method for multiobjective estimation of soil hydraulic parameters. In our analyses, we implemented the HYDRUS-1D model and used observed pressure head data at three different depths from the Spydia experimental field site in New Zealand. Our optimization problem was posed in a multiobjective context by simultaneously using three complementary RMSE criteria at each depth. We analyzed the trade-off between these criteria and the adherent Pareto uncertainty. The results demonstrate that all three algorithms were able to find a good approximation of the Pareto set of solutions, but differed in the rate of convergence to this distribution. Small differences in performance of the various algorithms were observed because of the relatively high dimension of the optimization problem in combination with the presence of multiple local optimal solutions within the three-objective search space. The Pareto parameter sets yielded satisfactory results when simulating the transient tensiometric pressure at predetermined observation points in the investigated vadose zone profile. The overall best parameter set was found by AMALGAM with RMSE values of 0.14, 0.11, and 0.17 m at the 0.4-, 1.0-, and 2.6-m depths, respectively. In contrast, the fit errors were substantially higher at these respective depths, with RMSE values ranging from 0.87 to 1.49 m, when using soil hydraulic parameters derived from laboratory analysis of small vadose zone cores.
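The notion of a Pareto set for several RMSE objectives can be sketched in a few lines. The code below is a generic brute-force nondominance filter, not the search strategy of NSGA-II, MOSCEM-UA, or AMALGAM; the function names and the objective values are hypothetical.

```python
import numpy as np

def rmse(observed, simulated):
    """Root-mean-square error between two equally long series."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    return np.sqrt(np.mean((obs - sim) ** 2))

def nondominated_mask(objectives):
    """Boolean mask of Pareto-optimal rows, assuming all objectives are minimised.

    objectives : array (n, m), one row of m objective values per parameter set
    """
    F = np.asarray(objectives, dtype=float)
    mask = np.ones(F.shape[0], dtype=bool)
    for i in range(F.shape[0]):
        # Row j dominates row i if it is no worse in every objective
        # and strictly better in at least one.
        dominates_i = np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1)
        mask[i] = not dominates_i.any()
    return mask

# Invented objective values for four parameter sets and three depths.
F = [[1.0, 2.0, 3.0],
     [2.0, 1.0, 3.0],
     [2.0, 2.0, 4.0],   # worse than the first row in every objective
     [1.0, 2.0, 3.0]]   # duplicate of the first row, so not dominated by it
print(nondominated_mask(F))
```

The pairwise comparison costs O(n²) in the number of parameter sets, which is acceptable for inspecting a final population but not how the evolutionary algorithms above rank candidates during the search.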
Hydraulic nonequilibrium in soil during water infiltration and drainage is a well‐known phenomenon. During infiltration, water initially invades easily accessible pores before it slowly redistributes towards some state of energetic minimum. In analogy, during drainage, easily drainable pores are emptied more rapidly than those blocked by bottlenecks. The consequence is that the water content is lagging behind the water potential and both state variables do not follow a unique water retention curve as typically assumed when applying the Richards equation. Current models that account for nonequilibrium allow for the required decoupling of water content and water potential; however, they do not consider the consequences for the hydraulic conductivity. In this contribution, we present a physically based approach to estimate hydraulic conductivity during nonequilibrium, which depends on both water content and water potential during nonequilibrium conditions. This approach of a dynamic hydraulic conductivity function is demonstrated for an infiltration process into relatively dry soil and for a stepwise drainage and rewetting with decreasing and increasing water fluxes (i.e., multistep flux experiment). The new approach reproduces well‐known phenomena such as pressure overshoot and preferential flow across infiltration fronts using a unified concept for hydraulic conductivity. This was not possible with existing models assuming some fixed unsaturated conductivity function depending on either water content or water potential.
Core Ideas
Soils are rarely in hydraulic equilibrium.
We show consequences for their effective hydraulic conductivity.
We present a physically based concept for better describing the unsaturated conductivity function.
The new approach describes pressure overshoot across fronts and the emergence of preferential flow during infiltration.
• Different phases of DOC mobilization occur during runoff events.
• Hydrological response of hillslope/riparian zone varies between runoff events.
• Low and delayed DOC mobilization occurs in dry conditions.
• Response time of riparian zone is an indicator for hot moments of DOC export.
Rising trends in the concentrations of dissolved organic carbon (DOC) are observed in many inland waters, including the headwater catchment of the Große Ohe river in the Bavarian Forest National Park (Germany). During flood events, DOC is mobilized via different hydrological pathways, affecting the hydrochemistry of aquatic ecosystems and the viability of drinking water supply. In our field experiments we observed different phases of DOC mobilization during six intensively studied rainfall-runoff events with contrasting antecedent wetness conditions. We propose response time diagrams to link the different phases of DOC mobilization to different response times along a hillslope-riparian-zone-transect. Depending on the antecedent wetness conditions, the hillslope and riparian zone contributed differently to the phases of the runoff event and shaped the flow hydrograph and DOC export at the catchment outlet. The hillslope always responded with little time delay to precipitation events regardless of the antecedent wetness condition. In contrast, response times in the riparian zone varied. For wet antecedent conditions we observed little delay between the hillslope- and riparian zone peak response, which caused high peak discharge, fast DOC mobilization and high DOC export from the catchment. In contrast, for dry antecedent conditions, the riparian zone response was significantly lower and much delayed. This led not only to attenuated peak discharge but also to larger time lags between the flow hydrograph- and the DOC concentration peak. The combination of low runoff rates and delayed DOC concentration peaks resulted in lower DOC export from the catchment. Thus, the antecedent wetness condition and response times of the hillslope and riparian zone are important indicators for DOC export in the catchment.
Sustainable water quality management requires a profound understanding of water fluxes (precipitation, run-off, recharge, etc.) and solute turnover such as retention, reaction, transformation, etc. at the catchment or landscape scale. The Water and Earth System Science competence cluster (WESS, http://www.wess.info/) aims at a holistic analysis of the water cycle coupled to reactive solute transport, including soil–plant–atmosphere and groundwater–surface water interactions. To facilitate exploring the impact of land-use and climate changes on water cycling and water quality, special emphasis is placed on feedbacks between the atmosphere, the land surface, and the subsurface. A major challenge lies in bridging the scales in monitoring and modeling of surface/subsurface versus atmospheric processes. The field work follows the approach of contrasting catchments, i.e. neighboring watersheds with different land use or similar watersheds with different climate. This paper introduces the featured catchments and explains methodologies of WESS by selected examples.
This study elucidates the behavior of Markov chain Monte Carlo ensemble samplers for vadose zone inverse modeling by performing an in-depth comparison of four algorithms that use Affine-Invariant (AI) moves or Differential Evolution (DE) strategies to approximate the target density. Two Rosenbrock toy distributions, and one synthetic and one actual case study focusing on the inverse estimation of soil hydraulic parameters using HYDRUS-1D, are used to compare samplers in different dimensions d. The analysis reveals that an ensemble with N=d+1 chains evolved using DE-based strategies converges to the wrong stationary posterior, while AI does not suffer from this issue but exhibits delayed convergence. DE-based samplers regain their ergodic properties when using N≥2d chains. Increasing the number of chains above this threshold has only minor effects on the samplers’ performance, while initializing the ensemble in a high-likelihood region facilitates its convergence. AI strategies exhibit shorter autocorrelation times in the 7d synthetic vadose zone scenario, while DE-based samplers outperform them when the number of soil parameters increases to 16 in the actual scenario. All evaluation metrics degrade as d increases, thus suggesting that sampling strategies based only on interpolation between chains tend to become inefficient when the bulk of the posterior lies in increasingly small portions of the parameter space.
• Affine-Invariant (AI) and Differential Evolution (DE) strategies are compared.
• DE samplers converge to the wrong posterior if the number of chains N is low.
• It is advisable to increase N and start the ensemble in a high-likelihood region.
• AI outperforms DE for low dimensional vadose zone problems.
• DE shows better performance when the number of parameters increases.
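For readers unfamiliar with the two families of moves, the sketch below shows their textbook proposal steps: the differential evolution proposal with the default jump rate 2.38/√(2d), and the affine-invariant stretch move with scale parameter a = 2. These are generic forms and not necessarily the exact variants compared in the study; the function names, the jitter size, and the random ensemble are illustrative assumptions, and the accept/reject step is omitted.

```python
import numpy as np

def de_proposal(chains, i, rng, gamma=None, jitter=1e-6):
    """Differential evolution proposal for chain i (needs at least 3 chains)."""
    n, d = chains.shape
    if gamma is None:
        gamma = 2.38 / np.sqrt(2 * d)  # commonly used default jump rate
    others = [k for k in range(n) if k != i]
    a, b = rng.choice(others, size=2, replace=False)
    # Jump along the difference of two other chains plus a small perturbation.
    return chains[i] + gamma * (chains[a] - chains[b]) + jitter * rng.standard_normal(d)

def stretch_proposal(chains, i, rng, a=2.0):
    """Affine-invariant stretch proposal for chain i.

    Returns the proposal and the log of the factor z**(d - 1) that enters
    the acceptance probability in addition to the posterior ratio.
    """
    n, d = chains.shape
    others = [k for k in range(n) if k != i]
    j = rng.choice(others)
    # Inverse-transform sample of z with density proportional to 1/sqrt(z) on [1/a, a].
    z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
    proposal = chains[j] + z * (chains[i] - chains[j])
    return proposal, (d - 1) * np.log(z)

rng = np.random.default_rng(0)
ensemble = rng.normal(size=(6, 3))   # hypothetical ensemble: N = 6 chains, d = 3
de_candidate = de_proposal(ensemble, 0, rng)
stretch_candidate, log_factor = stretch_proposal(ensemble, 0, rng)
```

Both proposals are built solely from the current positions of other chains, so the directions an ensemble can explore depend on how many chains it contains and where they start.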