Although prior studies have evaluated the role of sampling errors associated with local and regional methods for estimating peak flow quantiles, investigating epistemic errors is more difficult because the underlying properties of the random variable have been prescribed using ad hoc characterizations of the regional distributions of peak flows. This study addresses this challenge using representations of regional peak flow distributions derived from a combined framework of stochastic storm transposition, radar rainfall observations, and distributed hydrologic modeling. Four commonly used peak flow quantile estimation methods were evaluated using synthetic peak flows at 5,000 sites in the Turkey River watershed in Iowa, USA: at-site flood frequency analysis using the Pearson Type III distribution fitted with L-moments, and three regional approaches that pool information across sites, namely (1) the index flood method, (2) the quantile regression technique, and (3) parameter regression. This approach allowed quantification of error components stemming from epistemic assumptions, the parameter estimation method, sample size, and, in the regional approaches, the number of pooled sites. The results demonstrate that the inability to capture the spatial variability of the skewness of the peak flows dominates the epistemic error of the regional methods. In the study basin, this variability could be partially explained by river network structure and the predominant orientation of the watershed. The general approach used in this study is promising in that it brings new tools and sources of data to the study of the old hydrologic problem of flood frequency analysis.
Key Points
Synthetic peak flows are used to investigate the epistemic and sampling errors associated with local and regional methods to estimate PFQs
Error components stemming from epistemic assumptions, parameter estimation method, sample size, and the number of pooled sites are evaluated
The spatial variability of the skewness of the peak flows is partially explained by river network structure and orientation of the watershed
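The at-site step described above (Pearson Type III fitted with L-moments) starts from unbiased sample L-moments computed via probability-weighted moments. A minimal pure-Python sketch of that computation, assuming the standard unbiased estimators (the function name and the example data are illustrative, not from the study):

```python
# Sketch: unbiased sample L-moments via probability-weighted moments (PWMs),
# the first step of at-site flood frequency analysis with the Pearson Type III
# distribution. The PT3 parameters would then follow from (l1, l2, t3) via
# standard rational approximations, omitted here.

def sample_l_moments(data):
    """Return (l1, l2, t3): sample mean, L-scale, and L-skewness."""
    x = sorted(data)          # ascending order statistics
    n = len(x)
    # Unbiased PWM estimators b0, b1, b2 (0-indexed order statistics).
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    l1 = b0
    l2 = 2.0 * b1 - b0
    l3 = 6.0 * b2 - 6.0 * b1 + b0
    return l1, l2, l3 / l2    # t3 = l3 / l2
```

For a symmetric sample the L-skewness t3 is zero; positive t3 indicates the right-skewed behavior typical of annual peak flows.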
This study investigates whether long‐term changes in observed series of high flows can be attributed to changes in land use via nonstationary flood‐frequency analyses. A point process characterization of threshold exceedances is used, which allows for direct inclusion of covariates in the model, alongside a nonstationary model for block maxima series. In particular, changes in annual, winter, and summer block maxima and peaks over threshold extracted from gauged instantaneous flow records in two hydrologically similar catchments located in proximity to one another in northern England are investigated. The study catchment is characterized by large increases in urbanization levels in recent decades, while the paired control catchment has remained undeveloped during the study period (1970–2010). To avoid the potential confounding effect of natural variability, a covariate which summarizes key climatological properties is included in the flood‐frequency model. A significant effect of the increasing urbanization levels on high flows is detected, in particular in the summer season. Point process models appear to be superior to block maxima models in their ability to detect the effect of the increase in urbanization levels on high flows.
Key Points:
Urbanization is found to have an impact on high flows in an urbanized catchment
The use of point processes is advocated for trend detection and attribution
The use of process‐related covariates gives a better representation of change
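The peaks-over-threshold analysis above first requires extracting independent exceedance events from the flow record. A minimal sketch of that declustering step, assuming a simple rule that merges exceedances separated by fewer than a given number of time steps (threshold and separation window are illustrative choices, not the study's):

```python
# Sketch: decluster a flow series into independent peaks-over-threshold (POT)
# events, a preprocessing step before fitting a point-process model with
# covariates. Exceedances closer than `min_separation` steps are treated as
# one event and only the cluster maximum is kept.

def extract_pot_peaks(flows, threshold, min_separation=5):
    """Return (index, value) pairs of cluster maxima above `threshold`."""
    peaks, i, n = [], 0, len(flows)
    while i < n:
        if flows[i] > threshold:
            j, peak_i, last_exc = i, i, i
            # Grow the cluster while gaps between exceedances stay short.
            while j < n and j - last_exc < min_separation:
                if flows[j] > threshold:
                    last_exc = j
                    if flows[j] > flows[peak_i]:
                        peak_i = j
                j += 1
            peaks.append((peak_i, flows[peak_i]))
            i = j
        else:
            i += 1
    return peaks
```

The resulting peak series is what a point-process or generalized Pareto model would then be fitted to.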
Flood Frequency Analysis (FFA) relates the magnitude of extreme streamflows to their frequency of occurrence through distribution functions, and its results are useful in design flood estimation for various hydraulic structures. FFA relies primarily on actual observed streamflow data (Qobts) and on certain underlying assumptions that form the basis of its philosophy. Although Qobts time series are now available for most river basins, they frequently fail to satisfy the necessary requirements, so an alternative doctrine is needed to create a suitable dataset. This paper provides an overview of the limitations of utilising Qobts, along with their drivers, and suggests a surrogate mechanism for performing FFA in today's world.
•A framework for estimating design flood discharge in ungauged watersheds at a national scale was proposed.•The influence of a comprehensive set of environmental, hydro-climatic and geomorphological variables on peak discharge was investigated.•The PSO algorithm was employed to adjust the hyper-parameters of the XGBoost model.•SHAP values were utilized to identify the primary contributing features to model outputs and investigate the spatial heterogeneity of the flood drivers.•The results indicate that the features’ participation in determining peak discharge magnitude varies spatially across Iran’s basins.
Identifying flood drivers and accurately estimating design floods play a crucial role in fostering sustainable and effective planning and management strategies for mitigating flood risks. Regional Flood Frequency Analysis (RFFA) is one of the most commonly used approaches to estimate design floods in ungauged watersheds. This study used XGBoost coupled with Particle Swarm Optimization (PSO) to estimate design flood quantiles for return periods ranging from 2 to 1,000 years. After a preliminary assessment, 373 nationwide hydrometric stations were selected for at-site flood frequency analysis by identifying the best-fitting distribution. Using the capabilities of GIS and Google Earth Engine (GEE), 83 independent features, including physiographical, geomorphological, land-use, soil-type, and long-term hydro-climatic and environmental variables, were extracted for the upstream watersheds. After fine-tuning the hyper-parameters of the XGBoost method for each flood quantile, feature importance values were used to eliminate insignificant features and refine the developed models. Additionally, classical methods such as Support Vector Regression (SVR) and Random Forest (RF) were implemented to evaluate the efficiency of the XGBoost models. Different statistics demonstrated that the models effectively estimated flood quantiles, with the Nash-Sutcliffe Efficiency (NSE) varying from 0.709 to 0.840 across all models. A comparison of model performance reveals that the XGBoost method outperformed RF and SVR across all flood quantiles. Based on the developed models, design floods were estimated for 949 stations across Iran. Furthermore, Shapley additive explanation (SHAP) values were used to identify the main contributing features to model outputs and to investigate the spatial heterogeneity of the main flood drivers.
According to the results, watershed perimeter and length and heavy rainfall exhibit notably high importance compared to other features across all models. Based on the local SHAP values, in the Northern, Northwestern, and Western basins, features associated with watershed size, such as perimeter, area, and length, exhibit the highest levels of importance, while the Southwestern basins are more influenced by heavy rainfall. These findings demonstrate the promise of the developed models for estimating flood quantiles across diverse environmental, geomorphological, and hydro-climatic conditions. This capability is valuable for sustainable watershed management, especially in environments with limited maximum discharge data.
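The PSO tuning step described above can be illustrated with a minimal swarm loop. The sketch below minimizes a stand-in two-dimensional "validation error" surface rather than a cross-validated XGBoost error; all constants (swarm size, inertia, acceleration coefficients) are illustrative assumptions:

```python
import random

# Sketch: minimal particle swarm optimization (PSO) of the kind used to tune
# model hyper-parameters. Each particle tracks its personal best; the swarm
# tracks a global best; velocities blend inertia with pulls toward both.

def pso(objective, bounds, n_particles=20, n_iter=60,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_val[k])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(dim):
                vel[k][d] = (w * vel[k][d]
                             + c1 * rng.random() * (pbest[k][d] - pos[k][d])
                             + c2 * rng.random() * (gbest[d] - pos[k][d]))
                pos[k][d] = min(max(pos[k][d] + vel[k][d],
                                    bounds[d][0]), bounds[d][1])
            val = objective(pos[k])
            if val < pbest_val[k]:
                pbest[k], pbest_val[k] = pos[k][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[k][:], val
    return gbest, gbest_val

# Illustrative error surface with its minimum at (3, 0.1):
err = lambda p: (p[0] - 3.0) ** 2 + (p[1] - 0.1) ** 2
best, best_val = pso(err, [(1.0, 10.0), (0.01, 1.0)])
```

In the paper's setting, `objective` would evaluate an XGBoost model under a candidate hyper-parameter vector and return its cross-validated error.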
•Novel flood probability estimation framework has been proposed.•Clustering, data pooling, and mixed distribution to address the sample heterogeneity.•Accuracy of estimation of extreme flood probability has been improved.
The accurate estimation of flood probability is crucial for designing water storage and flood retention structures. However, the assumption of identically distributed flood samples is unrealistic, given the influence of various flood mechanisms. To address this challenge, we propose a novel framework based on flood clustering and data pooling that encompasses four key steps: (1) flood event separation based on a peak-detection flood separation algorithm, (2) grouping of flood events using the k-prototypes algorithm, (3) application of the UNprecedented Simulated Extreme ENsemble (UNSEEN) approach to pool reforecast ensemble datasets, and (4) a statistical mixing approach to derive common quantiles from all flood groups. We applied the framework to the Dresden gauge on the Elbe River for a detailed case study. Various tests were performed to assess the applicability of the UNSEEN approach, and the reforecast dataset consistently shows potential for data pooling. The proposed methodology outperformed the classical approach in terms of goodness-of-fit. The relative difference between the classical and proposed approaches ((classical − proposed)/proposed) for the 100-year return level is 0.16, with a reduction in root mean square error (RMSE) from 163 to 98 m³/s. Furthermore, replication of the approach at gauges in northern Germany exhibited relative differences ranging from −0.3 to +0.15 and produced better RMSE estimates than the traditional model. In summary, the proposed framework offers a better estimation of flood probability by addressing the inherent sample inhomogeneity along with the inclusion of unprecedented flood samples.
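Step (1) of the framework above, separating a flow series into discrete flood events whose features (peak, volume, duration) can then be clustered, can be sketched simply. The baseflow threshold here is an illustrative stand-in for the paper's peak-detection algorithm:

```python
# Sketch: split a flow series into flood events as contiguous runs above a
# baseflow threshold `base`, recording per-event peak, volume above base, and
# duration. These mixed numeric features are the kind of input a k-prototypes
# clustering step would then group by flood mechanism.

def separate_events(flows, base, dt=1.0):
    """Return a list of event dicts with keys 'peak', 'volume', 'duration'."""
    events, cur = [], None
    for q in flows:
        if q > base:
            if cur is None:
                cur = {"peak": q, "volume": 0.0, "duration": 0}
            cur["peak"] = max(cur["peak"], q)
            cur["volume"] += (q - base) * dt   # volume above baseflow
            cur["duration"] += 1
        elif cur is not None:
            events.append(cur)
            cur = None
    if cur is not None:
        events.append(cur)
    return events
```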
•Random Forest Regression (RFR) is used for regional flood frequency analysis (RFA).•RFR is also combined with Canonical Correlation Analysis (CCA): CCA-RFR.•The two techniques are compared to other linear and non-linear RFA models.•CCA-RFR leads to the best performance in terms of root mean squared error.•RFR is simple to apply and more efficient than more complex models.
Flood quantile estimation at sites with little or no data is important for the adequate planning and management of water resources. Regional Hydrological Frequency Analysis (RFA) deals with the estimation of hydrological variables at ungauged sites. Random Forest (RF) is an ensemble learning technique that uses multiple Classification and Regression Trees (CART) for classification, regression, and other tasks. The RF technique is gaining popularity in a number of fields because of its powerful non-linear and non-parametric nature. In the present study, we investigate the use of Random Forest Regression (RFR) in the estimation step of RFA, based on a case study of data collected from 151 hydrometric stations in the province of Quebec, Canada. RFR is applied both to the whole data set and to homogeneous regions of stations delineated by canonical correlation analysis (CCA). Using the out-of-bag error rate feature of RF, the optimal number of trees for the dataset is calculated. The results of the CCA-based RFR model (CCA-RFR) are compared to those obtained with a number of other linear and non-linear RFA models. CCA-RFR leads to the best performance in terms of root mean squared error, and the use of CCA to delineate neighborhoods considerably improves the performance of RFR. RFR is found to be simple to apply and more efficient than more complex models such as Artificial Neural Network-based models.
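The out-of-bag (OOB) error idea used above to choose the number of trees can be shown with a deliberately simplified ensemble: bagged depth-1 regression trees (stumps) rather than a full Random Forest with feature subsampling. This is a sketch of the mechanism, not the study's model:

```python
import random, statistics

def fit_stump(X, y):
    """Best single-feature threshold split minimizing SSE (a depth-1 CART)."""
    best = None
    for d in range(len(X[0])):
        for t in sorted({row[d] for row in X}):
            left = [y[i] for i in range(len(X)) if X[i][d] <= t]
            right = [y[i] for i in range(len(X)) if X[i][d] > t]
            if not left or not right:
                continue
            ml, mr = statistics.fmean(left), statistics.fmean(right)
            sse = (sum((v - ml) ** 2 for v in left)
                   + sum((v - mr) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, d, t, ml, mr)
    if best is None:                      # degenerate sample: predict the mean
        m = statistics.fmean(y)
        return lambda row: m
    _, d, t, ml, mr = best
    return lambda row: ml if row[d] <= t else mr

def bagged_stumps_oob(X, y, n_trees=50, seed=0):
    """Bootstrap-aggregated stumps with out-of-bag mean squared error."""
    rng = random.Random(seed)
    n = len(X)
    trees, oob_preds = [], [[] for _ in range(n)]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]          # bootstrap sample
        tree = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        trees.append(tree)
        for i in set(range(n)) - set(idx):                  # out-of-bag points
            oob_preds[i].append(tree(X[i]))
    oob_mse = statistics.fmean(
        (statistics.fmean(p) - y[i]) ** 2
        for i, p in enumerate(oob_preds) if p)
    predict = lambda row: statistics.fmean(t(row) for t in trees)
    return predict, oob_mse
```

Because each point is out-of-bag for roughly a third of the trees, `oob_mse` estimates generalization error without a held-out set, which is what the study exploits to select the number of trees.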
Regional flood frequency analysis is still an important area of hydrology research, as there are many ungauged catchments. The majority of hydrological methods in regional flood frequency analysis involve complex non‐linear relationships between predictor variables and flood characteristics. In the past, dimensionality reduction techniques based on linear methods such as canonical correlation analysis (CCA) were used in regional flood frequency analysis to delineate hydrological clusters. Non‐linear dimensionality reduction techniques, such as kernel canonical correlation analysis (KCCA) and multidimensional scaling (MDS), have been used in several fields of science, but not explicitly in regional flood frequency analysis. To determine hydrologically similar clusters, the approaches considered in this article use CCA, KCCA, and MDS as dimensionality reduction techniques in conjunction with Gaussian mixture models (GMM). Log‐linear regression and generalized additive models are then applied to the hydrological clusters to evaluate regional flood frequency analysis. A comparison of linear and non‐linear (NL) methods is performed using data from Victoria, Australia, to demonstrate the benefit of these methods. It was found that the non‐linear frameworks of MDS with a Gaussian mixture model (MDSGMM‐NL) and KCCA with a Gaussian mixture model (KCCAGMM‐NL), as well as the mixed framework of CCA with a Gaussian mixture model (CCAGMM‐NL), can represent the non‐linear complexities of hydrological processes in regional flood frequency analysis.
Graphical abstract: orientation of hydrological clusters under the MDSGMM dimensionality reduction technique.
•Nonstationary frequency analyses should not be based only on at-site time series.•Nonstationary models introduce additional sources of uncertainty.•Misspecification of nonstationary models can lead to physically inconsistent results.•Nonstationary models can provide no practical enhancement of results’ credibility.•Risk of failure is a more realistic measure of the risk for design purposes.
The increasing effort to develop and apply nonstationary models in hydrologic frequency analyses under changing environmental conditions can be frustrated when the additional uncertainty related to the model complexity is accounted for along with the sampling uncertainty. In order to show the practical implications and possible problems of using nonstationary models and provide critical guidelines, in this study we review the main tools developed in this field (such as nonstationary distribution functions, return periods, and risk of failure) highlighting advantages and disadvantages. The discussion is supported by three case studies that revise three illustrative examples reported in the scientific and technical literature referring to the Little Sugar Creek (at Charlotte, North Carolina), Red River of the North (North Dakota/Minnesota), and the Assunpink Creek (at Trenton, New Jersey). The uncertainty of the results is assessed by complementing point estimates with confidence intervals (CIs) and emphasizing critical aspects such as the subjectivity affecting the choice of the models’ structure. 
Our results show that (1) nonstationary frequency analyses should not only be based on at-site time series but require additional information and detailed exploratory data analyses (EDA); (2) as nonstationary models imply that the time-varying model structure holds true for the entire future design life period, an appropriate modeling strategy requires that EDA identifies a well-defined deterministic mechanism leading the examined process; (3) when the model structure cannot be inferred in a deductive manner and nonstationary models are fitted by inductive inference, model structure introduces an additional source of uncertainty so that the resulting nonstationary models can provide no practical enhancement of the credibility and accuracy of the predicted extreme quantiles, whereas possible model misspecification can easily lead to physically inconsistent results; (4) when the model structure is uncertain, stationary models and a suitable assessment of the uncertainty accounting for possible temporal persistence should be retained as more theoretically coherent and reliable options for practical applications in real-world design and management problems; (5) a clear understanding of the actual probabilistic meaning of stationary and nonstationary return periods and risk of failure is required for a correct risk assessment and communication.
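Point (5) above turns on the distinction between a return period and the risk of failure over a design life: with time-varying exceedance probabilities p_t, the risk over n years is R = 1 − ∏(1 − p_t), which reduces to the familiar R = 1 − (1 − 1/T)^n under stationarity. A short sketch with illustrative numbers (the linear trend in p_t is an assumption for demonstration, not from the case studies):

```python
# Sketch: risk of failure over a design life, stationary vs. nonstationary.
# R = 1 - prod(1 - p_t), where p_t is the annual exceedance probability.

def risk_of_failure(p):
    prod = 1.0
    for pt in p:
        prod *= (1.0 - pt)
    return 1.0 - prod

n, T = 50, 100                                   # 50-year life, 100-year design event
stationary = risk_of_failure([1.0 / T] * n)      # = 1 - (1 - 1/T)**n, approx. 0.395
# Illustrative nonstationary case: exceedance probability drifting upward.
nonstationary = risk_of_failure([0.01 + 0.0002 * t for t in range(n)])
```

Even a modest upward drift in p_t raises the lifetime risk above its stationary value, which is why the authors argue risk of failure is the more realistic design measure.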
In frequency analysis of annual maximum flood series (AMFS), incorporating information from the underlying “ordinary” daily streamflow events is mechanism-based and significant for improving the accuracy of flood risk estimation and appropriate management. In previous studies related to flood nonstationarity, the classical norming constants method (C‐NCM) has been used to derive the statistical parameters of the annual maximum flood distribution from the daily streamflow distribution. However, C‐NCM does not flexibly consider the feasible range of the scale parameter of annual maximum flood distributions, which can leave it unable to provide sufficient and reliable models to fit AMFS. This paper investigates the potential of two alternative norming constants methods for fitting annual maximum floods, namely Hall's norming constants method (H‐NCM) and Fisher and Tippett's norming constants method (FT‐NCM). A comparative study of the three methods was carried out using hydrological streamflow series from 77 stations in the Yangtze and Yellow River basins, China. The results show that H‐NCM outperforms both C‐NCM and FT‐NCM at stations with a relatively low skewness coefficient of the AMFS. Therefore, H‐NCM is recommended as the first choice for practical applications when daily streamflow, rather than a small sample of flood events, is considered. Furthermore, if the skewness coefficient exceeds the Gumbel value (Cs ≈ 1.14), the overall modelling performance of all three methods deteriorates significantly. These findings provide a reference for the applicability of norming constants methods in flood frequency analysis.
The norming constants method deduces the distribution of the flood extreme value series from the distribution of the annual daily flow series. It overcomes the loss of effective flood series information in traditional flood frequency analysis and expands the sample information of the flood series.
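The core idea behind norming constants methods can be made concrete: under an independence assumption, the annual-maximum CDF follows from the daily CDF as F_max(x) = F(x)^365, so an annual-maximum quantile solves F(x) = p^(1/365). The sketch below uses an exponential daily distribution purely as an illustrative stand-in for the paper's fitted daily streamflow distributions:

```python
import math

# Sketch: quantile of the annual maximum derived from the daily distribution,
# assuming 365 iid Exponential(beta) daily flows. Real daily flows are neither
# independent nor exponential; this only illustrates the F_max = F**365 link
# that norming constants methods refine.

def annual_max_quantile(p, beta, n_days=365):
    """p-quantile of the annual maximum of n_days iid Exponential(beta) days."""
    daily_p = p ** (1.0 / n_days)            # required daily non-exceedance
    return -beta * math.log(1.0 - daily_p)   # inverse exponential CDF
```

Norming constants methods (C-NCM, H-NCM, FT-NCM) effectively replace this raw power relation with normalized limiting forms whose location and scale parameters are derived from the daily distribution.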
•Investigate the projected climate change on flood frequency and quantiles within a bivariate framework.•Develop the non-stationary most likely flood quantile estimation method.•Derive the adaptive flood quantiles by a time-varying moment approach.•Climate change exacerbates extreme floods in Ganjiang River basin.•Flood frequency curve varies significantly over time considering non-stationarity.
Climate change will greatly impact flood frequency curves and design floods in the future. However, traditional hydrologic approaches often fail to analyze flood characteristics within a bivariate framework under a changing environment, and previous studies investigating the bivariate characteristics of floods usually do not derive adaptive flood quantiles. This study assesses the implications of climate change for future bivariate quantiles of flood peak and volume in the Ganjiang River basin, China. The outputs of two global climate models (BNU-ESM and BCC-CSM1.1) are statistically downscaled by the daily bias correction (DBC) method and used as inputs to the Xinanjiang hydrological model to simulate streamflow during 1966–2099. Projections for future floods (2020–2099) under the Representative Concentration Pathway (RCP) 8.5 scenario are divided into two 40-year horizons (2040s, 2080s), and a comparison is made between these time horizons and the baseline (1966–2005). Univariate flood frequency analysis indicates a considerable increase in the magnitude and frequency of floods under the RCP8.5 scenario, especially for the higher return periods. The bivariate quantile curves under different levels of Joint Return Period (JRP) for the historical and future periods are derived by copula functions, and the most likely realizations are estimated. It is found that climate change has greater impacts on the future joint bivariate quantiles for larger return periods. Finally, the adaptive isolines and most likely flood quantiles under a JRP are derived by analyzing the merged series with non-stationary copula-based models. The results highlight that the joint probability, illustrated by the isoline of a given JRP, varies significantly over time when non-stationary models are applied.
This study incorporates the impacts of climate change on bivariate flood quantiles and develops an adaptive quantile estimation approach, which may provide useful reference information for flood risk assessment and management under a changing environment.
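The copula-based joint return periods discussed above come in OR and AND variants: T_or = μ / (1 − C(u, v)) for "peak or volume exceeded" and T_and = μ / (1 − u − v + C(u, v)) for "both exceeded". A sketch using a Gumbel copula with an illustrative dependence parameter θ (the study's fitted copulas and parameters are not reproduced here):

```python
import math

# Sketch: OR- and AND-type joint return periods of flood peak and volume from
# a Gumbel copula C(u, v). u and v are the marginal non-exceedance
# probabilities; mu is the mean interarrival time (1 year for annual maxima).

def gumbel_copula(u, v, theta):
    return math.exp(-(((-math.log(u)) ** theta
                       + (-math.log(v)) ** theta) ** (1.0 / theta)))

def joint_return_periods(u, v, theta, mu=1.0):
    c = gumbel_copula(u, v, theta)
    t_or = mu / (1.0 - c)              # peak OR volume exceeds its quantile
    t_and = mu / (1.0 - u - v + c)     # peak AND volume exceed their quantiles
    return t_or, t_and

# Both marginals at their 100-year level (u = v = 0.99), moderate dependence:
t_or, t_and = joint_return_periods(0.99, 0.99, theta=2.0)
```

As expected, the OR return period is shorter and the AND return period longer than the 100-year marginal value, which is the ordering the bivariate isolines in the study express; in a non-stationary model, θ and the marginals, and hence these isolines, vary over time.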