Dust storms have many negative consequences and affect all kinds of ecosystems, as well as climate and weather conditions. Classifying dust storm sources into different susceptibility categories can therefore help mitigate their negative effects. This study aimed to classify the susceptibility of dust sources in the Middle East (ME) by developing two novel deep learning (DL) hybrid models: the convolutional neural network-gated recurrent unit (CNN-GRU) model and the dense layer deep learning-random forest (DLDL-RF) model. The Dragonfly algorithm (DA) was used to identify the critical features controlling dust sources. Game theory was used for the interpretability of the DL model's output. Predictive DL models were constructed by dividing datasets randomly into train (70%) and test (30%) groups, and six statistical indicators were then applied to assess the DL hybrid model performance for both datasets (train and test). Among 13 potential features (or variables) controlling dust sources, DA selected seven as important and six as non-important. Based on the DLDL-RF hybrid model, which was more accurate than CNN-GRU, 23.1, 22.8, and 22.2% of the study area were classified as being of very low, low and moderate susceptibility, whereas 20.2 and 11.7% of the area were classified as representing high and very high susceptibility classes, respectively. Among the seven important features selected by DA, clay content, silt content, and precipitation were identified as the three most important by game theory through permutation values. Overall, DL hybrid models were found to be efficient methods for prediction purposes on large spatial scales with no or incomplete datasets from ground-based measurements.
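The random 70%/30% division described in the abstract above can be sketched as follows. This is a minimal illustration only: the function name, the seed, and the sample count of 1000 are assumptions and do not come from the study.

```python
import random


def train_test_split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into train and test groups."""
    indices = list(range(n_samples))
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


# Hypothetical dataset of 1000 labelled pixels.
train_idx, test_idx = train_test_split_indices(1000)
print(len(train_idx), len(test_idx))  # 700 300
```

Splitting indices rather than the data keeps features and labels aligned when both are later sliced with the same index lists.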
Land susceptibility to wind erosion hazard in Isfahan province, Iran, was mapped by testing 16 advanced regression-based machine learning methods: Robust linear regression (RLR), Cforest, Non-convex penalized quantile regression (NCPQR), Neural network with feature extraction (NNFE), Monotone multi-layer perceptron neural network (MMLPNN), Ridge regression (RR), Boosting generalized linear model (BGLM), Negative binomial generalized linear model (NBGLM), Boosting generalized additive model (BGAM), Spline generalized additive model (SGAM), Spike and slab regression (SSR), Stochastic gradient boosting (SGB), Support vector machine (SVM), Relevance vector machine (RVM), Cubist, and Adaptive network-based fuzzy inference system (ANFIS). Thirteen factors controlling wind erosion were mapped, and multicollinearity among these factors was quantified using the tolerance coefficient (TC) and variance inflation factor (VIF). Model performance was assessed by RMSE, MAE, MBE, and a Taylor diagram using both training and validation datasets. The results showed that five models (MMLPNN, SGAM, Cforest, BGAM and SGB) are capable of delivering a high prediction accuracy for land susceptibility to wind erosion hazard. DEM, precipitation, and vegetation (NDVI) are the most critical factors controlling wind erosion in the study area. Overall, regression-based machine learning models are efficient techniques for mapping land susceptibility to wind erosion hazards.
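The three error measures named in the abstract above (RMSE, MAE, MBE) can be computed as in the sketch below. The sign convention for MBE (predicted minus observed) and the sample values are assumptions made for illustration; the abstract does not state them.

```python
import math


def regression_errors(observed, predicted):
    """Return (RMSE, MAE, MBE) for paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    # Residual convention assumed here: predicted minus observed.
    residuals = [p - o for o, p in zip(observed, predicted)]
    n = len(residuals)
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    mae = sum(abs(r) for r in residuals) / n
    mbe = sum(residuals) / n  # positive means over-prediction on average
    return rmse, mae, mbe


# Hypothetical values, not data from the study.
rmse, mae, mbe = regression_errors([1, 2, 3, 4], [2, 2, 2, 6])
print(round(rmse, 4), mae, mbe)  # 1.2247 1.0 0.5
```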
This research introduces a new combined modelling approach for mapping soil salinity in the Minab plain in southern Iran. This study assessed the uncertainty (with 95% confidence limits) and interpretability of two deep learning (DL) models, a deep Boltzmann machine (DBM) and a one-dimensional convolutional neural network (1DCNN)-long short-term memory (LSTM) hybrid model (1DCNN-LSTM), for mapping soil salinity by applying DeepQuantreg and game theory (SHapley Additive exPlanations (SHAP) and permutation feature importance measure (PFIM)), respectively. Based on stepwise forward regression (SFR), a technique for selecting controlling factors, 18 of 47 potential controls were selected as effective factors. Inventory maps of soil salinity were generated based on 476 surface soil samples collected for measuring electrical conductivity (ECe). Based on Taylor diagrams, both DL models performed well (RMSE < 20%), but the 1DCNN-LSTM hybrid model performed slightly better than the DBM model. The uncertainty ranges associated with the ECe values predicted by both models, estimated using DeepQuantreg, were similar (0–25 dS/m for the 1DCNN-LSTM hybrid model and 2–27 dS/m for the DBM model). Based on the SFR and PFIM, a measure in game theory, four controls (evaporation, sand content, precipitation and vertical distance to channel) were selected as the most important factors for soil salinity in the study area. The results of SHAP, the second measure used in game theory, suggested that five factors (evaporation, vertical distance to channel, sand content, cation exchange capacity (CEC) and digital elevation model (DEM)) have the strongest impact on model outputs. Overall, the methodology used in this study is recommended for applications in other regions for mapping environmental problems.
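A permutation feature importance measure such as the PFIM mentioned above can be sketched as follows: shuffle one feature column at a time and record how much the prediction error grows. This is an illustrative sketch, not the authors' implementation; the toy model, the mean-squared-error loss, and all names are assumptions.

```python
import random


def permutation_importance(predict, rows, targets, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)

    def mse(data):
        return sum((predict(r) - t) ** 2 for r, t in zip(data, targets)) / len(targets)

    baseline = mse(rows)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild every row with feature j replaced by its shuffled value.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(shuffled) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy model (assumption): the output depends on feature 0 only.
rows = [[i, (i * 7) % 5] for i in range(20)]
targets = [3 * r[0] for r in rows]
importances = permutation_importance(lambda r: 3 * r[0], rows, targets)
print(importances[0] > 0, importances[1])  # True 0.0
```

A feature the model ignores gets an importance of zero, because shuffling it cannot change any prediction.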
This research developed a more efficient integrated model (IM) based on combining the Nash-Sutcliffe efficiency coefficient (NSEC) and individual data mining (DM) algorithms for the spatial mapping of dust provenance in the Hamoun-e-Hirmand Basin, southeastern Iran. This region experiences severe wind erosion and includes the Sistan plain, which is one of the most PM2.5-polluted regions in the world. Due to a prolonged drought over the last two decades, the frequency of dust storms in the study area is increasing remarkably. Herein, 14 factors controlling dust emissions (FCDEs) including soil characteristics, climatic variables, digital elevation map, normalized difference vegetation index, land use and geology were mapped. Correlation and collinearity among the FCDEs were examined by the Pearson test, tolerance coefficient (TC) and variance inflation factor (VIF), with the results suggesting a lack of collinearity between FCDEs. A tree-based genetic algorithm was applied to prioritize and quantify the importance weights of the FCDEs. Thirteen individual data mining models were applied for mapping dust provenance. The model performance was assessed using root mean square error, mean absolute error and NSEC. Based on clustering analysis, the 13 DM models were grouped into five clusters, and the cluster with the highest NSEC values was then used in an integrated modelling process. Based on the results, the IM (NSEC = 93%) outperformed the individual DM models (NSEC values ranging between 51 and 92%). Using the IM, 11, 5, 7 and 77% of the total study area were classified into low, moderate, high and very high susceptibility classes for dust provenance, respectively. Overall, the results illustrate the benefits of an IM for mapping spatial variation in the susceptibility of catchment areas to act as dust sources.
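The Nash-Sutcliffe efficiency coefficient used above to rank the models is one minus the ratio of the residual sum of squares to the variance of the observations around their mean. The sketch below assumes plain Python lists and uses made-up values; how the study combined the selected models is not described in the abstract and is not reproduced here.

```python
def nash_sutcliffe(observed, predicted):
    """NSE = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values must not all be identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical observations and predictions.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
print(nash_sutcliffe(obs, obs))        # 1.0 for a perfect model
print(nash_sutcliffe(obs, [3.0] * 5))  # 0.0 when predicting the mean
```

An NSE of 1 means a perfect fit, 0 means the model is no better than the observed mean, and negative values mean it is worse. The abstract reports NSEC as a percentage, which corresponds to multiplying this value by 100.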
Spatially accurate mapping of land susceptibility to wind erosion is necessary to mitigate its destructive consequences. In this research, for the first time, we developed a novel methodology based on deep learning (DL) and active learning (AL) models, their combinations (recurrent neural network (RNN), RNN-AL, gated recurrent unit (GRU), and GRU-AL) and three interpretation techniques (synergy matrix, SHapley Additive exPlanations (SHAP) decision plot, and accumulated local effects (ALE) plot) to map global land susceptibility to wind erosion. In this respect, 13 variables were explored as factors controlling wind erosion, and eight of them (wind speed, topsoil carbon content, topsoil clay content, elevation, topsoil gravel fragment, precipitation, topsoil sand content and soil moisture) were selected as important factors via the Harris Hawk Optimization (HHO) feature selection algorithm. The four models were applied to map land susceptibility to wind erosion, and their performance was assessed by three measures: the area under the receiver operating characteristic (AUROC) curve, cumulative gain, and Kolmogorov-Smirnov (KS) statistic plots. The GRU-AL model was the most accurate, and it grouped 38.5%, 12.6%, 10.3%, 12.5% and 26.1% of global land into the very low, low, moderate, high and very high susceptibility classes for wind erosion hazard, respectively. Interpretation techniques were applied to interpret the contribution and impact of the eight input variables on the model’s output. The synergy plot revealed that soil carbon content exhibited high synergy with DEM and soil moisture in the model’s predictions. The ALE plot showed that soil carbon content and precipitation had negative feedback on the prediction of land susceptibility to wind erosion. Based on the SHAP decision plot, soil moisture and DEM made the highest contribution to the model’s output.
Results highlighted new regions at high latitudes (southern Greenland coast, hotspots in Alaska and Siberia), which exhibited high and very high land susceptibility to wind erosion.
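One of the performance measures named above, the area under the ROC curve, can be computed directly from its rank interpretation: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The sketch below uses that pairwise definition with hypothetical labels and scores; it is not the evaluation code of the study.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical erosion labels (1 = eroded) and model scores.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop costs O(P × N) comparisons, which is acceptable for a sketch; sorting-based formulas give the same value faster on large maps.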
Predicting land susceptibility to wind erosion is necessary to mitigate the negative impacts of erosion on soil fertility, ecosystems, and human health. This study is the first attempt to model wind erosion hazards through the application of a novel approach, graph convolutional networks (GCNs), as deep learning models with Monte Carlo dropout. This approach is applied to Semnan Province in arid central Iran, an area vulnerable to dust storms and climate change. We mapped 15 potential factors controlling wind erosion, including climatic variables, soil characteristics, lithology, vegetation cover, land use, and a digital elevation model (DEM), and then applied least absolute shrinkage and selection operator (LASSO) regression to discriminate the most important factors. We constructed a predictive model by randomly selecting 70% and 30% of the pixels as training and validation datasets, respectively, focusing on locations with severe wind erosion on the inventory map. The LASSO regression identified eight of the 15 features (four soil property categories, vegetation cover, land use, wind speed, and evaporation) as the most important factors controlling wind erosion in Semnan Province. These factors were adopted into the GCN model, which estimated that 15.5%, 19.8%, 33.2%, and 31.4% of the total area is characterized by low, moderate, high, and very high susceptibility to wind erosion, respectively. The area under curve (AUC) and SHapley Additive exPlanations (SHAP) of game theory were applied to assess the performance and interpretability of the GCN output, respectively. The AUC values for training and validation datasets were estimated at 97.2% and 97.25%, respectively, indicating excellent model prediction. SHAP values ranged between −0.3 and 0.4, and SHAP analyses revealed that the coarse clastic component, vegetation cover, and land use were the most effective features of the GCN output.
Our results suggest that this novel suite of methods is highly recommended for future spatial prediction of wind erosion hazards in other arid environments around the globe.
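Monte Carlo dropout, mentioned in the abstract above, keeps dropout active at prediction time and repeats the forward pass many times, so that the spread of the outputs serves as an uncertainty estimate. The sketch below applies the idea to a tiny fixed-weight feed-forward network, which is an assumption made to keep the example self-contained: it is not a graph convolutional network, and the weights, drop rate, and number of passes are hypothetical.

```python
import random
import statistics


def mc_dropout_predict(x, hidden_weights, output_weights,
                       drop_rate=0.5, n_passes=200, seed=0):
    """Mean and standard deviation over stochastic forward passes."""
    rng = random.Random(seed)
    keep = 1.0 - drop_rate
    outputs = []
    for _ in range(n_passes):
        # ReLU hidden layer.
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)))
                  for row in hidden_weights]
        # Inverted dropout: drop each unit with probability drop_rate
        # and rescale the survivors by 1 / keep.
        hidden = [h / keep if rng.random() < keep else 0.0 for h in hidden]
        outputs.append(sum(w * h for w, h in zip(output_weights, hidden)))
    return statistics.mean(outputs), statistics.stdev(outputs)


# Hypothetical weights and input.
W_hidden = [[1.0, 0.5], [0.5, -1.0], [-0.3, 0.8]]
w_out = [0.5, 1.0, -0.2]
mean, spread = mc_dropout_predict([1.0, 2.0], W_hidden, w_out)
print(mean, spread)
```

With `drop_rate=0.0` every pass is identical, so the spread collapses to zero and the mean equals the ordinary deterministic prediction.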
Flood risk assessment is a key step in flood management and mitigation, and flood risk maps provide a quantitative measure of flood risk. Integrating deep learning, an updated version of machine learning techniques, with multi-criteria decision making (MCDM) models can generate high-resolution flood risk maps. In this study, a novel integrated approach has been developed based on multiplicative long short-term memory (mLSTM) deep learning models and an MCDM ensemble model to map flood risk in the Minab-Shamil plain, southern Iran. A flood hazard map generated by the mLSTM model is based on nine critical features selected by GrootCV (distance to the river, vegetation cover, variables extracted from the digital elevation model (DEM), and river density) and a flood inventory map (70% and 30% of the data were randomly selected as training and test datasets, respectively). All criteria used to assess model accuracy (except Cohen's kappa, which was 86 for the train dataset and 84 for the test dataset) achieved values greater than 90, which indicates that the mLSTM model performed very well for the generation of a spatial flood hazard map. According to the spatial flood hazard map produced by mLSTM, the very low, low, moderate, high and very high classes cover 26%, 35.3%, 20.5%, 11.2% and 7% of the total area, respectively. Flood vulnerability maps were produced by the combinative distance-based assessment (CODAS), the evaluation based on distance from average solution (EDAS), and the multi-objective optimization on the basis of simple ratio analysis (MOOSRA), and then validated by Spearman's rank correlation coefficients (SRC). Based on the SRC, the three models CODAS, EDAS, and MOOSRA showed high rank correlations with each other, and all three models were then used in the ensemble process.
According to the CODAS-EDAS-MOOSRA ensemble model, 21.5%, 34.2%, 23.7%, 13%, and 7.6% of the total area were classified as having a very low to very high flood vulnerability, respectively. Finally, a flood risk map was generated by the combination of flood hazard and vulnerability maps produced by the mLSTM and MCDM ensemble model. According to the flood risk map, 27.4%, 34.3%, 14.8%, 15.7%, and 7.8% of the total area were classified as having a very low, low, moderate, high, and very high flood risk, respectively. Overall, the integration of mLSTM and the MCDM ensemble is a promising tool for generating precise flood risk maps and provides a useful reference for flood risk management.
• mLSTM and MCDM models were applied to map flood hazard and its vulnerability, respectively.
• A novel integrated technique based on mLSTM and ensemble MCDM was introduced for assessment of flood risk.
• Integration of DL and MCDM is a promising tool for generating precise flood risk maps.
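The Spearman rank correlation used above to compare the CODAS, EDAS, and MOOSRA outputs is the Pearson correlation of the rank-transformed scores. A minimal sketch follows, with average ranks for ties; the score vectors are hypothetical, and the way the study built its ensemble from the three models is not shown because the abstract does not describe it.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman's rho: Pearson correlation computed on ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical vulnerability scores of four sub-areas from two MCDM models.
print(spearman([0.9, 0.1, 0.5, 0.7], [0.8, 0.2, 0.6, 0.4]))  # approximately 0.8
```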
Gully erosion poses a serious hazard to critical resources such as soil, water, and vegetation cover within watersheds. Therefore, spatial maps of gully erosion hazards can be instrumental in mitigating its negative consequences. Among the various methods used to explore and map gully erosion, advanced learning techniques, especially deep learning (DL) models, are highly capable of spatial mapping and can provide accurate predictions for generating spatial maps of gully erosion at different scales (e.g., local, regional, continental, and global). In this paper, we applied two DL models, namely a simple recurrent neural network (RNN) and a gated recurrent unit (GRU), to map land susceptibility to gully erosion in the Shamil-Minab plain, Hormozgan province, southern Iran. To address the inherent black box nature of DL models, we applied three novel interpretability methods: SHapley Additive exPlanations (SHAP), ceteris paribus and partial dependence (CP-PD) profiles, and permutation feature importance (PFI). Using the Boruta algorithm, we identified seven important features that control gully erosion: soil bulk density, clay content, elevation, land use type, vegetation cover, sand content, and silt content. These features, along with an inventory map of gully erosion (based on a 70 % training dataset and 30 % test dataset), were used to generate spatial maps of gully erosion using DL models. According to the Kolmogorov–Smirnov (KS) statistic performance assessment measure, the simple RNN model (with KS = 91.6) outperformed the GRU model (with KS = 66.6). Based on the results from the simple RNN model, 7.4 %, 14.5 %, 18.9 %, 31.2 % and 28 % of the total area of the plain were classified as very-low, low, moderate, high and very-high hazard classes, respectively. According to SHAP plots, CP-PD profiles, and PFI measures, soil silt content, vegetation cover (NDVI) and land use type had the highest impact on the model's output.
Overall, the DL modelling techniques and interpretation methods used in this study proved to be helpful in generating spatial maps of soil erosion hazard, especially gully erosion. Their interpretability can support watershed sustainable management.
• Simple RNN and GRU deep learning models were used for spatial mapping of gully erosion.
• Three novel measures based on game theory (SHAP, CP-PD profiles and PFI) were used to interpret deep learning models.
• The simple RNN model (with KS = 91.6) outperformed the GRU model (with KS = 66.6).
• Interpretation methods are promising tools to interpret DL model's output.
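The KS statistic reported above measures how well a model separates the two classes: it is the largest gap between the cumulative score distributions of positive and negative samples. The sketch below returns the statistic on a 0 to 1 scale; the abstract appears to report it multiplied by 100, which is an interpretation rather than something the text states. Labels and scores are hypothetical.

```python
def ks_statistic(labels, scores):
    """Largest gap between the cumulative score distributions of the classes."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    best = 0.0
    # Evaluate both empirical CDFs at every distinct score threshold.
    for t in sorted(set(scores)):
        cdf_pos = sum(1 for s in pos if s <= t) / len(pos)
        cdf_neg = sum(1 for s in neg if s <= t) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Hypothetical gully labels (1 = gully present) and model scores.
print(ks_statistic([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))   # 1.0
print(ks_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.5
```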
This contribution presents a novel methodology based on feature selection, ensemble deep learning (EDL) models, and an active learning (AL) approach for predicting land subsidence (LS) hazard and rate, and its uncertainty, in an area involving two important plains, the Minab and Shamil-Nian plains, in Hormozgan province, southern Iran. The important features controlling LS hazard were identified by ridge regression. Then, two EDL models were constructed by stacking (SEDL) and voting (VEDL) five dense deep learning (DL) models (model 1 to model 5) for mapping LS hazard. Thereafter, the predictive model performance was assessed by a precision-recall curve and Kolmogorov–Smirnov (KS) plot. A partial dependence plot (PDP), individual conditional expectation plots (ICEP), game theory, and a sensitivity analysis were used for the interpretability of the predictive DL model. According to SEDL, the model with higher accuracy, 34% (1624 km²), 14.7% (698 km²), and 19.2% (912 km²) of the total area were classified as being of very low, low, and moderate hazards, whereas 17.7% (845 km²) and 14.4% (683 km²) of the area were classified as being of high and very high hazards, respectively. Based on all interpretability techniques, aquifer loss or groundwater drawdown is the most important feature controlling LS hazard and has the greatest impact on the SEDL model output. Based on a Taylor diagram and R² as model performance assessment indicators, SEDL-AL (with R² > 95% for training and test datasets) performed better than SEDL for quantifying LS rate, with the rate of LS ranging between 0 and 48.1 cm. The highest rate of LS occurred in the Minab plain, an area located downstream of the Minab Esteghlal dam. SEDL-AL was used to quantify the uncertainty associated with the LS rate. The observed values fell within predictions provided by SEDL-AL, which indicates a high accuracy of our predictive model. Overall, our newly developed modeling techniques are helpful tools for the spatial mapping of LS susceptibility and rate, and its uncertainty.
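A partial dependence curve, one of the interpretability tools listed above, is obtained by forcing one feature to each value of a grid for every sample and averaging the model predictions. The sketch below assumes a generic `predict` callable and a toy linear model; the feature layout and values are hypothetical and unrelated to the study's data.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction with one feature forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)  # copy so the input rows stay untouched
            modified[feature_index] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


# Toy model (assumption): feature 0 could stand for groundwater drawdown.
model = lambda r: 2 * r[0] + r[1]
samples = [[0, 1], [0, 3]]
print(partial_dependence(model, samples, 0, [0, 1, 2]))  # [2.0, 4.0, 6.0]
```

Individual conditional expectation curves are the per-sample predictions before averaging, so they can be obtained by keeping the inner-loop values instead of summing them.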
• First comprehensive application of 15 data mining models to soil erosion.
• Game theory was applied to assess the interpretability of the DM models.
• BGAM is the most accurate model.
• DEM-derived factors are the most important controls.
• Game theory is a valuable technique for assessing the interpretability of predictive models.
This study undertook a comprehensive application of 15 data mining (DM) models, most of which have, thus far, not been commonly used in environmental sciences, to predict land susceptibility to water erosion hazard in the Kahorestan catchment, southern Iran. The DM models were BGLM, BGAM, Cforest, CITree, GAMS, LRSS, NCPQR, PLS, PLSGLM, QR, RLM, SGB, SVM, BCART and BTR. We identified 18 factors usually considered as key controls for water erosion, comprising 10 factors extracted from a digital elevation model (DEM), three indices extracted from Landsat 8 images, a sediment connectivity index (SCI) and three other intrinsic factors. Three indicators (MAE, MBE, and RMSE) and a Taylor diagram were applied to assess model performance and accuracy. Game theory was applied to assess the interpretability of the DM models for predicting water erosion hazard. Among the 15 predictive models, BGAM and PLS returned the best and worst performance, respectively, in predicting water erosion hazard in the study area. The most accurate model, BGAM, predicted that 22%, 8.2%, 9.4% and 60.4% of the total area should be classified as low, moderate, high and very high susceptibility to soil erosion by water, respectively. Based on BGAM and game theory, the factors extracted from the DEM (e.g., DEM, TWI, Slope, TST, TRI, and SPI) were considered the most important ones controlling the predicted severity of soil erosion by water. We conclude that, overall, game theory is a valuable technique for assessing the interpretability of predictive models because, through SHAP (Shapley additive explanations) and PFIM (permutation feature importance measure), it addresses the important concerns regarding the interpretability of more complex DM models.
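The game-theoretic idea behind SHAP can be illustrated with exact Shapley values for a model with only a few features. Each feature's value is its average marginal contribution over all subsets of the other features, with absent features replaced by a baseline. The sketch below enumerates all subsets, which is feasible only for a handful of features; SHAP libraries rely on approximations and background datasets instead. The toy model, instance, and baseline are assumptions.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of each feature relative to a baseline input."""
    n = len(instance)

    def value(subset):
        # Features in the subset take the instance value, others the baseline.
        mixed = [instance[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Toy model (assumption): one additive term and one interaction term.
toy = lambda x: 2 * x[0] + x[1] * x[2]
phi = shapley_values(toy, [1, 2, 3], [0, 0, 0])
print([round(p, 6) for p in phi])  # [2.0, 3.0, 3.0]
```

The values sum to the difference between the prediction for the instance and the prediction for the baseline (8 − 0 here), and the interaction term is shared equally between the two features that produce it.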