Floods are among the most damaging natural hazards, causing huge losses of property, infrastructure, and lives. Predicting where flash floods will occur is very difficult because of sudden changes in climatic conditions and man-made factors. However, flood-susceptible areas can be identified in advance with machine learning techniques, supporting timely management of flood hazards. In this study, we tested four decision-tree-based machine learning models, namely Logistic Model Trees (LMT), Reduced Error Pruning Trees (REPT), Naïve Bayes Trees (NBT), and Alternating Decision Trees (ADT), for flash flood susceptibility mapping in the Haraz Watershed in northern Iran. A spatial database was constructed with 201 present and past flood locations and eleven flood-influencing factors: ground slope, altitude, curvature, Stream Power Index (SPI), Topographic Wetness Index (TWI), land use, rainfall, river density, distance from river, lithology, and Normalized Difference Vegetation Index (NDVI). Statistical evaluation measures, the Receiver Operating Characteristic (ROC) curve, and the Friedman and Wilcoxon signed-rank tests were used to validate and compare the prediction capability of the models. Results show that the ADT model has the highest prediction capability for flash flood susceptibility assessment, followed by the NBT, LMT, and REPT models, in that order. These techniques proved successful in quickly identifying flood-susceptible areas.
•Machine learning models, namely LMT, REPT, NBT, and ADT, were used for flood assessment.
•Of the four models, the ADT has the highest performance for flood assessment.
•Advanced decision tree methods are promising for flood assessment in flood-prone areas.
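The Friedman test named in the abstract above ranks the competing models within each evaluation block (for example, a validation fold) and asks whether their rank sums differ more than chance would allow. The following is a minimal Python sketch of the test statistic only. The function names and the example scores are invented for illustration and are not taken from the study, and no tie correction is applied.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 (largest) to k; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks


def friedman_statistic(scores: Sequence[Sequence[float]]) -> float:
    """scores[i][j] is the score of model j on block i (higher is better)."""
    n = len(scores)        # number of blocks
    k = len(scores[0])     # number of models
    rank_sums = [0.0] * k
    for row in scores:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


if __name__ == "__main__":
    # Hypothetical AUC values for three models on four folds (illustrative only).
    example = [
        [0.90, 0.85, 0.80],
        [0.92, 0.86, 0.79],
        [0.88, 0.84, 0.81],
        [0.91, 0.87, 0.78],
    ]
    print(friedman_statistic(example))
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom. A pairwise follow-up such as the Wilcoxon signed-rank test is then applied to individual model pairs.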
Rainfall prediction is one of the most challenging tasks researchers have faced over the years. Many machine learning and AI-based algorithms have been applied to different datasets to improve prediction, but no single solution predicts rainfall perfectly, and accurate prediction remains an open question. We offer a machine-learning-based comparative evaluation of rainfall models for Kashmir province. Both local geographic features and the time horizon have an influence on weather forecasting. The predictive models investigated include decision trees (DT), Logistic Model Trees (LMT), M5 model trees (M5-MT), GWLM-NARX, Gradient Boosting, and other techniques. Weather predictors measured at three major meteorological stations in the Kashmir area of the UT of J&K, India, were used in the models. We compared the proposed models based on their accuracy, kappa, interpretability, and other statistics, as well as the significance of the predictors used. On the original dataset, the DT model delivers an accuracy of 80.12 percent, while the LMT and Gradient Boosting models reach 87.23 percent and 87.51 percent, respectively. Furthermore, when continuous data were used in the M5-MT and GWLM-NARX models, the NARX model performed better, with mean squared error (MSE) and regression value (R) of 3.12 percent and 0.9899 in training, 0.144 percent and 0.9936 in validation, and 0.311 percent and 0.9988 in testing.
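The accuracy and kappa figures used to compare classifiers in the abstract above can both be derived from a confusion matrix. Below is a minimal Python sketch of that calculation using Cohen's kappa. The function name and the example matrix are invented for illustration and do not reproduce the study's data.

```python
from typing import List, Sequence, Tuple


def accuracy_and_kappa(confusion: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] counts samples whose true class is i and predicted class is j.
    """
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(size)) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    row_totals: List[int] = [sum(row) for row in confusion]
    col_totals: List[int] = [sum(row[j] for row in confusion) for j in range(size)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(size)) / (total * total)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


if __name__ == "__main__":
    # Hypothetical rain / no-rain confusion matrix (illustrative only).
    matrix = [[40, 10],
              [5, 45]]
    print(accuracy_and_kappa(matrix))
```

Kappa discounts the agreement a classifier would reach by chance, which matters when rainy and dry days are unevenly represented.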
•MARS is a promising and flexible modeling technique.
•The potential of MARS and W-MARS models is tested for groundwater level forecasting.
•The W-MARS models perform better at higher lead-time forecasts.
In this study, two machine learning models, Multivariate Adaptive Regression Splines (MARS) and M5 Model Trees (MT), were applied to simulate the groundwater level (GWL) fluctuations of three shallow open wells within diverse unconfined aquifers. Wavelet-coupled MARS and MT hybrid models (W-MARS and W-MT) were developed in an attempt to further increase GWL forecast accuracy. The Discrete Wavelet Transform (DWT), which is particularly effective for non-stationary time-series data, was employed to decompose the input time series into sub-series components. Ten years of historical data (August 1996 to July 2006) comprising monthly groundwater level, rainfall, and temperature were used to calibrate and validate the models. The models were calibrated and tested for forecast horizons of one, three, and six months ahead. The wavelet-coupled MARS and MT models were compared with their simple counterparts using standard statistical performance measures: Root Mean Square Error (RMSE), Normalized Nash-Sutcliffe Efficiency (NNSE), and Coefficient of Determination (R²). The wavelet-coupled models developed using multi-scale input data performed better than their simple counterparts, and the forecast accuracy of the W-MARS models was superior to that of the W-MT models. Specifically, the DWT offered better discrimination of the non-linear and non-stationary trends present at various scales in the input time series, enabling the W-MARS models to provide more accurate GWL forecasts.
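The error measures named in the abstract above can be written in a few lines. The following Python sketch assumes the commonly used definition NNSE = 1 / (2 - NSE), because the abstract does not state which normalization the authors applied. The function names and sample values are illustrative only.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and predicted series."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def nse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def nnse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Normalized NSE, assumed here to be 1 / (2 - NSE), which lies in (0, 1]."""
    return 1.0 / (2.0 - nse(observed, predicted))


if __name__ == "__main__":
    # Hypothetical monthly groundwater levels in metres (illustrative only).
    obs = [4.2, 4.5, 5.1, 5.8, 5.4, 4.9]
    sim = [4.0, 4.6, 5.0, 5.9, 5.6, 4.7]
    print(rmse(obs, sim), nse(obs, sim), nnse(obs, sim))
```

Under this definition, a forecast no better than the observed mean scores NNSE = 0.5, and a perfect forecast scores 1.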
Machine-learning is the automated process of uncovering patterns in large datasets using computer-based statistical models, where a fitted model may then be used for prediction purposes on new data. Despite the growing number of machine-learning algorithms that have been developed, relatively few studies have provided a comparison of an array of different learners; typically, model comparison studies have been restricted to a comparison of only a few models. This study evaluates and compares a suite of 10 machine-learners as classification algorithms for the prediction of soil taxonomic units in the Lower Fraser Valley, British Columbia, Canada.
A variety of machine-learners (CART, CART with bagging, Random Forest, k-nearest neighbor, nearest shrunken centroid, artificial neural network, multinomial logistic regression, logistic model trees, and support vector machine) were tested in the extraction of the complex relationships between soil taxonomic units (great groups and orders) from a conventional soil survey and a suite of 20 environmental covariates representing the topography, climate, and vegetation of the study area. Methods used to extract training data from a soil survey included by-polygon, equal-class, area-weighted, and area-weighted with random oversampling (ROS) approaches. The fitted models, which capture the soil-environmental relationships, were then used to predict soil great groups and orders for the entire study area at a 100 m spatial resolution. The resulting maps were validated using 262 points from legacy soil data.
On average, the area-weighted sampling approach for developing training data from a soil survey was most effective. Using a validation of R=1 cell, the k-nearest neighbor and support vector machine with radial basis function resulted in the highest accuracy of 72% for great groups using ROS; however, models such as CART with bagging, logistic model trees, and Random Forest were preferred due to the speed of parameterization and the interpretability of the results while resulting in similar accuracies ranging from 65–70% using the area-weighted sampling approach. Model choice and sample design greatly influenced outputs. This study provides a comprehensive comparison of machine-learning techniques for classification purposes in soil science and may assist in model selection for digital soil mapping and geomorphic modeling studies in the future.
•Soil taxonomic units were mapped for the Lower Fraser Valley.
•10 machine-learning algorithms were compared.
•Four methods of developing training data were compared.
•Sampling from soil surveys using an area-weighted approach was most effective.
•Choice of model and sampling design greatly influences outputs.
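Random oversampling (ROS), one of the training-data strategies mentioned in the soil-mapping abstract above, can be sketched generically as follows. This Python example duplicates randomly chosen minority-class samples until every class matches the largest one. It is an assumption about the general technique, not a reproduction of the study's procedure, and the function name and toy data are invented.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def random_oversample(
    samples: Sequence[object],
    labels: Sequence[Hashable],
    seed: int = 0,
) -> Tuple[List[object], List[Hashable]]:
    """Duplicate random minority-class samples until all classes are equal in size."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[object]] = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    # Every class is grown to the size of the largest class.
    target = max(len(members) for members in by_class.values())
    out_samples: List[object] = []
    out_labels: List[Hashable] = []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for sample in members + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


if __name__ == "__main__":
    # Hypothetical training points labelled with soil classes (illustrative only).
    points = ["p1", "p2", "p3", "p4", "p5"]
    classes = ["Gleysol", "Gleysol", "Gleysol", "Podzol", "Brunisol"]
    balanced_points, balanced_classes = random_oversample(points, classes, seed=42)
    print(balanced_classes)
```

Oversampling should be applied to the training set only, so that validation points remain independent of the duplicated samples.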
Deep reinforcement learning has proven useful in robotics, but the black-box nature of deep neural networks impedes the applicability of deep reinforcement learning agents to real-world tasks. The field of explainable artificial intelligence addresses this by developing methods that aim to explain such agents to humans. Model trees as surrogate models have proven useful for producing explanations of black-box models used in real-world robotic applications, in particular because they can provide explanations in real time. In this paper, we provide an overview and analysis of available methods for building model trees that explain deep reinforcement learning agents solving robotics tasks. We find that multiple outputs are important for the model to capture the dependencies among coupled output features, i.e., actions. Additionally, our results indicate that introducing domain knowledge via a hierarchy among the input features during the building process results in higher accuracies and a faster building process.
•Annual sediment yields using SEMEP are typically 50–300 ton/km2·yr.
•RMSE of model with SEMEP is 53 ton/km2·yr less than that of existing models.
•Sediment transport into alluvial rivers is affected by river wetlands and water.
•Watershed morphometric characteristics can represent sediment transport.
•Geospatial analysis shows urbanization’s erosion features, yielding rich sediment.
South Korea experiences numerous local sedimentation problems, such as landslides, upland erosion, aggradation and degradation, and floodplain sediment deposition. This has necessitated the development of a reliable and consistent approach for modeling sediment processes in the country. In this study, samples obtained from 35 gauging stations in five alluvial river basins in South Korea were used together with the modified Einstein procedure and the series expansion of the modified Einstein procedure (SEMEP) to determine the total sediment load at the sampling locations. Using the two methods, the total sediment load of the majority of the 35 rivers considered was found to be typically 50–300 ton/km2·yr. A model tree data mining technique was used to develop a model for estimating the specific degradation based on six meaningful parameters: 1) elevation at the middle relative area of the hypsometric curve (m), 2) percentage of wetland and water, 3) percentage of urban land, 4) mean annual precipitation (mm), 5) main stream length (km), and 6) watershed form factor (km2/km2). The root mean square error of the predictions of the proposed model was found to be 55 ton/km2·yr less than those of existing statistical models. Erosion loss maps obtained by the revised universal soil loss equation (RUSLE), satellite images, and aerial photographs were also used to represent the geospatial features affecting erosion and sedimentation. The results of the geospatial analysis indicated that the transport of sediment into the alluvial rivers was affected by the wetlands located near the rivers, and also enabled clear delineation of the unique erosion features of construction sites in urban areas. In addition, the watershed morphometric characteristics could be used to accurately represent the sediment transport. The proposed data mining methodology promises to facilitate the solution of various erosion and sedimentation problems in South Korea. The geospatial analysis procedure would also enable the understanding of spatially varied erosion and sedimentation processes under different conditions.
•Hoeffding Tree is a promising landslide susceptibility model based on ROC and AUC.
•Bayes Network and Logistic Model Tree were applied for comparison.
•Correlation analysis of conditioning factors was completed by Frequency Ratio.
•Landslide susceptibility maps of Muchuan County and the county town were produced.
Landslides, one of the most common hazards around the world, cause severe damage to human life and property. To prevent and mitigate landslides, various models have been introduced to assess landslide susceptibility. In this paper, Hoeffding Tree (HT), a prevailing data stream mining algorithm, was employed for the first time to predict landslide susceptibility in Muchuan County, China. Logistic Model Tree (LMT) and Bayes Network (BN) were also applied to produce landslide susceptibility maps for comparison. Model performance was evaluated by Receiver Operating Characteristic (ROC) curves and the areas under the curves (AUC). To build the landslide inventory map, data on 279 landslides were collected and randomly divided into training and validation datasets in a 70% to 30% proportion. Twelve conditioning factors (altitude, slope angle, profile curvature, plan curvature, slope aspect, distance to roads, distance to rivers, TWI, NDVI, soil, land use, and lithology) were selected to construct the landslide susceptibility models. Correlations between conditioning factors and landslides were analyzed using Frequency Ratio (FR). The results showed that landslides are prone to occur in areas where human activities concentrate, and all three models exhibited satisfactory performance. Specifically, for the training dataset, the LMT model showed the highest AUC (0.854), followed by HT (0.726) and BN (0.709). For the validation dataset, however, the LMT and BN models generated similar AUC values (0.761 and 0.764, respectively), and the highest AUC value belonged to HT (0.802). The distribution of landslide susceptibility zones revealed that the interior of the county town lies mainly in low and very low susceptibility zones, whereas regions close to the border face high and very high landslide risk. The results of this paper are significant for landslide prevention and urban planning in Muchuan, China. Additionally, this study showed that the HT model is a promising classifier for landslide susceptibility modeling.
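The AUC values used to compare the models above can be understood as the probability that a randomly chosen landslide location receives a higher susceptibility score than a randomly chosen non-landslide location. The following Python sketch computes AUC through that pairwise (Mann-Whitney) interpretation. The function name and scores are invented for illustration and are unrelated to the study's data.

```python
from typing import Sequence


def auc_pairwise(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the share of (positive, negative) pairs in which the positive
    sample is scored higher; ties count as half a win.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical susceptibility scores (illustrative only).
    landslide_cells = [0.91, 0.78, 0.66, 0.40]
    stable_cells = [0.55, 0.30, 0.22, 0.10]
    print(auc_pairwise(landslide_cells, stable_cells))
```

The double loop is quadratic in the number of samples, which is acceptable for an inventory of a few hundred points. A sort-based rank formulation is the usual choice for larger datasets.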
Multi-output regression refers to the simultaneous prediction of several real-valued output variables to improve generalization performance by exploiting output relatedness. We propose a multi-output model tree that uses a regularization-based method to exploit output relatedness when estimating linear models at leaf nodes. The proposed method can explain nonlinear input–output relations and is easy to interpret through its input space partitioning and the models at its leaf nodes. The models exploit output relatedness by selecting common input variables to explain related output variables. We also present a computationally efficient two-stage splitting procedure that decreases the number of model estimations by analyzing residuals. We verify the effectiveness of the proposed method in a simulation study and demonstrate that it outperforms existing methods on several benchmark datasets. Furthermore, we apply the proposed method to real industry data as a case study to predict tensile qualities of plates.
Decision trees (DTs) are popular classifiers partly due to their reasonably good classification performance, their ease of interpretation, and their widespread use in ensembles. To improve the classification performance of individual DTs, researchers have used linear combinations of features in inner nodes (multivariate decision trees), leaf nodes (model trees), or both (functional trees). In this paper, we present a new functional tree, Functional Tree for class imbalance problems (FT4cip). FT4cip is designed for class imbalance problems, where one of the classes in the database has few objects compared to another class. FT4cip achieves better classification performance, in terms of AUC, than the best model tree (LMT) and functional tree (Gama) that we identified. The statistical comparison was made on 110 databases using Bayesian statistical tests. We also perform a meta-analysis of classification performance per type of database, which helps us recommend a classifier for a given problem. We show how each design decision taken when building FT4cip contributes to classification performance or to simpler models, and rank the decisions according to their importance to classification performance. To avoid fragmentation in the DT literature, we contrast each design decision taken when building FT4cip against LMT and Gama.
•We introduce the Functional Tree for class imbalance problems (FT4cip).
•We make a statistical comparison of FT4cip against rival methods in 110 databases.
•The comparison shows FT4cip has better classification performance than rivals.
•A meta-analysis lets us recommend what classifier to use given a specific problem.
•The meta-analysis shows FT4cip has great performance in class imbalance problems.
Traffic prediction is a critical task for intelligent transportation systems (ITS). Prediction at intersections is challenging because it involves various participants, such as vehicles, cyclists, and pedestrians. In this paper, we propose a novel approach to accurate intersection traffic prediction that introduces data sources beyond road traffic volume data into the prediction model. In particular, we take advantage of data collected from reports of road accidents and roadworks near the intersections. In addition, we investigate two types of learning schemes, namely batch learning and online learning. Three popular ensemble decision tree models are used in the batch learning scheme: Gradient Boosting Regression Trees (GBRT), Random Forest (RF), and Extreme Gradient Boosting Trees (XGBoost). The Fast Incremental Model Trees with Drift Detection (FIMT-DD) model is adopted for the online learning scheme. The proposed approach is evaluated using public data sets released by the Victorian Government of Australia. The results indicate that the accuracy of intersection traffic prediction can be improved by incorporating information on nearby accidents and roadworks.