Pedology and digital soil mapping (DSM) Ma, Yuxin; Minasny, Budiman; Malone, Brendan P. ...
European Journal of Soil Science, March 2019, Volume 70, Issue 2
Journal Article
Peer-reviewed
Pedology focuses on understanding soil genesis in the field and includes soil classification and mapping. Digital soil mapping (DSM) has evolved from traditional soil classification and mapping to the creation and population of spatial soil information systems, using field and laboratory observations coupled with environmental covariates. Pedological knowledge of soil distribution and processes can be useful for digital soil mapping. Conversely, digital soil mapping can bring new insights to pedogenesis, detailed information on vertical and lateral soil variation, and can generate research questions that were not considered in traditional pedology. This review highlights the relevance and synergy of pedology in soil spatial prediction through the expansion of pedological knowledge. We also discuss how DSM can support further advances in pedology through improved representation of spatial soil information. Some major findings of this review are as follows: (a) soil classes can be mapped accurately using DSM, (b) the occurrence and thickness of soil horizons, whole soil profiles and soil parent material can be predicted successfully with DSM techniques, (c) DSM can provide valuable information on pedogenic processes (e.g. addition, removal, transformation and translocation), (d) pedological knowledge can be incorporated into DSM, but DSM can also lead to the discovery of knowledge, and (e) there is potential to use process-based soil–landscape evolution modelling in DSM. Based on these findings, the combination of data-driven and knowledge-based methods promotes even greater interactions between pedology and DSM.
Highlights
Demonstrates relevance and synergy of pedology in soil spatial prediction, and links pedology and DSM.
Indicates the successful application of DSM in mapping soil classes, profiles, pedological features and processes.
Shows how DSM can help in forming new hypotheses and gaining new insights about soil and soil processes.
Combination of data‐driven and knowledge‐based methods recommended to promote greater interactions between DSM and pedology.
•Empirical modelling is used to map soil thickness across Australia at 3 arc-second grid cell resolution.
•Soil thickness estimates are informed by data mining of three separate site datasets.
•Our model accommodates right-censored data.
•Our data mining model derives plausible realisations of soil thickness.
•From the modelling, one can derive exceedance probabilities of soil thickness given defined depth thresholds.
Soil thickness is not easily measured in situ, making it also a challenging variable to reliably map. This study improves on previous digital mapping of soil thickness across Australia using an approach suited to the continent’s unique pedo-geomorphic history. Leveraging three large, in situ observation datasets and a wide range of spatial environmental variables, we developed three models depicting rock outcrops, intermediate and deep soils respectively. Our modelling approach addressed right-censored data, which is a common attribute of soil thickness data, and we applied an iterative, data re-sampling framework to quantify prediction uncertainties. We integrated the three models to create soil thickness maps and associated products of soil thickness exceedance probabilities. Using data excluded from model calibrations, we achieved an overall accuracy of 99% for the binary outcome rock outcrops model, and 85% for the binary outcome deep soils model. Modelling soil thickness of shallow to deep soils resulted in a concordance coefficient of 0.77. Of all the environmental variables considered in this study, those associated with climate data (including topo-climate) were consistently the most often used and important. We associate this finding with the direct and indirect effects of climate on biota and weathering of parental materials along with other factors driving spatial heterogeneity in soil thickness across Australia. While the products generated by this research are not without error, the overall pattern of soil thickness is consistent with previous observations from historical soil surveys across Australia and the results are demonstrably more skilful than previous digital soil mapping efforts.
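The exceedance-probability products described above follow directly from an ensemble of model realisations: the probability that soil thickness exceeds a depth threshold at a grid cell is the fraction of realisations exceeding it. A minimal sketch with synthetic values (not the study's actual model or data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ensemble: 100 resampled model realisations of soil
# thickness (cm) at 5 grid cells, as an iterative data re-sampling
# framework would produce.
ensemble = rng.gamma(shape=4.0, scale=25.0, size=(100, 5))

def exceedance_probability(realisations, threshold):
    """Fraction of realisations in which soil thickness exceeds the
    given depth threshold, computed per grid cell."""
    return (realisations > threshold).mean(axis=0)

p_exceed_50cm = exceedance_probability(ensemble, 50.0)
print(p_exceed_50cm)  # one probability in [0, 1] per grid cell
```

The same ensemble also yields prediction intervals (e.g. via `np.quantile` along axis 0), which is how resampling-based uncertainty maps are typically derived.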
Digital Mapping of Soil Carbon Minasny, Budiman; McBratney, Alex B; Malone, Brendan P ...
Advances in Agronomy, 2013, Volume 118
Journal Article
Peer-reviewed
There is a global demand for soil data and information for food security and global environmental management. There is also great interest in recognizing the soil system as a significant terrestrial sink of carbon. The reliable assessment of soil carbon (C) stocks is of key importance for soil conservation and in mitigation strategies for increased atmospheric carbon. In this article, we review and discuss recent advances in digital mapping of soil C. The challenge of mapping carbon is demonstrated by the large variation of soil C concentration at field, continental, and global scales. This article reviews recent studies in mapping soil C using digital soil mapping approaches. The general activities in digital soil mapping involve: collection of a database of soil carbon observations over the area of interest; compilation of relevant covariates (scorpan factors) for the area; calibration or training of a spatial prediction function based on the observed dataset; interpolation and/or extrapolation of the prediction function over the whole area; and finally validation using existing or independent datasets. We discuss several relevant aspects of digital mapping: carbon concentration and carbon density, source of data, sampling density and resolution, depth of investigation, map validation, map uncertainty, and environmental covariates. We demonstrate harmonization of soil depths using the equal-area spline and the use of a material coordinate system to take into account the varying bulk density due to management practices. Soil C mapping has evolved from 2-D mapping of soil C stock at particular depth ranges to a semi-3-D soil map allowing the estimation of continuous soil C concentration or density with depth. This review then discusses the dynamics of soil C and the consequences for prediction and mapping of soil C change. Finally, we illustrate the prediction of soil carbon change using a semidynamic scorpan approach.
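The distinction drawn above between carbon concentration and carbon density rests on a standard unit conversion: stock (t C ha⁻¹) = SOC (g kg⁻¹) × bulk density (g cm⁻³) × depth (cm) × (1 − gravel fraction) × 0.1. A minimal sketch of that conventional formula:

```python
def carbon_stock_t_ha(soc_g_kg, bulk_density_g_cm3, depth_cm, gravel_frac=0.0):
    """Convert a carbon concentration to a carbon density (stock).

    stock (t C/ha) = SOC (g/kg) * BD (g/cm^3) * depth (cm)
                     * (1 - gravel fraction) * 0.1
    The 0.1 factor converts (g/kg * g/cm^3 * cm) to t/ha.
    """
    return soc_g_kg * bulk_density_g_cm3 * depth_cm * (1.0 - gravel_frac) * 0.1

# Example: 30 cm of topsoil at 20 g/kg SOC and bulk density 1.3 g/cm^3
print(carbon_stock_t_ha(20.0, 1.3, 30.0))  # 78.0 t C/ha
```

Mapping C density rather than concentration therefore requires a bulk density estimate per depth interval, which is one reason the review treats the two quantities separately.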
Understanding the uncertainty in spatial modelling of environmental variables is important because it provides end-users with the reliability of the maps. Over the past decades, Bayesian statistics has been used successfully for this purpose. However, conventional simulation-based Markov Chain Monte Carlo (MCMC) approaches are often computationally intensive. In this study, the performance of a novel Bayesian inference approach called Integrated Nested Laplace Approximation with Stochastic Partial Differential Equation (INLA-SPDE) was evaluated using independent calibration and validation datasets of various skewed and non-skewed soil properties, and was compared with a linear mixed model estimated by residual maximum likelihood (REML-LMM). It was found that INLA-SPDE was equivalent to REML-LMM in terms of model performance and was similarly robust with sparse datasets (i.e. 40–60 samples). Unlike MCMC approaches, INLA-SPDE was able to estimate the posterior marginal distributions of the model parameters without extensive simulations. It was concluded that INLA-SPDE has the potential to map the spatial distribution of environmental variables along with their posterior marginal distributions for environmental management. Some drawbacks were identified with INLA-SPDE, including artefacts in the model response due to the use of triangle meshes and a longer computational time when dealing with non-Gaussian likelihood families.
•INLA-SPDE used to predict skewed and non-skewed environmental variables.
•The model performance of INLA-SPDE was equivalent to REML-LMM.
•INLA-SPDE was able to estimate the pdfs of model parameters and responses.
•INLA-SPDE was as robust as REML-LMM with sparse datasets (e.g. 40–60 samples).
•INLA-SPDE can be applied in environmental monitoring and management.
It is widely acknowledged that the global stocks of soil and environmental resources are diminishing and under threat. This issue stems from current and historical unsustainable management practices, leading to degraded landscapes, and is further compounded by increasing pressures from ever-growing anthropogenic activities. To curb the trajectory toward a collapse of our ecosystems, systematic ways are needed to assess the condition of our natural resources, how much they might have changed, and to what extent this might impact the life-sustaining functions we derive from our environment and the extent of our food-producing systems. Some solutions to these issues come in the form of measurement, mapping and monitoring technology, which provides powerful ways to be informed about, understand and assess the condition of our landscapes so that they can be managed strategically or simply improved. This Special Issue showcases, from several locations across the globe, detailed examples of what is achievable at the convergence of big data from remote and proximal sensing platforms, advanced statistical modelling and computing infrastructure, to understand and monitor our ecosystems better. These utilities not only provide high-resolution abilities to map the extent of and changes to our food-producing systems; they have also yielded new ways to determine land-use and climate effects on the fate of soil carbon across living generations, and to identify hydrological risk strategies in otherwise data-poor urban environments. Leveraging the availability of remote sensing data is telling, but the papers in this Special Issue also highlight the sophistication of modelling capabilities to deliver not only highly detailed maps of temporally dynamic soil phenomena but also ways to draw new inferences from sparse and disparate model input data. The challenges of restoring our ecosystems are immense and sobering. However, we are well equipped and capable of confronting these pervasive issues in objective and data-informed ways that were previously not possible.
The use of visible-near infrared (vis-NIR) spectroscopy for rapid soil characterisation has attracted considerable interest in recent years. Soil spectral absorbance over the visible-infrared range can be calibrated with regression models to predict a set of soil properties. The accuracy of these regression models relies heavily on the calibration set. An optimum sample size and good overall sample representativeness could further improve model performance. However, there is no guideline on which sampling method should be used for datasets of different sizes.
Here, we show that different sampling algorithms perform differently for different dataset sizes and regression models (Cubist regression tree and partial least squares regression (PLSR)). We analysed the effect of three sampling algorithms, Kennard-Stone (KS), conditioned Latin hypercube sampling (cLHS) and k-means clustering (KM), against random sampling on the prediction of up to five soil properties (sand, clay, carbon content, cation exchange capacity and pH) in three datasets. These datasets have different coverages: a European continental dataset (LUCAS, n = 5,639), a regional dataset from Australia (Geeves, n = 379), and a local dataset from New South Wales, Australia (Hillston, n = 384). Calibration sample sizes ranging from 50 to 3,000 were derived and tested for the continental dataset, and from 50 to 200 for the regional and local datasets.
Overall, PLSR gives better predictions than the Cubist model for the various soil properties and is less sensitive to the choice of sampling algorithm. The KM algorithm is more representative in the larger dataset up to a certain calibration sample size. The KS algorithm appears to be more efficient than random sampling in small datasets; however, its prediction performance varied considerably between soil properties. The cLHS algorithm is the most robust sampling method across multiple soil properties regardless of sample size.
Our results suggest that the optimum calibration sample size depends on how much the model has to generalize. Sampling algorithms are more beneficial for larger datasets; in smaller datasets only small improvements can be made. KM is suitable for large datasets, KS is efficient in small datasets but gives variable results, while cLHS is less affected by sample size.
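Of the sampling algorithms compared above, Kennard-Stone is the most direct to sketch: it starts from the two most mutually distant points and then repeatedly adds the candidate farthest from the points already selected. A minimal NumPy illustration on a synthetic spectral matrix (not the study's code):

```python
import numpy as np

def kennard_stone(X, n_samples):
    """Kennard-Stone calibration-set selection: seed with the two most
    distant points, then add the point whose minimum distance to the
    selected set is largest (max-min criterion)."""
    # Full pairwise Euclidean distance matrix (fine for small datasets)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(D), D.shape)
    selected = [int(i), int(j)]
    while len(selected) < n_samples:
        remaining = [k for k in range(len(X)) if k not in selected]
        min_d = D[np.ix_(remaining, selected)].min(axis=1)
        selected.append(remaining[int(np.argmax(min_d))])
    return selected

rng = np.random.default_rng(0)
spectra = rng.normal(size=(200, 10))   # stand-in for a vis-NIR matrix
cal_idx = kennard_stone(spectra, 50)
print(len(cal_idx))  # 50 calibration samples spread over spectral space
```

Because KS maximises coverage of the spectral space boundary, it tends to pick extreme samples, which is consistent with the variable performance the study reports for small datasets.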
The conditioned Latin hypercube sampling (cLHS) algorithm is widely used for planning field sampling surveys to understand the spatial behaviour of natural phenomena such as soils. This technical note collates, summarizes, and extends existing solutions to problems that field scientists face when using cLHS. These problems include optimizing the sample size, relocating sites when an original site is deemed inaccessible, and accounting for existing sample data so that under-sampled areas can be prioritized for sampling. These solutions, which we also share as individual R scripts, will facilitate much wider application of what has been a very useful sampling algorithm for scientific investigation of soil spatial variation.
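The core cLHS idea, selecting a sample whose values occupy one quantile stratum per covariate, can be sketched with a much-simplified random-swap search. This Python sketch stands in for the R scripts mentioned above and omits the simulated annealing and correlation terms of the standard cLHS implementation:

```python
import numpy as np

def clhs_objective(X, idx, edges):
    """Deviation of the sample from one-point-per-stratum occupancy
    across every covariate's quantile strata (lower is better)."""
    obj = 0.0
    for j in range(X.shape[1]):
        counts, _ = np.histogram(X[idx, j], bins=edges[j])
        obj += np.abs(counts - 1).sum()
    return obj

def simple_clhs(X, n_samples, n_iter=2000, seed=0):
    """Random-swap search for a Latin-hypercube-like sample of rows of X."""
    rng = np.random.default_rng(seed)
    # n_samples equal-probability strata per covariate
    edges = [np.quantile(X[:, j], np.linspace(0, 1, n_samples + 1))
             for j in range(X.shape[1])]
    idx = [int(k) for k in rng.choice(len(X), n_samples, replace=False)]
    best = clhs_objective(X, idx, edges)
    for _ in range(n_iter):
        cand = idx.copy()
        out = int(rng.integers(n_samples))
        pool = [k for k in range(len(X)) if k not in cand]
        cand[out] = int(rng.choice(pool))     # swap one site for a new one
        score = clhs_objective(X, cand, edges)
        if score <= best:
            idx, best = cand, score
    return idx, best

rng = np.random.default_rng(1)
covariates = rng.normal(size=(300, 3))  # e.g. elevation, slope, NDVI
idx, score = simple_clhs(covariates, 20)
print(len(idx), score)
```

The swap step is also where the note's practical extensions plug in naturally, e.g. restricting the candidate pool to accessible sites, or fixing already-sampled sites in `idx` so the search only fills under-sampled strata.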
Soil texture, which is spatially variable in nature, is an important soil physical property that governs most physical, chemical, biological, and hydrological processes in soils. Detailed information on soil texture variability in both vertical and lateral dimensions is crucial for proper crop and land management and for environmental studies, especially in Denmark, where mechanized agriculture covers two thirds of the land area. We modeled the continuous depth function of texture distribution from 1958 Danish soil profiles (to a depth of 2 m) using equal-area quadratic splines and predicted clay, silt, fine sand, and coarse sand content at the six standard soil depths of the GlobalSoilMap project (0–5, 5–15, 15–30, 30–60, 60–100, and 100–200 cm) via regression rules using the Cubist data mining tool. Seventeen environmental variables were used as predictors and their strength of prediction was also calculated. For example, in the prediction of silt content at 0 to 5 cm depth, factors that registered a higher level of importance included the soil map (90%), landscape types (54%), and land use (27%), while factors with lower scores were direct insolation (17%) and slope aspect (14%). Model validation (20% of the data, selected randomly) showed higher prediction performance in the upper depth intervals but increasing prediction error in the lower depth intervals (e.g., R2 = 0.54, RMSE = 33.7 g kg−1 for silt at 0–5 cm and R2 = 0.29, RMSE = 38.8 g kg−1 at 100–200 cm). Danish soils have a high sand content (mean values of clay, silt, fine sand, and coarse sand content at 0- to 5-cm depth were 79, 84, 324, and 316 g kg−1, respectively). Northern parts of the country have a higher content of fine sand compared with the rest of the study area, whereas the western part of the country has little clay but a high coarse sand content at all soil depths. The eastern and central parts of the country are rich in clay, but due to leaching, surface soils are clay-eluviated with subsequent accumulation at lower depths. We found equal-area quadratic splines and regression rules to be promising tools for soil profile harmonization and spatial prediction of texture properties at national extent across Denmark.
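The depth-harmonization step above, mapping horizon observations onto the GlobalSoilMap intervals, can be illustrated with simple thickness-weighted averaging. Note this is a cruder stand-in for the mass-preserving equal-area quadratic spline the study actually uses; the profile values are hypothetical:

```python
import numpy as np

def harmonise(horizons, targets):
    """Thickness-weighted averaging of horizon values onto standard
    depth intervals. A simplified stand-in for the equal-area
    quadratic spline (which additionally smooths between horizons)."""
    out = []
    for top, bottom in targets:
        w, v = 0.0, 0.0
        for h_top, h_bot, value in horizons:
            overlap = max(0.0, min(bottom, h_bot) - max(top, h_top))
            w += overlap
            v += overlap * value
        out.append(v / w if w > 0 else np.nan)
    return out

# Hypothetical profile: (top cm, bottom cm, clay g/kg) per horizon
profile = [(0, 25, 120.0), (25, 70, 180.0), (70, 200, 240.0)]
gsm_depths = [(0, 5), (5, 15), (15, 30), (30, 60), (60, 100), (100, 200)]
print(harmonise(profile, gsm_depths))
# [120.0, 120.0, 140.0, 180.0, 225.0, 240.0]
```

Each target interval receives the average of the horizons it overlaps, weighted by overlap thickness, so total clay mass over the profile is conserved.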
•Digital soil maps of key soil constraints were developed at 20 m resolution.
•Yield was modelled using soil constraints and terrain infrastructure variables.
•Interpretive machine learning revealed the spatial drivers of yield variability.
•The impacts of constraints on yield were quantified in bales per hectare.
Machine learning approaches have been widely used for crop yield modelling and forecasting, but there has been limited application to understanding site-specific yield constraints. Crop yield is driven by a complex interaction of spatial and temporal variables, which makes it challenging to define the exact cause of observed spatial yield variability explicitly. This makes it difficult to design efficient management strategies to address production constraints. There is a need for a more quantitative and systematic approach to identify and understand the causes of variation in crop yield in order to implement appropriate management responses. This study investigated the use of interpretive machine learning (IML) to address this need. The developed methodology was demonstrated on furrow-irrigated cotton fields totalling ∼2000 ha in the Condamine-Balonne River catchment, Australia. Digital soil maps of important soil constraints were created at 20 m spatial resolution using 70 soil cores extracted to 1.4 m depth and a combination of on-farm and off-farm spatial data layers. Specifically, the soil constraints represented were exchangeable sodium percentage (ESP – sodicity), pH (alkalinity), and electrical conductivity (ECe – salinity). Terrain infrastructure variable maps of closed depressions, distance down furrow, and cut and fill (from landforming practices) were also developed. Empirical models of cotton lint yield were created with gradient boosted decision trees (XGBoost) using the digital soil maps and terrain infrastructure data as predictor variables. The models described the spatial variation in yield well, with a median Lin's concordance correlation coefficient of 0.67 and a root-mean-square error of 0.75 b ha−1 (bales per hectare). SHapley Additive exPlanations (SHAP), an IML approach based on game theory, was then used to identify the contribution of each variable to the modelled yield across the study area.
The variable most decreasing yield at each point was identified and mapped across the study area, and the spatial extent represented by each variable quantified. The SHAP values for each predictor variable were also extracted and mapped for a case study field, which demonstrated the magnitude of the impact of each variable on yield with spatial context in easily interpretable units (b ha−1). The presented methodology is promising for cost-benefit analysis of implementing remediation strategies, or where not economically feasible, altering management inputs according to a constrained yield potential.
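The game-theoretic principle behind SHAP can be illustrated by computing exact Shapley values for a toy additive yield model with three hypothetical constraint features: each feature's value is its weighted marginal contribution averaged over all coalitions, with absent features held at a baseline. Brute-force enumeration like this is only feasible for a handful of features; the SHAP library uses efficient tree-specific algorithms instead:

```python
from itertools import combinations
from math import factorial
import numpy as np

def exact_shapley(f, x, baseline):
    """Exact Shapley values for one prediction of model f at point x,
    with absent features set to a baseline value."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                z = baseline.copy()
                for j in S:
                    z[j] = x[j]
                without = f(z)          # coalition S without feature i
                z[i] = x[i]
                with_i = f(z)           # coalition S plus feature i
                weight = (factorial(len(S)) * factorial(n - len(S) - 1)
                          / factorial(n))
                phi[i] += weight * (with_i - without)
    return phi

# Toy yield model: additive effects of sodicity, salinity, pH (hypothetical)
def yield_model(v):
    return 10.0 - 0.5 * v[0] - 0.3 * v[1] + 0.1 * v[2]

x = np.array([4.0, 2.0, 1.0])
base = np.zeros(3)
phi = exact_shapley(yield_model, x, base)
print(phi)  # per-feature yield contributions (b/ha in this toy setup)
# Efficiency property: contributions sum to f(x) - f(baseline)
print(phi.sum() - (yield_model(x) - yield_model(base)))  # ~0
```

The "variable most decreasing yield at each point" map described above corresponds to taking, at every pixel, the index of the most negative Shapley value.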
•Informative vectors are calculated for each wavelength variable.
•Ordered Predictor Selection (OPS) ranks the importance of the variables.
•Exponentially decreasing function (EDF) shrinks the number of variables.
•Informative vectors + OPS + EDF are used to create subset models.
•Subset models can provide similar predictions to the full-spectra models at a lower computational cost.
Infrared spectroscopy has been widely adopted in agricultural research. Typical spectra contain thousands of wavelength variables, and this large number of variables often contributes collinearity and redundancy rather than relevant information. Variable selection of the predictors is therefore an important step in creating a robust calibration model from spectral data. This paper presents an algorithm for spectral variable selection based on a combination of informative vectors and an ordered predictor selection (OPS) approach with exponentially decreasing function (EDF) selection. Informative vectors are features derived from statistical principles that describe the relationship between the dependent variables and the predictors (spectra). The informative vectors analysed include the regression coefficient vector (b), variable influence on projection (V), residual vector (S), net analyte signal vector (Na), linear correlation vector (COR), biweight mid-correlation vector (BIC), mutual information based on the adjacency matrix (AMI), and the covariance procedures matrix (COV). These eight informative vectors can be joined in pairs to form 22 combination vectors. The approach was tested with near-infrared soil spectra for predicting pH, clay and sand content, cation exchange capacity (CEC), and total carbon content, using Cubist regression tree and partial least squares regression (PLSR) models for calibration. By utilizing a subset of the spectra (retaining the wavelengths that are significant based on the absolute values of the informative vectors), the regression models maintained their prediction capability. Overall, the PLSR model performed better than the Cubist model. The informative vector b (and its combinations) and S (and its combinations) were found to provide the most accurate predictions for this dataset.
Although the subset models do not outperform the full-spectrum models, the number of wavelength variables used in the model is reduced, on average, to 25% of the full set.
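The b-vector + OPS + EDF pipeline described above can be sketched as: compute a regression-coefficient informative vector, rank wavelengths by its absolute values (OPS), then refit on progressively smaller subsets whose sizes shrink exponentially (EDF). This sketch substitutes ridge least squares for the paper's PLSR b-vector and uses synthetic data, not the paper's spectra:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 300, 100                        # samples x wavelengths (synthetic)
X = rng.normal(size=(n, p))
true_b = np.zeros(p)
true_b[[10, 50, 90]] = [2.0, -1.5, 1.0]  # three informative wavelengths
y = X @ true_b + rng.normal(scale=0.1, size=n)

def informative_b(X, y, lam=1e-3):
    """|b| informative vector from ridge least squares
    (a stand-in for the PLSR regression coefficient vector)."""
    b = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    return np.abs(b)

def edf_subset_sizes(p, n_steps=6, floor=5):
    """Exponentially decreasing function: halve the retained-variable
    count at each step, never going below a floor."""
    sizes, k = [], p
    for _ in range(n_steps):
        k = max(floor, k // 2)
        sizes.append(k)
    return sizes

rank = np.argsort(-informative_b(X, y))   # OPS: order predictors by |b|
for k in edf_subset_sizes(p):
    keep = rank[:k]                       # subset model on top-k wavelengths
    b_sub, *_ = np.linalg.lstsq(X[:, keep], y, rcond=None)
    rmse = float(np.sqrt(np.mean((y - X[:, keep] @ b_sub) ** 2)))
    print(k, rmse)
```

On this synthetic example the three informative wavelengths dominate the ranking, so the calibration RMSE stays low even after aggressive shrinking, which mirrors the paper's finding that subset models retain most of the full-spectrum performance at a fraction of the variable count.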