The Gaussian process is an indispensable tool for spatial data analysts. The onset of the “big data” era, however, has led to the traditional Gaussian process being computationally infeasible for modern spatial data. As such, various alternatives to the full Gaussian process that are more amenable to handling big spatial data have been proposed. These modern methods often exploit low-rank structures and/or multi-core and multi-threaded computing environments to facilitate computation. This study provides, first, an introductory overview of several methods for analyzing large spatial data. Second, this study describes the results of a predictive competition among the described methods as implemented by different groups with strong expertise in the methodology. Specifically, each research group was provided with two training datasets (one simulated and one observed) along with a set of prediction locations. Each group then wrote its own implementation of its method to produce predictions at the given locations, and each implementation was subsequently run on a common computing environment. The methods were then compared in terms of various predictive diagnostics.
Vecchia's approximate likelihood for Gaussian process parameters depends on how the observations are ordered, which has been cited as a deficiency. This article takes the alternative standpoint that the ordering can be tuned to sharpen the approximations. Indeed, the first part of the article includes a systematic study of how ordering affects the accuracy of Vecchia's approximation. We demonstrate the surprising result that random orderings can give dramatically sharper approximations than default coordinate-based orderings. Additional ordering schemes are described and analyzed numerically, including orderings capable of improving on random orderings. The second contribution of this article is a new automatic method for grouping calculations of components of the approximation. The grouping methods simultaneously improve approximation accuracy and reduce computational burden. In common settings, reordering combined with grouping reduces Kullback-Leibler divergence from the target model by more than a factor of 60 compared to ungrouped approximations with default ordering. The claims are supported by theory and numerical results with comparisons to other approximations, including tapered covariances and stochastic partial differential equations. Computational details are provided, including the use of the approximations for prediction and conditional simulation. An application to space-time satellite data is presented.
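The dependence on ordering described in the abstract above can be illustrated with a minimal sketch of a Vecchia-type log-likelihood. This is not the article's implementation: the exponential covariance, its parameter values, the nearest-neighbor conditioning rule, and the names `exp_cov`, `vecchia_loglik`, and `exact_loglik` are all assumptions chosen for illustration, and no grouping of calculations is attempted.

```python
import numpy as np

def exp_cov(a, b, variance=1.0, range_=0.3):
    """Exponential covariance between two sets of locations (illustrative choice)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return variance * np.exp(-d / range_)

def vecchia_loglik(y, locs, order, m):
    """Sum of conditional Gaussian log densities; observation i conditions on
    at most m nearest observations that come earlier in the ordering."""
    y, locs = y[order], locs[order]
    total = 0.0
    for i in range(len(y)):
        mean, var = 0.0, exp_cov(locs[i:i + 1], locs[i:i + 1])[0, 0]
        if i > 0 and m > 0:
            dist = np.linalg.norm(locs[:i] - locs[i], axis=1)
            nb = np.argsort(dist)[:m]              # nearest previously ordered points
            k_nn = exp_cov(locs[nb], locs[nb])
            k_in = exp_cov(locs[i:i + 1], locs[nb])[0]
            w = np.linalg.solve(k_nn, k_in)        # kriging weights
            mean = w @ y[nb]
            var = var - w @ k_in
        total += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return total

def exact_loglik(y, locs):
    """Full Gaussian log-likelihood via a Cholesky factorization."""
    chol = np.linalg.cholesky(exp_cov(locs, locs))
    z = np.linalg.solve(chol, y)
    return -0.5 * z @ z - np.log(np.diag(chol)).sum() - 0.5 * len(y) * np.log(2 * np.pi)

# Simulated data on the unit square (hypothetical example data).
rng = np.random.default_rng(0)
n = 200
locs = rng.uniform(size=(n, 2))
y = np.linalg.cholesky(exp_cov(locs, locs)) @ rng.standard_normal(n)

coord_order = np.argsort(locs[:, 0])   # coordinate-based ordering
random_order = rng.permutation(n)      # random ordering
for name, order in [("coordinate", coord_order), ("random", random_order)]:
    print(name, vecchia_loglik(y, locs, order, m=10), "exact:", exact_loglik(y, locs))
```

With `m = n - 1` every observation conditions on all earlier ones, so any ordering recovers the exact log-likelihood; the orderings differ only when `m` is small.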
We derive a single-pass algorithm for computing the gradient and Fisher information of Vecchia’s Gaussian process loglikelihood approximation, which provides a computationally efficient means for applying the Fisher scoring algorithm for maximizing the loglikelihood. The advantages of the optimization techniques are demonstrated in numerical examples and in an application to Argo ocean temperature data. The new methods find the maximum likelihood estimates much faster and more reliably than an optimization method that uses only function evaluations, especially when the covariance function has many parameters. This allows practitioners to fit nonstationary models to large spatial and spatial–temporal datasets.
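As a rough illustration of Fisher scoring for covariance parameters, the sketch below applies it to the exact Gaussian log-likelihood of a small dataset rather than to Vecchia's approximation, so it does not reproduce the single-pass algorithm described above. The exponential covariance, the log-scale parameterization, the step-halving safeguard, and all function names are assumptions made for this example.

```python
import numpy as np

def cov_and_derivs(theta, D):
    """Exponential covariance with theta = (log variance, log range)."""
    var, rng_par = np.exp(theta)
    K = var * np.exp(-D / rng_par)
    return K, [K, K * D / rng_par]   # derivatives w.r.t. log variance, log range

def loglik_grad_info(theta, y, D):
    """Log-likelihood, its gradient, and the Fisher information matrix."""
    K, dK = cov_and_derivs(theta, D)
    chol = np.linalg.cholesky(K)
    alpha = np.linalg.solve(K, y)                        # K^{-1} y
    ll = -0.5 * y @ alpha - np.log(np.diag(chol)).sum() - 0.5 * len(y) * np.log(2 * np.pi)
    KinvdK = [np.linalg.solve(K, dKj) for dKj in dK]     # K^{-1} dK_j
    grad = np.array([0.5 * alpha @ dKj @ alpha - 0.5 * np.trace(A)
                     for dKj, A in zip(dK, KinvdK)])
    info = np.array([[0.5 * np.trace(A @ B) for B in KinvdK] for A in KinvdK])
    return ll, grad, info

def fisher_scoring(theta0, y, D, n_iter=25, tol=1e-6):
    """Fisher scoring with step halving so the log-likelihood never decreases."""
    theta = np.asarray(theta0, dtype=float)
    ll, grad, info = loglik_grad_info(theta, y, D)
    for _ in range(n_iter):
        step = np.linalg.solve(info, grad)
        t, accepted = 1.0, False
        while t > 1e-4:
            try:
                cand = loglik_grad_info(theta + t * step, y, D)
            except np.linalg.LinAlgError:
                cand = None                              # covariance not factorizable
            if cand is not None and cand[0] >= ll:
                accepted = True
                break
            t /= 2.0
        if not accepted:
            break
        theta = theta + t * step
        ll, grad, info = cand
        if np.max(np.abs(t * step)) < tol:
            break
    return theta, ll

# Simulated data (hypothetical): variance 1, range 0.2 on the unit square.
rng = np.random.default_rng(1)
locs = rng.uniform(size=(60, 2))
D = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
K_true, _ = cov_and_derivs(np.log([1.0, 0.2]), D)
y = np.linalg.cholesky(K_true) @ rng.standard_normal(len(locs))

theta_start = np.log([0.5, 0.5])
theta_hat, ll_hat = fisher_scoring(theta_start, y, D)
print("estimated (variance, range):", np.exp(theta_hat), "loglik:", ll_hat)
```

Each iteration here costs O(n^3) because it works with the dense covariance matrix, which is the cost that a Vecchia-based gradient and Fisher information are meant to avoid.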
We describe our implementation of the multivariate Matérn model for multivariate spatial datasets, using Vecchia's approximation and a Fisher scoring optimization algorithm. We consider various parameterizations for the multivariate Matérn that have been proposed in the literature for ensuring model validity, as well as an unconstrained model. A strength of our study is that the code is tested on many real-world multivariate spatial datasets. We use it to study the effect of ordering and conditioning in Vecchia's approximation and the restrictions imposed by the various parameterizations. We also consider a model in which co-located nuggets are correlated across components and find that forcing this cross-component nugget correlation to be zero can have a serious impact on the other model parameters, so we suggest allowing cross-component correlation in co-located nugget terms.
• Short-term reactivity of arsenate As(V) in different soil microsites was assessed.
• Arsenate binding did not solely depend on Fe and Al (hydr)oxides.
• Solids containing Mn, Zn, Ti, and Cu can potentially enhance As(V) retention.
• Variations in As(V) retention across soils could not be ascribed to pedogenic effects.
• Microscale heterogeneity affects As retention regardless of the pedogenic environment.
Determining reaction mechanisms that control the mobility of nutrients and toxic elements in soil matrices is confounded by complex assemblages of minerals, non-crystalline solids, organic matter, and biota. Our objective was to infer the chemical elements and solids that contribute to As binding in matrices of soil samples from different pedogenic environments at the micrometer spatial scale. Arsenic was reacted with and imaged in thin weathering coatings on eight quartz sand grains separated from soils of different drainage classes to vary contents of Fe and Al (hydr)oxides, organic carbon (OC), and other elements. The grains were analyzed using X-ray fluorescence microprobe (µ-XRF) imaging and microscale X-ray absorption near edge structure (µ-XANES) spectroscopy before and after treatment with 0.1 mM As(V) solution. Partial correlation analyses and regression models developed from multi-element µ-XRF signals collected across 100 × 100 µm² areas of sand-grain coatings inferred augmenting effects of Fe, Zn, Ti, Mn, or Cu on As retention. Significant partial correlations (r′ > 0.11) between Fe and Al from time-of-flight secondary ion mass spectrometry (TOF-SIMS) analysis of most samples suggested that Fe and Al (hydr)oxides were partially co-localized at the microscale. Linear combination fitting (LCF) results for As K-edge µ-XANES spectra collected across grain coatings typically included >80% of As(V) adsorbed on goethite, along with varying proportions of standards of As(V) adsorbed on boehmite, As(V) or As(III) bound to Fe(III)-treated peat, and dimethylarsinic acid. Complementary fits for Fe K-edge µ-XANES spectra included ≥50% of the Fe(III)-treated peat standard for all samples, along with goethite.
Collectively, our results indicated that Fe and possibly Al (hydr)oxides dominated As immobilization, with variable contributions from Zn, Ti, Cu, or Mn, both across the coating of a single sand grain and between grains from soils developed under different pedogenic environments. Overall, these results highlight the extreme heterogeneity of soils at the microscale and have implications for soil management aimed at mitigating the adverse environmental impacts of As.
Summary
We introduce methods for estimating the spectral density of a random field on a $d$-dimensional lattice from incomplete gridded data. Data are iteratively imputed onto an expanded lattice according to a model with a periodic covariance function. The imputations are convenient computationally, in that circulant embedding and preconditioned conjugate gradient methods can produce imputations in $O(n\log n)$ time and $O(n)$ memory. However, these so-called periodic imputations are motivated mainly by their ability to produce accurate spectral density estimates. In addition, we introduce a parametric filtering method that is designed to reduce periodogram smoothing bias. The paper contains theoretical results on properties of the imputed-data periodogram and numerical and simulation studies comparing the performance of the proposed methods to existing approaches in a number of scenarios. We present an application to a gridded satellite surface temperature dataset with missing values.
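The $O(n\log n)$ cost mentioned above comes from the fact that a periodic covariance on a lattice acts as a circular convolution, which the FFT diagonalizes. The following minimal sketch shows only that building block and a raw periodogram on a complete grid; it does not implement the imputation scheme, the conjugate gradient solver, or the filtering method. The wrapped-distance exponential covariance and the function names are assumptions for illustration, and its positive definiteness on the torus is not checked here.

```python
import numpy as np

def periodic_cov(n1, n2, range_=3.0):
    """Covariance at each lag of an n1 x n2 periodic lattice, using
    wrapped (torus) distances. Illustrative choice of model."""
    h1 = np.minimum(np.arange(n1), n1 - np.arange(n1))
    h2 = np.minimum(np.arange(n2), n2 - np.arange(n2))
    dist = np.sqrt(h1[:, None] ** 2 + h2[None, :] ** 2)
    return np.exp(-dist / range_)

def circulant_matvec(c, x):
    """Multiply the (block) circulant covariance matrix defined by the lag
    array c with the gridded field x using FFTs, without forming the matrix."""
    return np.fft.ifft2(np.fft.fft2(c) * np.fft.fft2(x)).real

def periodogram(y):
    """Raw periodogram of a complete gridded field."""
    return np.abs(np.fft.fft2(y)) ** 2 / y.size

rng = np.random.default_rng(2)
c = periodic_cov(8, 8)
x = rng.standard_normal((8, 8))
Kx = circulant_matvec(c, x)      # covariance times field, O(n log n)
I_x = periodogram(x)
print(Kx.shape, I_x.shape)
```

A conjugate gradient solver needs only repeated products of this kind, which is why the memory requirement can stay at $O(n)$.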
Summary
We conduct a study of the aliased spectral densities of Matérn covariance functions on a regular grid of points, elucidating the properties of a popular approximation based on stochastic partial differential equations. While other researchers have shown that this approximation can work well for the covariance function, we find that it assigns too much power at high frequencies and does not provide increasingly accurate approximations to the inverse as the grid spacing goes to zero, except in the one-dimensional exponential covariance case.
We propose computationally efficient methods for estimating stationary multivariate spatial and spatial–temporal spectra from incomplete gridded data. The methods are iterative and rely on successive imputation of data and updating of model estimates. Imputations are done according to a periodic model on an expanded domain. The periodicity of the imputations is a key feature that reduces edge effects in the periodogram and is facilitated by efficient circulant embedding techniques. In addition, we describe efficient methods for decomposing the estimated cross spectral density function into a linear model of coregionalization plus a residual process. The methods are applied to two storm datasets, one of which is from Hurricane Florence, which struck the southeastern United States in September 2018. The application demonstrates how fitted models from different datasets can be compared, and how the methods are computationally feasible on datasets with more than 200,000 total observations.