This paper is a methodological guide to using machine learning in the spatial context. It provides an overview of the existing spatial toolbox proposed in the literature: unsupervised learning, which deals with clustering of spatial data, and supervised learning, which displaces classical spatial econometrics. It shows the potential of using this developing methodology, as well as its pitfalls. It catalogues and comments on the usage of spatial clustering methods (for locations and values, both separately and jointly) for mapping, bootstrapping, cross-validation, GWR modelling and density indicators. It provides details of spatial machine learning models, which are combined with spatial data integration, modelling, model fine-tuning and predictions to deal with spatial autocorrelation and big data. The paper delineates “already available” and “forthcoming” methods and gives inspiration for transplanting modern quantitative methods from other thematic areas to research in regional science.
This textbook is a comprehensive introduction to applied spatial data analysis using R. Each chapter walks the reader through a different method, explaining how to interpret the results and what conclusions can be drawn. The author team showcases key topics, including unsupervised learning, causal inference, spatial weight matrices, spatial econometrics, heterogeneity and bootstrapping. It is accompanied by a suite of data and R code on GitHub to help readers practise techniques via replication and exercises.
This text will be a valuable resource for advanced students of econometrics, spatial planning and regional science. It will also be suitable for researchers and data scientists working with spatial data.
Benford's law states that the first digits of numbers in any natural dataset appear with defined frequencies. In a pioneering application, we use the Benford distribution to analyse the geo-location of cities and their population in the majority of countries. We use distances in three dimensions: 1D between the population values; 2D between the cities, based on the geo-coordinates of their location; and 3D between cities' location and population, which jointly reflects the separation and mass of urban locations. We report four main findings. Firstly, we empirically show that mutual 3D socio-geo distances between cities and populations in most countries conform with Benford's law, and thus that urban geo-locations have a natural spatial distribution. Secondly, we show empirically that the population of cities within countries follows the composition of gamma (1,1) distributions and that the 1D distance between populations also conforms to Benford's law. Thirdly, we pioneer in replicating the natural spatial distribution: we discover in simulation that a mixture of three pure point-patterns (clustered, ordered and random, in proportions 15:3:2) makes the 2D spatial distribution Benford-like. Complex 3D Benford-like patterns can be built upon the 2D (spatial) Benford distribution and the gamma (1,1) distribution of cities' sizes. This finding enables the generation of 2D and 3D Benford distributions, which may replicate urban settlement well. Fourthly, we use historical settlement analysis to justify our statistical findings and to claim that the geo-location of cities and inhabitants worldwide followed an evolutionary process, resulting in a natural Benford-like spatial distribution. These results are novel: the study develops a new spatial distribution to simulate natural locations. It shows that evolutionary settlement patterns resulted in the natural location of cities, and that historical distortions in urbanisation, even if they persist until now, are being evolutionarily corrected.
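As an illustration of the first-digit test described above, the following is a minimal sketch, not the authors' procedure. It assumes the standard Benford frequencies log10(1 + 1/d) and plain Euclidean distances; the function names and the mean-absolute-deviation summary are hypothetical choices made for this example.

```python
import math
from itertools import combinations


def benford_expected():
    """Expected first-digit frequencies under Benford's law: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Leading non-zero digit of a non-zero number."""
    # Scientific notation puts the leading digit first, e.g. '3.14e-03'.
    return int(f"{abs(x):.15e}"[0])


def first_digit_frequencies(values):
    """Observed share of each leading digit among the non-zero values."""
    digits = [first_digit(v) for v in values if v != 0]
    return {d: digits.count(d) / len(digits) for d in range(1, 10)}


def pairwise_distances(points):
    """Euclidean distances between all pairs of points (1D, 2D or 3D tuples)."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def mean_abs_deviation(observed, expected):
    """Average absolute gap between observed and expected digit shares
    (an assumed summary measure, chosen here for simplicity)."""
    return sum(abs(observed[d] - expected[d]) for d in expected) / len(expected)
```

Applied to a list of city coordinates, `first_digit_frequencies(pairwise_distances(points))` can be compared with `benford_expected()`; how location and population should be scaled before computing 3D distances is left open here.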
Accessibility of transport infrastructure, commercial amenities, recreational facilities, and green spaces is widely recognised as crucial to the well-being of urban residents. However, these features are often unevenly distributed across the geographical boundaries of a city, leading to disparities in the local quality of life. This study focuses on the city of Warsaw, Poland, and uses the aforementioned characteristics and the framework of the '15-min city' concept to construct a grid-level urban Quality of Life Index (QOLI) that facilitates comparisons between the city's districts and local neighbourhoods. The results of our study reveal a "high-inside, low-outside" pattern of quality of life, characterised by higher standards of living in the central districts and lower standards at the city's periphery.
This study presents a comprehensive and modern approach to predicting customer churn, using the example of an e-commerce retail store operating in Brazil. Our approach consists of three stages in which we combine and use three different datasets: numerical data on orders, textual after-purchase reviews and socio-geo-demographic data from the census. At the pre-processing stage, we find topics from text reviews using Latent Dirichlet Allocation, Dirichlet Multinomial Mixture and Gibbs sampling. In the spatial analysis, we apply DBSCAN to get rural/urban locations and analyse neighbourhoods of customers located with zip codes. At the modelling stage, we apply extreme gradient boosting and logistic regression. The quality of the models is verified with area-under-curve and lift metrics. Explainable artificial intelligence, represented by permutation-based variable importance and partial dependence profiles, helps to discover the determinants of churn. We show that customers’ propensity to churn depends on: (i) payment value for the first order, number of items bought and shipping cost; (ii) categories of the products bought; (iii) demographic environment of the customer; and (iv) customer location. At the same time, customers’ propensity to churn is not influenced by: (i) population density in the customer’s area and division into rural and urban areas; (ii) quantitative review of the first purchase; and (iii) qualitative review summarised as a topic.
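The two model-quality metrics named above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's code: labels are taken to be 1 for churn and 0 otherwise, scores are predicted churn probabilities, and the function names and the default top-decile `fraction` are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive case is scored above a random negative one (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def lift(labels, scores, fraction=0.1):
    """Churn rate among the top `fraction` of scored customers,
    divided by the overall churn rate."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    k = max(1, int(round(len(ranked) * fraction)))
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate
```

The pairwise AUC above is quadratic in the sample size, which is fine for illustration; a rank-based implementation would be preferable on large datasets.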
There have been numerous advances in financial time series forecasting in recent years. Most of them use deep learning techniques. We identified 15 outstanding papers that have been published in the last seven years and have tried to prove the superiority of their approach to forecasting one-dimensional financial time series using deep learning techniques. In order to objectively compare these approaches, we analysed the proposed statistical models and then reviewed and reproduced them. The models were trained to predict, one day in advance, the value of 29 indices and the stock and commodity prices over five different time periods (from 2007 to 2022), with 4 in-sample years and 1 out-of-sample year. Our findings indicated that, first of all, most of these approaches do not beat the naive approach, and only some barely beat it. Most of the researchers did not provide enough data to fully replicate the approach, let alone the code. We provide a set of practical recommendations on when to use which models, based on the data sample that we provide.
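The comparison against the naive approach can be illustrated with a short sketch. It assumes that "naive" means forecasting tomorrow's value as today's value and that RMSE is the error measure; both are assumptions for this example, as the abstract does not specify them, and the function names are hypothetical.

```python
import math


def naive_forecast(series):
    """One-day-ahead naive forecast: each prediction equals the previous value.
    The result is aligned with series[1:]."""
    return list(series[:-1])


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def beats_naive(series, model_predictions):
    """True if the model's predictions for series[1:] have a strictly lower
    RMSE than the naive forecast over the same days."""
    actual = list(series[1:])
    if len(model_predictions) != len(actual):
        raise ValueError("need one prediction for each value in series[1:]")
    return rmse(actual, model_predictions) < rmse(actual, naive_forecast(series))
```

In an out-of-sample evaluation, `series` would hold only the out-of-sample year, with the first element being the last observed in-sample value.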
Spatial econometric models estimated on big geo‐located point data have at least two problems: limited computational capabilities and inefficient forecasting for new out‐of‐sample geo‐points. This is because the spatial weights matrix W is defined for in‐sample observations only and because of the computational complexity. Machine learning models suffer from the same problems when using kriging for predictions; thus the problem remains unsolved. The paper presents a novel methodology for estimating spatial models on big data and predicting in new locations. The approach uses bootstrap and tessellation to calibrate both model and space. The best bootstrapped model is selected with the PAM (Partitioning Around Medoids) algorithm by classifying the regression coefficients jointly in a nonindependent manner. Voronoi polygons for the geo‐points used in the best model allow for a representative space division. New out‐of‐sample points are assigned to tessellation tiles and linked to the spatial weights matrix as a replacement for an original point, which makes it feasible to use calibrated spatial models as a forecasting tool for new locations. There is no trade‐off between forecast quality and computational efficiency in this approach. An empirical example illustrates a model for business locations and firms' profitability.
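The step of assigning a new point to a tessellation tile can be sketched as a nearest-seed search, since a point lies in the Voronoi tile of whichever in-sample point is closest to it. The sketch below is illustrative only: it assumes planar coordinates and Euclidean distance, W stored as a list of rows, and hypothetical function names. Borrowing the owner's row of W is one reading of "replacement for an original point", not a confirmed detail of the method.

```python
import math


def assign_to_tile(new_point, seeds):
    """Index of the in-sample seed point whose Voronoi tile contains new_point,
    found as the nearest seed by Euclidean distance."""
    return min(range(len(seeds)), key=lambda i: math.dist(new_point, seeds[i]))


def weights_for_new_point(new_point, seeds, W):
    """Spatial-weights row for an out-of-sample point, borrowed from the
    in-sample point that owns its tile (assumed replacement rule)."""
    return list(W[assign_to_tile(new_point, seeds)])
```

The linear scan is enough for a sketch; with many seeds, a spatial index such as a k-d tree would be the usual choice.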
The paper combines theoretical models of housing and business locations and shows that they have the same determinants. It evidences that classical, behavioural, new economic geography, evolutionary and co-evolutionary frameworks apply simultaneously, and one should consider them jointly when explaining urban structure. We use quantitative tools in a theory-guided factors induction approach to show the complexity of location models. The paper discusses and measures spatial phenomena as distance-decaying gradients, spatial discontinuities, densities, spillovers, spatial interactions, agglomerations, and as multimodal processes. We illustrate the theoretical discussion with an empirical case of interacting point-patterns for business, housing, and population. The analysis reveals strong links between housing valuation and business location and profitability, accompanied by the related spatial phenomena. It also shows that assumptions concerning unimodal spatial urban structure, the existence of rational maximisers, distance-decaying externalities, and a single pattern of behaviour do not hold. Instead, the reality entails consideration of multimodality, a mixture of maximisers and satisfiers, incomplete information, the appearance of spatial interactions and feedback loops, as well as persistence of behaviour, with slow and costly adjustments of location.
An understanding of the microstructure of geomaterials such as rocks is fundamental to the evaluation of their functional properties, as well as to deciphering their geological history. We present a semi-automated statistical protocol for a complex 3D characterization of the microstructure of granular materials, including the clustering of grains and a description of their chemical composition, size, shape, and spatial properties with 44 unique parameters. The approach consists of an X-ray microtomographic image processing procedure, followed by measurements using image analysis and statistical multivariate analysis of its results utilizing freeware and widely available software. The proposed statistical approach was tested on a sandstone sample with hidden and localized deformational microstructures. The grains were clustered into distinctive groups covering different compositional and geometrical features of the sample’s granular framework. The grains are pervasively and evenly distributed within the analysed sample. The spatial arrangement of grains in particular clusters is well organized and shows a directional trend referring to both microstructures. The methodological approach can be applied to any other rock type and enables the tracking of microstructural trends in grain arrangement.
This paper proposes a methodology for measuring the spatial effects of roads and the seats of local authorities on the diffusion of business activity, which usually follows distance decay patterns from core to periphery. Regional development policies, pursued by regional authorities, directed at local units and designed to support local economies, are implemented by means of a centrifugal diffusion process. This invisible flow of policy is modeled using a one-way spatial interaction model represented by a multinomial distance decay function for the integrated spatial dataset. The research results indicate that NUTS5 (Nomenclature of Territorial Units for Statistics) units (gminas) perform better in terms of saturation with business activity when NUTS4 seats of authority are established there than when they are established near international roads. The natural diffusion process from core cities to the periphery covers approximately 25-30 km, and the presence of international roads extends this range by 20 km. The results confirm the hypothesis of an endogenous growth pattern.