High-dimensional data, large-scale data, imaging and manifold data are all fostering new frontiers of statistics. These types of data are commonly considered in Functional Data Analysis, where they are viewed as infinite-dimensional random vectors in a functional space. The rapid development of new technologies has generated a flow of complex data that has led scientists to develop new modeling strategies. In this paper, we deal with the problem of clustering a set of complex functional data into homogeneous groups. Working in a mixture model-based framework, we develop a flexible clustering technique that achieves dimensionality reduction through an L₁ penalization. The proposed procedure results in an integrated modelling approach where shrinkage techniques are applied to enable sparse solutions in both the means and the covariance matrices of the mixture components, while preserving the underlying clustering structure. This leads to an entirely data-driven methodology suitable for simultaneous dimensionality reduction and clustering. The proposed methodology is evaluated through a Monte Carlo simulation study and an empirical analysis of real-world datasets showing different degrees of complexity.
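As a generic illustration of why an L₁ penalty yields sparse solutions, the sketch below applies the soft-thresholding operator to a few coefficients. It is not the estimation procedure of the paper: the function names, the coefficient values, and the penalty level are all made up for the example.

```python
def soft_threshold(value: float, penalty: float) -> float:
    """Shrink `value` toward zero by `penalty`; small values become exactly 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def shrink_coefficients(coefficients, penalty):
    """Apply soft-thresholding to every coefficient in the list."""
    return [soft_threshold(c, penalty) for c in coefficients]


# Hypothetical coefficients (e.g. basis coefficients of a component mean).
coefficients = [2.5, -0.25, 0.125, -1.75]
sparse_coefficients = shrink_coefficients(coefficients, penalty=0.5)
print(sparse_coefficients)
```

Coefficients whose absolute value does not exceed the penalty are set exactly to zero, which is the mechanism that removes uninformative dimensions; larger coefficients are kept but shrunk.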
The air in the Lombardy region, Italy, is one of the most polluted in Europe because of limited air circulation and high emission levels. There is broad scientific consensus that the agricultural sector has a significant impact on air quality. To support studies quantifying the role of the agricultural and livestock sectors in Lombardy's air quality, this paper presents a harmonised dataset containing daily values of air quality, weather, emissions, livestock, and land and soil use in the years 2016-2021 for the Lombardy region. The daily scale is obtained by averaging hourly data and interpolating other variables. Specifically, the pollutant data come from the European Environmental Agency and the Lombardy Regional Environment Protection Agency, weather and emissions data from the European Copernicus programme, livestock data from the Italian zootechnical registry, and land and soil use data from the CORINE Land Cover project. The resulting dataset is designed to be used as is by those using air quality data for research.
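The harmonisation to a daily scale described above can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not say which interpolation scheme is used, so linear interpolation is assumed here, and the function names and all data values are invented for the example.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def daily_means(hourly_records):
    """Average (timestamp, value) pairs within each calendar day."""
    values_by_day = defaultdict(list)
    for timestamp, value in hourly_records:
        values_by_day[timestamp.date()].append(value)
    return {day: sum(v) / len(v) for day, v in sorted(values_by_day.items())}


def interpolate_to_daily(sparse_records):
    """Linearly interpolate a {date: value} mapping to every day in its range."""
    days = sorted(sparse_records)
    if not days:
        return {}
    daily = {}
    for start, end in zip(days, days[1:]):
        gap = (end - start).days
        for offset in range(gap):
            weight = offset / gap
            daily[start + timedelta(days=offset)] = (
                (1 - weight) * sparse_records[start] + weight * sparse_records[end]
            )
    daily[days[-1]] = sparse_records[days[-1]]
    return daily


# Made-up hourly pollutant readings for one day.
hourly_pollutant = [(datetime(2016, 1, 1, hour), float(hour)) for hour in range(24)]
daily_pollutant = daily_means(hourly_pollutant)

# Made-up livestock counts observed only on two dates.
livestock_counts = {date(2016, 1, 1): 100.0, date(2016, 1, 5): 120.0}
daily_livestock = interpolate_to_daily(livestock_counts)
```

Averaging suits variables measured more often than daily, while interpolation fills in variables recorded less often, so that all series share the same daily index.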
Vehicular traffic plays an important role in atmospheric pollution and can be used as one of the key predictors in air-quality forecasting models. Models that can account for the role of traffic are especially valuable in urban areas, where high pollutant concentrations are often observed during particular times of day (rush hour) and year (winter). In this paper, we develop a generalized additive models approach to analyze the behavior of concentrations of nitrogen dioxide (NO2) and particulate matter (PM10) collected at the environmental monitoring stations distributed throughout the city of Turin, Italy, from December 2003 to April 2005. We describe nonlinear relationships between predictors and pollutants that are adjusted for unobserved time-varying confounders. We examine several functional forms for the traffic variable and find that a simple form can often provide adequate modeling power. Our analysis shows that there is a saturation effect of traffic on NO2, while such saturation is less evident in models linking traffic to PM10 behavior, after adjusting for meteorological covariates. Moreover, we consider the proposed models separately by season and highlight similarities and differences in the predictors' partial effects. Finally, we show how forecasting can help in evaluating traffic regulation policies.
Functional data characterized by a spatial dependence structure occur in many environmental sciences when curves are observed, for example, along time or along depth. Recently, some methods allowing for the prediction of a curve at an unmonitored site have been developed. However, the existing methods do not allow a model to include exogenous variables that, for example, bring meteorological information into the modeling of air pollutant concentrations. In order to introduce exogenous variables, potentially observed as curves as well, we propose to extend the so-called kriging with external drift (or regression kriging) to the case of functional data by means of a three-step procedure involving functional modeling for the trend and spatial interpolation of functional residuals. A cross-validation analysis is used to choose smoothing parameters and a preferable kriging predictor for the functional residuals. Our case study considers daily PM₁₀ concentrations measured from October 2005 to March 2006 by the monitoring network of the Piemonte region (Italy), with the trend defined by meteorological time-varying covariates and orographical constant-in-time variables. The performance of the proposed methodology is evaluated by predicting PM₁₀ concentration curves on 10 validation sites, as well as with realistic simulated datasets on a larger number of spatial sites. In this application the proposed methodology represents an alternative to spatio-temporal modeling, but it can be applied more generally to spatially dependent functional data whose domain is not a time interval.
Spatial mapping of biodiversity is crucial to investigate spatial variations in natural communities. Several indices have been proposed in the literature to represent biodiversity as a single statistic. However, these indices only provide information on individual dimensions of biodiversity, thus failing to grasp its complexity comprehensively. Consequently, relying solely on these single indices can lead to misleading conclusions about the actual state of biodiversity. In this work, we focus on biodiversity profiles, which provide a more flexible framework to express biodiversity through nonnegative and convex curves, which can be analyzed by means of functional data analysis. By treating the whole curves as single entities, we propose to achieve a functional zoning of the region of interest by means of a penalized model-based clustering procedure. This provides a spatial clustering of the biodiversity profiles, which is useful for policy-makers both for conserving and managing natural resources and revealing patterns of interest. Our approach is evaluated using a simulation study and discussed through the analysis of the Harvard Forest Data, which provides information on the spatial distribution of woody stems within a plot of the Harvard Forest.
The increasing interest in spatially correlated functional data has led to the development of appropriate geostatistical techniques that allow the prediction of a curve at an unmonitored location using a functional kriging with external drift model that takes into account the effect of exogenous variables (either scalar or functional). Nevertheless, uncertainty evaluation for functional spatial prediction remains an open issue. We propose a semi-parametric bootstrap for spatially correlated functional data that makes it possible to evaluate the uncertainty of a predicted curve, ensuring that the spatial dependence structure is maintained in the bootstrap samples. The performance of the proposed methodology is assessed via a simulation study. Moreover, the approach is illustrated on a well-known data set of Canadian temperature and on a real data set of PM10 concentration in the Piemonte region, Italy. Based on the results, it can be concluded that the method is computationally feasible and suitable for quantifying the uncertainty around a predicted curve. Supplementary material including R code is available online.
This study presents a comparative analysis of three predictive models with an increasing degree of flexibility: hidden dynamic geostatistical models (HDGM), generalised additive mixed models (GAMM), and the random forest spatiotemporal kriging models (RFSTK). These models are evaluated for their effectiveness in predicting PM2.5 concentrations in Lombardy (North Italy) from 2016 to 2020. Despite differing methodologies, all models demonstrate proficient capture of spatiotemporal patterns within air pollution data with similar out-of-sample performance. Furthermore, the study delves into station-specific analyses, revealing variable model performance contingent on localised conditions. Model interpretation, facilitated by parametric coefficient analysis and partial dependence plots, unveils consistent associations between predictor variables and PM2.5 concentrations. Despite nuanced variations in modelling spatiotemporal correlations, all models effectively accounted for the underlying dependence. In summary, this study underscores the efficacy of conventional techniques in modelling correlated spatiotemporal data, concurrently highlighting the complementary potential of Machine Learning and classical statistical approaches.
Functional zoning for air quality
Ignaccolo, Rosaria; Ghigo, Stefania; Bande, Stefano
Environmental and Ecological Statistics, 03/2013, Volume 20, Issue 1
Journal Article, peer reviewed, open access
Local environmental agencies have to enforce European directives that impose a land classification, according to air quality status, to distinguish zones needing further actions from those needing only maintenance. This paper presents a land classification into zones characterized by different criticality levels of atmospheric pollution, considering pollutant time series as functional data: we call this proposal "Functional Zoning". Our proposal is also designed to meet two specific requirements: upscaling pollutant concentration data to the municipality scale, since municipalities are the reference territorial administrative units for undertaking actions; and aggregating different pollutants in order to provide a multi-pollutant zoning outcome reflecting the air quality status. Specifically, we present three different alternatives to upscale data from a regular grid to the municipality scale. Then, to aggregate by pollutant, we evaluate two strategies for summarizing time series: the assessment of an air quality index and the use of Multivariate Functional Principal Component Analysis (MFPCA). The partition of municipalities is obtained by clustering air quality time series and MFPCA scores. In particular, the proposed functional zoning is carried out for Piemonte (Northern Italy), considering the hourly concentration fields of the main pollutants. We obtain six classifications of the same land and we propose a comparison study of the results of the different strategies, by mapping and analyzing the differences between cluster labels. Taking into account the findings of the comparison study, we finally suggest an analysis strategy to environmental agencies and policy makers to obtain an easily interpretable outcome at a very reasonable computational cost.
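To make the idea of upscaling from a regular grid to the municipality scale concrete, the sketch below computes an area-weighted mean of the grid cells overlapping each municipality. The abstract does not describe its three upscaling alternatives, so this scheme is an assumption and may not coincide with any of them; the function name, cell identifiers, overlap areas, and concentrations are hypothetical.

```python
def upscale_area_weighted(cell_values, overlap_areas):
    """Return {municipality: area-weighted mean of overlapping grid cells}.

    cell_values:   {cell_id: concentration on the regular grid}
    overlap_areas: {municipality: {cell_id: area of the cell inside it}}
    """
    upscaled = {}
    for municipality, areas in overlap_areas.items():
        total_area = sum(areas.values())
        if total_area == 0:
            continue  # no overlapping cells: leave the municipality undefined
        weighted_sum = sum(cell_values[cell] * area for cell, area in areas.items())
        upscaled[municipality] = weighted_sum / total_area
    return upscaled


# Hypothetical gridded concentrations and overlap areas (arbitrary units).
cell_values = {"A": 10.0, "B": 20.0, "C": 40.0}
overlap_areas = {
    "M1": {"A": 3.0, "B": 1.0},
    "M2": {"B": 2.0, "C": 2.0},
}
municipal_values = upscale_area_weighted(cell_values, overlap_areas)
print(municipal_values)
```

Applying the same function at every time step would turn gridded hourly fields into one time series per municipality, which is the form of input a clustering step needs.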