This study presents a comparative analysis of three predictive models with an increasing degree of flexibility: hidden dynamic geostatistical models (HDGM), generalised additive mixed models (GAMM), and the random forest spatiotemporal kriging models (RFSTK). These models are evaluated for their effectiveness in predicting PM\(_{2.5}\) concentrations in Lombardy (North Italy) from 2016 to 2020. Despite differing methodologies, all models demonstrate proficient capture of spatiotemporal patterns within air pollution data with similar out-of-sample performance. Furthermore, the study delves into station-specific analyses, revealing variable model performance contingent on localised conditions. Model interpretation, facilitated by parametric coefficient analysis and partial dependence plots, unveils consistent associations between predictor variables and PM\(_{2.5}\) concentrations. Despite nuanced variations in modelling spatiotemporal correlations, all models effectively accounted for the underlying dependence. In summary, this study underscores the efficacy of conventional techniques in modelling correlated spatiotemporal data, concurrently highlighting the complementary potential of Machine Learning and classical statistical approaches.
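As an illustration of the station-specific comparison mentioned above, the following minimal sketch computes an error score per monitoring station from held-out observations and predictions. The abstract does not state which metric or validation scheme was used; the root mean squared error (RMSE), the function name `rmse_by_station`, the station labels, and all numbers are assumptions made purely for illustration.

```python
from collections import defaultdict
from math import sqrt


def rmse_by_station(records):
    """Return {station: RMSE} from (station, observed, predicted) triples."""
    squared_errors = defaultdict(list)
    for station, observed, predicted in records:
        squared_errors[station].append((observed - predicted) ** 2)
    return {s: sqrt(sum(e) / len(e)) for s, e in squared_errors.items()}


# Made-up held-out observations and predictions for two hypothetical stations.
records = [
    ("urban_A", 40.0, 37.0),
    ("urban_A", 30.0, 34.0),
    ("rural_B", 20.0, 21.0),
    ("rural_B", 18.0, 17.0),
]
scores = rmse_by_station(records)
for station, score in sorted(scores.items()):
    print(f"{station}: RMSE = {score:.2f}")
```

Running the same function on the predictions of each model would give one score per station and model, which is one simple way to see where performance depends on local conditions.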
The air in the Lombardy region, Italy, is among the most polluted in Europe because of limited air circulation and high emission levels. There is broad scientific consensus that the agricultural sector has a significant impact on air quality. To support studies quantifying the role of the agricultural and livestock sectors in Lombardy's air quality, this paper presents a harmonised dataset containing daily values of air quality, weather, emissions, livestock, and land and soil use for the years 2016-2021. The daily scale is obtained by averaging hourly data and interpolating other variables. The pollutant data come from the European Environment Agency and the Lombardy Regional Environment Protection Agency, weather and emissions data from the European Copernicus programme, livestock data from the Italian zootechnical registry, and land and soil use data from the CORINE Land Cover project. The resulting dataset is designed to be used as is by those using air quality data for research.
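The harmonisation to a daily scale described above can be sketched as follows. This is only an illustration using the Python standard library: the abstract does not specify the interpolation method, so linear interpolation is an assumption, and the function names `daily_mean` and `interpolate_daily` as well as all values are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def daily_mean(hourly):
    """Average (timestamp, value) pairs by calendar day, skipping missing values."""
    buckets = defaultdict(list)
    for ts, value in hourly:
        if value is not None:
            buckets[ts.date()].append(value)
    return {day: sum(vals) / len(vals) for day, vals in sorted(buckets.items())}


def interpolate_daily(known):
    """Linearly interpolate a sparse {date: value} series to every day in its range."""
    days = sorted(known)
    out = {}
    for left, right in zip(days, days[1:]):
        span = (right - left).days
        for k in range(span):
            weight = k / span
            out[left + timedelta(days=k)] = (1 - weight) * known[left] + weight * known[right]
    out[days[-1]] = known[days[-1]]
    return out


# Hypothetical hourly readings for one station (values are made up).
hourly = [
    (datetime(2016, 1, 1, 0), 30.0),
    (datetime(2016, 1, 1, 1), None),  # a missing hour is ignored in the average
    (datetime(2016, 1, 1, 2), 34.0),
    (datetime(2016, 1, 2, 0), 20.0),
]
daily = daily_mean(hourly)

# Hypothetical livestock counts reported on two dates only (assumed linear in between).
livestock = interpolate_daily({date(2016, 1, 1): 100.0, date(2016, 1, 5): 140.0})
```

Here `daily` holds one average per calendar day, and `livestock` holds one value for each day between the two reporting dates.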
The air in the Lombardy Plain, Italy, is one of the most polluted in Europe due to limited atmospheric circulation and high emission levels. There is broad scientific consensus that ammonia (NH\(_3\)) emissions have a primary impact on air quality and, in Lombardy, the agricultural sector and livestock activities are widely recognised as being responsible for approximately 97% of regional ammonia emissions due to the high density of livestock. In this paper, we quantify the relationship between ammonia emissions and PM\(_{2.5}\) concentrations in the Lombardy Plain and evaluate PM\(_{2.5}\) changes due to the reduction of ammonia emissions through a "what-if" scenario analysis. The information in the data is exploited using a spatiotemporal statistical model capable of handling spatial and temporal correlation, as well as missing data. To do this, we propose a new heteroskedastic extension of the well-established Hidden Dynamic Geostatistical Model. Maximum likelihood parameter estimates are obtained by the expectation-maximisation algorithm, implemented in a new version of the D-STEM software. Considering the years between 2016 and 2020, the scenario analysis is carried out on high-resolution PM\(_{2.5}\) maps of the Lombardy Plain. As a result, it is shown that a 26% reduction in NH\(_3\) emissions in the wintertime could reduce the PM\(_{2.5}\) average by 1.44 \(\mu\)g/m\(^3\), while a 50% reduction could reduce it by 2.76 \(\mu\)g/m\(^3\), corresponding to reductions close to 3.6% and 7%, respectively. Finally, results are detailed by province and land type.
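As a quick consistency check on the scenario figures above, the absolute and relative reductions can be combined to recover the wintertime average they imply. The sketch below uses only the numbers quoted in the abstract; the function name `implied_baseline` is hypothetical, and because the percentages are stated as "close to" 3.6% and 7%, the result is only approximate.

```python
def implied_baseline(absolute_reduction, relative_reduction):
    """Baseline average implied by an absolute drop and the matching relative drop."""
    return absolute_reduction / relative_reduction


# Figures quoted in the abstract: (absolute reduction, relative reduction as a fraction).
scenarios = {
    "26% NH3 cut": (1.44, 0.036),
    "50% NH3 cut": (2.76, 0.07),
}
baselines = {name: implied_baseline(a, r) for name, (a, r) in scenarios.items()}
for name, value in baselines.items():
    print(f"{name}: implied baseline of about {value:.1f}")
```

Both scenarios point to a baseline of roughly 40 in the concentration unit quoted above, so the two pairs of figures are mutually consistent.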
Lombardy is one of the most polluted regions in Europe, owing both to its geographical structure and weather conditions, which prevent the dispersion of pollutants, and to the high levels of emissions from human activities. Recently, evidence has emerged of a relationship between agriculture and air quality, particularly between ammonia, produced mainly by the livestock sector, and particulate matter concentrations. In this respect, Lombardy is the leading Italian region for agricultural production, with 69% of its area classified as agricultural land and about 245 swine and 92 bovines per rural km\(^2\). In the Agriculture Impact On Italian Air project (AgrImOnIA, https://agrimonia.net, funded by Fondazione Cariplo within the framework of Data Science for science and society), we aim to predict air pollutant concentrations continuously in space (i.e. mapping) across the Lombardy region, taking into account meteorology, land use and emissions from agriculture. To this end, a data integration and harmonisation process has been carried out, starting from data that come from different sources and have different spatial and temporal resolutions. The first results are based on spatio-temporal Kriging models with external drift and on an extension of the traditional random forest algorithm that accounts for spatial and temporal correlation. These models will be used to generate scenario analyses that simulate the impact of policy interventions in the agricultural sector aimed at mitigating its effect on air quality.