Various approaches have been proposed to model PM2.5 over the past decade, with satellite-derived aerosol optical depth, land-use variables, chemical transport model predictions, and several meteorological variables as major predictor variables. Our study used an ensemble model that integrated multiple machine learning algorithms and predictor variables to estimate daily PM2.5 at a resolution of 1 km × 1 km across the contiguous United States. We used a generalized additive model that accounted for geographic differences to combine PM2.5 estimates from a neural network, a random forest, and gradient boosting. The three machine learning algorithms were based on multiple predictor variables, including satellite data, meteorological variables, land-use variables, elevation, chemical transport model predictions, several reanalysis datasets, and others. The model training results from 2000 to 2015 indicated good model performance, with a 10-fold cross-validated R2 of 0.86 for daily PM2.5 predictions. For annual PM2.5 estimates, the cross-validated R2 was 0.89. Our model performed well at concentrations up to 60 μg/m3. Using the trained PM2.5 model and predictor variables, we predicted daily PM2.5 from 2000 to 2015 at every 1 km × 1 km grid cell in the contiguous United States. We also used localized land-use variables within 1 km × 1 km grid cells to downscale PM2.5 predictions to 100 m × 100 m grid cells. To characterize uncertainty, we used meteorological variables, land-use variables, and elevation to model the monthly standard deviation of the difference between daily monitored and predicted PM2.5 for every 1 km × 1 km grid cell. This PM2.5 prediction dataset, including the downscaled and uncertainty predictions, allows epidemiologists to accurately estimate the adverse health effects of PM2.5. Compared with the individual base learners, an ensemble model can achieve better overall estimation.
It is worth exploring other ensemble model formats to synthesize estimations from different models or from different groups to improve overall performance.
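The 10-fold cross-validated R2 quoted above can be illustrated with a minimal sketch. It makes several simplifying assumptions. A single-predictor least-squares fit stands in for the ensemble. Out-of-fold predictions are pooled before scoring, although the study may instead average per-fold R2. The function names and data are hypothetical.

```python
import random


def fit_ols(xs, ys):
    """Fit y = a + b*x by ordinary least squares; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def kfold_cv_r2(xs, ys, k=10, seed=0):
    """Pool out-of-fold predictions from k folds and score them with R2."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    preds = [0.0] * len(xs)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Refit on the remaining k - 1 folds, then predict only the held-out fold.
        a, b = fit_ols([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = a + b * xs[i]
    return r_squared(ys, preds)


# Usage on synthetic data (hypothetical, not from the study).
rng = random.Random(42)
x = [rng.uniform(0, 50) for _ in range(200)]
y = [2.0 + 0.8 * xi + rng.gauss(0, 3) for xi in x]
print(round(kfold_cv_r2(x, y), 3))
```

Every observation is predicted exactly once by a model that never saw it. That is what makes the resulting R2 an "out-of-sample" measure.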
•An ensemble model integrates three machine learning algorithms and estimates PM2.5.•Satellite measurements, land-use terms, and many other variables were predictors.•Model predicts daily PM2.5 at 1 km × 1 km grid cells in the contiguous United States.•Model predictions were downscaled to the 100 m × 100 m level.•Monthly uncertainty level of prediction was also estimated.
Particulate matter (PM) air pollution is one of the major causes of death worldwide, with demonstrated adverse effects from both short-term and long-term exposure. Most epidemiological studies have been conducted in cities because of the lack of reliable spatiotemporal estimates of particle exposure in nonurban settings. The objective of this study is to estimate daily PM10 (PM < 10 μm), fine (PM < 2.5 μm, PM2.5) and coarse particles (PM between 2.5 and 10 μm, PM2.5–10) at a 1-km2 grid for 2013–2015 using a machine learning approach, the Random Forest (RF). Separate RF models were defined to: predict PM2.5 and PM2.5–10 concentrations at monitors where only PM10 data were available (stage 1); impute missing satellite Aerosol Optical Depth (AOD) data using estimates from atmospheric ensemble models (stage 2); establish a relationship between measured PM and satellite, land use and meteorological parameters (stage 3); apply the stage 3 model to predict PM over each 1-km2 grid cell of Italy (stage 4); and improve stage 3 predictions by using small-scale predictors computed at the monitor locations or within a small buffer (stage 5). Our models were able to capture most of the PM variability, with mean cross-validation (CV) R2 of 0.75 and 0.80 (stage 3) and 0.84 and 0.86 (stage 5) for PM10 and PM2.5, respectively. Model fit was poorer for PM2.5–10, in summer months, and in southern Italy. Finally, predictions captured annual and daily PM variability equally well, so they can be used as reliable exposure estimates for investigating long-term and short-term health effects.
•Estimates of fine and coarse particles at fine spatiotemporal scale are lacking in Italy•We applied a multistage random forest model combining PM data with satellite, land-use and meteorology•We imputed missing satellite AOD data using ensemble atmospheric models•We estimated daily PM10, PM2.5 and PM2.5-10 at a 1-km2 grid over Italy for the years 2013-2015•Our model displayed good CV fitting (R2=0.75 for PM10, R2=0.80 for PM2.5, R2=0.64 for PM2.5-10) and negligible bias
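Stage 1, predicting PM2.5 at PM10-only monitors and deriving the coarse fraction by subtraction, can be sketched as follows. The study used random forests. Purely for illustration, this sketch swaps in a simple linear fit of PM2.5 on PM10 at co-located monitors. The function names and inputs are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def stage1_fill(colocated, pm10_only):
    """Predict PM2.5 where only PM10 is measured, then derive PM2.5-10.

    colocated: list of (pm10, pm25) pairs from monitors measuring both.
    pm10_only: list of PM10 values from monitors without PM2.5.
    Returns a list of (pm25_hat, coarse_hat) tuples.
    """
    a, b = fit_line([p[0] for p in colocated], [p[1] for p in colocated])
    out = []
    for pm10 in pm10_only:
        # PM2.5 is a subset of PM10, so bound the prediction to [0, PM10].
        pm25_hat = min(max(a + b * pm10, 0.0), pm10)
        out.append((pm25_hat, pm10 - pm25_hat))
    return out


# Hypothetical example: co-located monitors and two PM10-only monitors.
pairs = [(10.0, 6.0), (20.0, 12.0), (30.0, 18.0), (40.0, 24.0)]
print(stage1_fill(pairs, [50.0, 25.0]))
```

Bounding the predicted PM2.5 by the measured PM10 guarantees a non-negative coarse estimate.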
Satellite-derived aerosol optical depth (AOD) measurements have the potential to provide spatiotemporally resolved predictions of both long- and short-term exposures, but previous studies have generally shown moderate predictive power and have lacked detailed, high spatiotemporal resolution predictions across large domains. We aimed to extend our previous work in two ways. First, we validated our model in another region with different geographical and meteorological characteristics. Second, we incorporated fine-scale land use regression and accounted for nonrandom missingness to better predict PM2.5 concentrations for days with or without satellite AOD measures. We started by calibrating AOD data for 2000–2008 across the Mid-Atlantic. We used mixed models regressing PM2.5 measurements against day-specific random intercepts, and fixed and random AOD and temperature slopes. We used inverse probability weighting to account for nonrandom missingness of AOD. We also nested regions within days to capture spatial variation in the daily calibration. Finally, we introduced a penalization method that reduces the dimensionality of the large number of spatial and temporal predictors without selecting different predictors in different locations. We then took advantage of the association between grid-cell-specific AOD values and PM2.5 monitoring data, together with associations between AOD values in neighboring grid cells, to develop grid cell predictions when AOD is missing. Finally, to obtain local predictions (at a resolution of 50 m), we regressed the residuals from these previous steps at each monitor against the local land use variables specific to that monitor. “Out-of-sample” 10-fold cross-validation was used to quantify the accuracy of our predictions at each step. For all days without AOD values, model performance was excellent (mean “out-of-sample” R2 = 0.81, year-to-year variation 0.79–0.84).
After outliers were removed from the PM2.5 monitoring data, the cross-validation results were even better (overall mean “out-of-sample” R2 = 0.85). Further, cross-validation revealed no bias in the predicted concentrations (slope of observed vs. predicted = 0.97–1.01). Our model allows one to reliably assess short-term and long-term human exposures in order to investigate both the acute and chronic effects of ambient particles, respectively.
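Inverse probability weighting for nonrandomly missing AOD can be illustrated with the sketch below. As a simplifying assumption, the probability that AOD is observed is estimated as the observed fraction within a stratum, such as a hypothetical season-by-region label, rather than from a fitted model. The weights then enter a weighted least-squares calibration of PM2.5 on AOD, a linear stand-in for the mixed model. All names are hypothetical.

```python
from collections import defaultdict


def ipw_weights(strata, observed):
    """Weight each observed record by 1 / P(AOD observed | stratum); 0 if missing."""
    total = defaultdict(int)
    seen = defaultdict(int)
    for s, obs in zip(strata, observed):
        total[s] += 1
        seen[s] += int(obs)
    # 1.0 / p is only evaluated for observed records, so p > 0 there.
    return [1.0 / (seen[s] / total[s]) if obs else 0.0
            for s, obs in zip(strata, observed)]


def weighted_ols(xs, ys, ws):
    """Weighted least-squares fit y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def ipw_calibrate(strata, aod, pm25):
    """Calibrate PM2.5 = a + b*AOD on observed records, weighted by IPW.

    aod entries are None where the satellite retrieval is missing.
    """
    w = ipw_weights(strata, [v is not None for v in aod])
    rows = [(x, y, wi) for x, y, wi in zip(aod, pm25, w) if x is not None]
    xs, ys, ws = zip(*rows)
    return weighted_ols(xs, ys, ws)


# Hypothetical example: stratum "a" is cloudier, so its one retrieval is up-weighted.
print(ipw_calibrate(["a", "a", "b", "b"], [0.1, None, 0.3, 0.5], [5.0, 99.0, 9.0, 13.0]))
```

Records from strata where retrievals are rare count more. This partly restores the balance lost when AOD is missing more often under some conditions, for example cloud or snow cover.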
Land use regression (LUR) models provide good estimates of spatially resolved long-term exposures, but are poor at capturing short-term exposures. Satellite-derived Aerosol Optical Depth (AOD) measurements have the potential to provide spatiotemporally resolved predictions of both long- and short-term exposures, but previous studies have generally shown relatively low predictive power. Our objective was to extend our previous work on day-specific calibrations of AOD data using ground PM₂.₅ measurements by incorporating commonly used LUR variables and meteorological variables. This design benefits from both the spatial resolution of the LUR models and the spatiotemporal resolution of the satellite models. In addition, we used spatial smoothing to predict PM₂.₅ concentrations for days and locations with missing AOD measures. We used mixed models with random slopes for day to calibrate AOD data for 2000–2008 across New England with monitored PM₂.₅ measurements. We then used a generalized additive mixed model with spatial smoothing to estimate PM₂.₅ in location–day pairs with missing AOD, using regional measured PM₂.₅, AOD values in neighboring cells, and land use. Finally, local (100 m) land use terms were used to model the difference between grid cell predictions and monitored values, in order to capture very local traffic particles. Out-of-sample ten-fold cross-validation was used to quantify the accuracy of our predictions. For days with available AOD data, we found a high out-of-sample R² (mean out-of-sample R² = 0.830, year-to-year variation 0.725–0.904). For days without AOD values, our model performance was also excellent (mean out-of-sample R² = 0.810, year-to-year variation 0.692–0.887). Importantly, these R² values refer to daily, rather than monthly or yearly, predictions. Our model allows one to assess short-term and long-term human exposures in order to investigate both the acute and chronic effects of ambient particles, respectively.
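For illustration, the day-specific calibration of AOD against ground PM₂.₅ can be approximated by fitting a separate intercept and slope for each day. This is a no-pooling simplification. The mixed model described above also shrinks the day-specific random intercepts and slopes toward common fixed effects, and that step is omitted here. The data layout and function names are hypothetical.

```python
from collections import defaultdict


def daily_calibration(records):
    """Fit PM2.5 = a_day + b_day * AOD separately for each day.

    records: iterable of (day, aod, pm25) tuples.
    Returns {day: (a_day, b_day)} for days where the slope is identifiable.
    """
    by_day = defaultdict(list)
    for day, aod, pm in records:
        by_day[day].append((aod, pm))
    coefs = {}
    for day, pairs in by_day.items():
        n = len(pairs)
        mx = sum(p[0] for p in pairs) / n
        my = sum(p[1] for p in pairs) / n
        sxx = sum((p[0] - mx) ** 2 for p in pairs)
        if sxx == 0:
            continue  # one monitor or constant AOD: slope not identifiable
        sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
        b = sxy / sxx
        coefs[day] = (my - b * mx, b)
    return coefs


def predict(coefs, day, aod):
    """Predict PM2.5 for a grid cell on a calibrated day."""
    a, b = coefs[day]
    return a + b * aod


# Hypothetical records: (day, AOD at monitor cell, monitored PM2.5).
recs = [(1, 0.1, 4.0), (1, 0.2, 6.0), (2, 0.1, 10.0), (2, 0.3, 14.0), (3, 0.2, 5.0)]
cal = daily_calibration(recs)
print(predict(cal, 1, 0.5))
```

Letting the slope change from day to day absorbs time-varying factors such as boundary-layer height and humidity that alter the AOD–PM relationship. This is the motivation for day-specific calibration.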
The shape of the non-linear relationship between temperature and mortality varies among cities with different climatic conditions. There has been little examination of how these curves change over space and time. We evaluated the short-term effects of hot and cold temperatures on daily mortality over six 7-year periods in 211 US cities, comprising over 42 million deaths. Cluster analysis was used to group the cities according to similar temperatures and relative humidity. Temperature–mortality functions were calculated using B-splines to model the heat effect (lag 0) and the cold effect (moving average of lags 1–5) on mortality. The functions were then combined through meta-smoothing and subsequently analyzed by meta-regression. We identified eight clusters. At lag 0, Cluster 5 (West Coast) had an RR of 1.14 (95% CI: 1.11, 1.17) for temperatures of 27°C vs. 15.6°C, and Cluster 6 (Gulf Coast) had an RR of 1.04 (95% CI: 1.03, 1.05), suggesting that people are acclimated to their respective climates. Controlling for cluster effect in the multivariate meta-regression, we found that across the US, the excess mortality from a 24-h temperature of 27°C decreased over time from 10.6% to 0.9%. We found that the overall risk due to the heat effect is significantly affected by summer mean temperature and air-conditioning usage, which could serve as predictors in building climate-change scenarios.
•We studied 42 million deaths in 211 U.S. cities from the 1960s to recent years.•We showed that the effect of hot days is diminished by increasing summer mean temperature within a city.•Similarly, the effect of cold days is increased by increasing winter mean temperature.•A modest protective effect of air conditioning was found controlling for the above.•Risk assessments of future temperature changes need to take these adaptive responses into account.
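The two exposure definitions above are same-day temperature (lag 0) for heat and the moving average of lags 1–5 for cold. The lag 0 exposure is just the daily series itself. The moving average can be built from a daily temperature series as sketched below. The helper names are hypothetical. The second function converts a relative risk into percent excess risk; for example, an RR of 1.14 corresponds to 14% excess mortality.

```python
def moving_average_lags(series, first=1, last=5):
    """Mean temperature from `first` to `last` days before each day.

    Returns None for days without a complete lag window.
    """
    width = last - first + 1
    out = []
    for t in range(len(series)):
        if t - last < 0:
            out.append(None)
        else:
            # series[t - last : t - first + 1] covers lags `last` down to `first`.
            window = series[t - last:t - first + 1]
            out.append(sum(window) / width)
    return out


def percent_excess(rr):
    """Convert a relative risk into percent excess risk."""
    return 100.0 * (rr - 1.0)


# Hypothetical daily mean temperatures (°C).
temps = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
print(moving_average_lags(temps))
print(percent_excess(1.14))
```

Excluding lag 0 from the cold-effect window reflects the observation that cold-related deaths typically follow exposure by several days, whereas heat effects appear on the same day.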
Spatiotemporally resolved particulate matter (PM) estimates are essential for reconstructing long- and short-term exposures in epidemiological research. Improved estimates of PM2.5 and PM10 concentrations were produced over Italy for 2013–2015 using satellite remote-sensing data and an ensemble modeling approach. The following modeling stages were used: (1) missing values of the satellite-based aerosol optical depth (AOD) product were imputed using a spatiotemporal land-use random-forest (RF) model incorporating AOD data from atmospheric ensemble models; (2) daily PM estimations were produced using four modeling approaches: linear mixed effects, RF, extreme gradient boosting, and a chemical transport model, the flexible air quality regional model; the filled-in MAIAC AOD, together with additional spatial and temporal predictors, was used as input to the first three models; (3) a geographically weighted generalized additive model (GAM) ensemble was used to fuse the estimations from the four models by allowing the weight of each model to vary over space and time. The GAM ensemble model outperformed the four separate models, decreasing the cross-validated root mean squared error by 1–42%, depending on the model. The spatiotemporally resolved PM estimations produced by the proposed model can be applied in future epidemiological studies across Italy.
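The idea of letting each model's weight vary over space can be sketched with geographically weighted least squares. At a target location, monitors are weighted by a Gaussian kernel of distance. Observed PM is then regressed, without an intercept, on the base-model predictions. This is a simplified linear stand-in for the geographically weighted GAM ensemble. It assumes planar coordinates and a hypothetical bandwidth, does not let weights vary over time, and does not constrain them to be positive.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def local_weights(target, sites, preds, obs, bandwidth):
    """Estimate location-specific weights for each base model.

    sites: list of (x, y) monitor coordinates.
    preds: per-monitor lists of base-model predictions (same model order).
    obs: observed PM at each monitor.
    """
    k = len(preds[0])
    xtwx = [[0.0] * k for _ in range(k)]
    xtwy = [0.0] * k
    for (sx, sy), p, y in zip(sites, preds, obs):
        d2 = (sx - target[0]) ** 2 + (sy - target[1]) ** 2
        w = math.exp(-d2 / (2.0 * bandwidth ** 2))  # Gaussian kernel weight
        for i in range(k):
            xtwy[i] += w * p[i] * y
            for j in range(k):
                xtwx[i][j] += w * p[i] * p[j]
    return solve(xtwx, xtwy)


def ensemble_predict(weights, model_preds):
    """Combine base-model predictions at the target with the local weights."""
    return sum(w * p for w, p in zip(weights, model_preds))


# Hypothetical monitors with predictions from two base models.
sites = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 2)]
preds = [[10.0, 12.0], [20.0, 15.0], [30.0, 35.0], [25.0, 20.0], [18.0, 30.0]]
obs = [0.7 * p[0] + 0.3 * p[1] for p in preds]
wts = local_weights((0.5, 0.5), sites, preds, obs, bandwidth=1.0)
print(wts, ensemble_predict(wts, [22.0, 18.0]))
```

Because the kernel changes with the target location, a model that performs well in one region can receive more weight there than elsewhere. That is the core intuition behind a spatially varying ensemble.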
Although meteorological stations provide accurate air temperature observations, their spatial coverage is limited and thus often insufficient for epidemiological studies. Satellite data expand spatial coverage, enhancing our ability to estimate near-surface air temperature (Ta). However, deriving Ta from the surface temperature (Ts) measured by satellites is far from straightforward. In this study, we present a novel approach that incorporates land use regression, meteorological variables and spatial smoothing. The approach first calibrates between Ts and Ta on a daily basis and then predicts Ta for days when satellite Ts data were not available. We applied mixed regression models with daily random slopes to calibrate Moderate Resolution Imaging Spectroradiometer (MODIS) Ts data with monitored Ta measurements for 2003. Then, we used a generalized additive mixed model with spatial smoothing to estimate Ta on days with missing Ts. Out-of-sample tenfold cross-validation was used to quantify the accuracy of our predictions. Our model performance was excellent both for days with available Ts and for days without Ts observations (mean out-of-sample R2 = 0.946 and R2 = 0.941, respectively). Furthermore, based on these high-quality predictions, we investigated the spatial patterns of Ta within the study domain as they relate to urban vs. non-urban land uses.
► We assess minimum air temperature from satellite surface temperature. ► We use a daily calibration approach and generalized additive models. ► Air temperature was also estimated for days with missing satellite data. ► Our model performance for days with satellite data was excellent (R2=0.946). ► For days without satellite data the model also performed well (R2=0.941).
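For days without satellite Ts, the approach above borrows strength from neighboring cells through spatial smoothing in a generalized additive mixed model. As a much simpler stand-in, the sketch below fills a missing cell on a given day by inverse-distance weighting of the Ta values in cells that do have data. The power parameter, coordinates, and function names are hypothetical assumptions, and regional monitored Ta is not used here.

```python
import math


def idw_fill(target, known, power=2.0):
    """Estimate Ta at `target` from known (x, y, ta) cells on the same day.

    Uses inverse-distance weights 1 / d**power; returns the known value
    directly if the target coincides with a known cell.
    """
    num = 0.0
    den = 0.0
    for x, y, ta in known:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return ta
        w = 1.0 / d ** power
        num += w * ta
        den += w
    return num / den


def fill_day(grid, power=2.0):
    """Fill None entries in {(x, y): ta_or_None} for one day."""
    known = [(x, y, ta) for (x, y), ta in grid.items() if ta is not None]
    return {
        cell: ta if ta is not None else idw_fill(cell, known, power)
        for cell, ta in grid.items()
    }


# Hypothetical one-day grid with one cell lacking a satellite-based estimate.
day_grid = {(0, 0): 10.0, (2, 0): 20.0, (1, 0): None}
print(fill_day(day_grid))
```

A GAMM smooth can also learn how much spatial structure to allow from the data. Inverse-distance weighting fixes that structure through the power parameter, which is one reason it is only a rough illustration here.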
Particulate matter < 2.5 μm in diameter (PM2.5) and heat are strong predictors of morbidity, yet few studies have examined the effects of long-term exposures on non-fatal events, or assessed the short- and long-term effects on health simultaneously.
We jointly investigated the association of short and long-term exposures to PM2.5 and temperature with hospital admissions, and explored the modification of the associations with the short-term exposures by one another and by temperature variability.
Daily ZIP code counts of respiratory, cardiac and stroke admissions of adults aged ≥65 years (N = 2,015,660) were constructed across New England (2001–2011). Daily PM2.5 and temperature exposure estimates were obtained from satellite-based spatiotemporally resolved models. For each admission cause, a Poisson regression with a random intercept for ZIP code was fit on short- and long-term exposures. Modifications of the short-term effects were tested by adding interaction terms with temperature, PM2.5 and temperature variability.
Associations with both short- and long-term exposures were observed for all of the outcomes, with stronger effects of long-term exposures to PM2.5. For respiratory admissions, the short-term PM2.5 effect (percent increase per IQR) was larger on warmer days (1.12% versus −0.53%) and in months of higher temperature variability (1.63% versus −0.45%). The short-term temperature effect was also higher in months of higher temperature variability. For cardiac admissions, the PM2.5 effect was larger on colder days (0.56% versus −0.30%) and in months of higher temperature variability (0.99% versus −0.56%).
We observed synergistic effects of short-term exposures to PM2.5, temperature and temperature variability. Long-term exposures to PM2.5 were associated with larger effects compared to short-term exposures.
•Associations with both short- and long-term exposures were observed for all outcomes.•Long-term exposures to particulate matter < 2.5 μm (PM2.5) had stronger effects than short-term exposures.•Short-term PM2.5-related respiratory risk was larger on warmer days.•Short-term PM2.5-related cardiac risk was larger on colder days.•Short-term PM2.5 risks were larger in months of higher temperature variability.
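Effects from a Poisson (log-link) model are reported above as the percent increase in admissions per interquartile-range (IQR) increase in exposure, that is, 100 × (exp(β × IQR) − 1). With an interaction term, the PM2.5 coefficient at modifier value M becomes β_PM + β_int × M. A minimal sketch of that conversion, using hypothetical coefficients and IQR rather than values from the study, is:

```python
import math


def percent_per_iqr(beta, iqr):
    """Percent change in expected counts per IQR increase under a log link."""
    return 100.0 * (math.exp(beta * iqr) - 1.0)


def effect_at_modifier(beta_pm, beta_int, modifier, iqr):
    """Percent change per IQR of PM2.5 at a given value of the modifier.

    With log(mu) = ... + beta_pm*PM + beta_int*PM*M, the PM2.5 coefficient
    at modifier value M is beta_pm + beta_int*M.
    """
    return percent_per_iqr(beta_pm + beta_int * modifier, iqr)


# Hypothetical coefficients: modifier coded 0 (cooler day) or 1 (warmer day).
iqr_pm = 5.0
for m in (0.0, 1.0):
    print(m, round(effect_at_modifier(-0.001, 0.003, m, iqr_pm), 3))
```

The interaction shifts the PM2.5 slope between modifier levels. This is how a short-term effect can be positive on warmer days and near zero or negative on cooler days.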
The influence of particulate air pollution on respiratory health starts in utero. Fetal lung growth and structural development occur in stages; thus, effects on postnatal respiratory disorders may differ based on timing of exposure.
We implemented an innovative method to identify sensitive windows for effects of prenatal exposure to particulate matter with a diameter less than or equal to 2.5 μm (PM2.5) on children's asthma development in an urban pregnancy cohort.
Analyses included 736 full-term (≥37 wk) children. Each mother's daily PM2.5 exposure was estimated over gestation using a validated satellite-based spatiotemporally resolved model. Using distributed lag models, we examined associations between weekly averaged PM2.5 levels over pregnancy and physician-diagnosed asthma in children by age 6 years. Effect modification by sex was also examined.
Most mothers were ethnic minorities (54% Hispanic, 30% black), had 12 or fewer years of education (66%), and did not smoke in pregnancy (80%). In the sample as a whole, distributed lag models adjusting for child age, sex, and maternal factors (education, race and ethnicity, smoking, stress, atopy, prepregnancy obesity) showed that increased PM2.5 exposure levels at 16–25 weeks of gestation were significantly associated with early childhood asthma development. An interaction between PM2.5 and sex was significant (P = 0.01), with sex-stratified analyses showing that the association existed only in boys.
Higher prenatal PM2.5 exposure at midgestation was associated with asthma development by age 6 years in boys. Methods to better characterize vulnerable windows may provide insight into underlying mechanisms.
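The distributed lag models above use weekly averages of each mother's daily PM2.5 over pregnancy. Building that exposure matrix from daily estimates can be sketched as follows. The input layout is a hypothetical assumption: one list of daily values per pregnancy, starting at conception. Fitting the distributed lag model itself is not shown.

```python
def weekly_averages(daily, n_weeks=None):
    """Average daily PM2.5 into gestational weeks (days 0-6 = week 1, ...).

    An incomplete final week is averaged over the days available.
    """
    weeks = [daily[i:i + 7] for i in range(0, len(daily), 7)]
    avgs = [sum(w) / len(w) for w in weeks]
    return avgs[:n_weeks] if n_weeks is not None else avgs


def exposure_matrix(pregnancies, n_weeks):
    """Rows = children, columns = weekly PM2.5 for weeks 1..n_weeks.

    Pregnancies shorter than n_weeks are padded with None.
    """
    rows = []
    for daily in pregnancies:
        avgs = weekly_averages(daily, n_weeks)
        rows.append(avgs + [None] * (n_weeks - len(avgs)))
    return rows


# Hypothetical daily exposures for two pregnancies (truncated for brevity).
print(exposure_matrix([[1.0] * 7, [2.0] * 14], n_weeks=2))
```

A distributed lag model then relates the outcome to all weekly columns jointly, typically constraining the lag coefficients to vary smoothly across weeks. This is what allows a sensitive window, such as weeks 16–25, to be identified.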