Predictions from process-based models of environmental systems are biased, due to uncertainties in their inputs and parameterizations, reducing their utility. We develop a predictor for the bias in tropospheric ozone (O3, a key pollutant) calculated by an atmospheric chemistry transport model (GEOS-Chem), based on outputs from the model and observations of ozone from both the surface (EPA, EMEP, and GAW) and the ozone-sonde networks. We train a gradient-boosted decision tree algorithm (XGBoost) to predict model bias (model minus observation) using model and observational data for 2010–2015, and we then test the approach on the years 2016–2017. We show that the bias-corrected model performs considerably better than the uncorrected model. The root-mean-square error (RMSE) is reduced from 16.2 to 7.5 ppb, the normalized mean bias (NMB) is reduced from 0.28 to −0.04, and Pearson's R is increased from 0.48 to 0.84. Comparisons with observations from the NASA ATom flights (which were not included in the training) also show improvements, but to a smaller extent: the RMSE is reduced from 12.1 to 10.5 ppb, the NMB from 0.08 to 0.06, and Pearson's R is increased from 0.76 to 0.79. We attribute the smaller improvements to the lack of routine observational constraints for much of the remote troposphere. We show that the method is robust to variations in the volume of training data, with approximately a year of data needed to produce useful performance. Data denial experiments (removing observational sites from the algorithm training) show that information from one location (for example Europe) can reduce the model bias over other locations (for example North America), which might provide insights into the processes controlling the model bias. We explore the choice of predictor (bias prediction versus direct prediction) and conclude that both may have utility. We conclude that combining machine learning approaches with process-based models may provide a useful tool for improving these models.
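Below is a minimal sketch of the bias-prediction idea described in this abstract, assuming synthetic data and placeholder predictor fields rather than the actual GEOS-Chem output and surface/sonde observations; the feature set, hyperparameters, and train/test split are illustrative only. An XGBoost regressor is fitted to the model-minus-observation bias, and its prediction is subtracted from the raw model output before the error metrics are recomputed.

    # Illustrative sketch only: synthetic data and placeholder predictors,
    # not the actual GEOS-Chem/observation training set.
    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    n = 5000
    # Hypothetical predictors co-located with ozone observations
    # (e.g. modelled O3, modelled NO2, temperature, pressure).
    X = rng.normal(size=(n, 4))
    obs_o3 = 30 + 10 * X[:, 0] + rng.normal(scale=3, size=n)   # "observed" ozone, ppb
    model_o3 = obs_o3 + 5 + 4 * X[:, 1]                        # model with a structured bias

    train, test = slice(0, 4000), slice(4000, None)            # stand-in for 2010-2015 / 2016-2017
    bias = model_o3 - obs_o3                                   # target: model minus observation

    booster = xgb.XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.05)
    booster.fit(X[train], bias[train])

    corrected = model_o3[test] - booster.predict(X[test])      # bias-corrected model
    rmse = lambda a, b: float(np.sqrt(np.mean((a - b) ** 2)))
    nmb = lambda a, b: float(np.sum(a - b) / np.sum(b))        # normalized mean bias
    print(f"RMSE: {rmse(model_o3[test], obs_o3[test]):.1f} -> {rmse(corrected, obs_o3[test]):.1f} ppb")
    print(f"NMB:  {nmb(model_o3[test], obs_o3[test]):.2f} -> {nmb(corrected, obs_o3[test]):.2f}")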
Aromatic hydrocarbons, including benzene, toluene, and xylenes, play an important role in atmospheric chemistry, but the associated chemical mechanisms are complex and uncertain. Sparing representation of this chemistry in models is needed for computational tractability. Here, we develop a new compact mechanism for aromatic chemistry (GC13) that captures current knowledge from laboratory and computational studies with only 17 unique species and 44 reactions. We compare GC13 to six other currently used mechanisms of varying complexity in box model simulations of environmental chamber data and diurnal boundary layer chemistry, and show that GC13 provides results consistent with or better than more complex mechanisms for oxygenated products (alcohols, carbonyls, dicarbonyls), ozone, and hydrogen oxide (HOx≡OH+HO2) radicals. Specifically, GC13 features increased radical recycling and increased ozone destruction from phenoxy–phenylperoxy radical cycling relative to other mechanisms. We implement GC13 into the GEOS-Chem global chemical transport model and find higher glyoxal yields and net ozone loss from aromatic chemistry compared with other mechanisms. Aromatic oxidation in the model contributes 23 %, 5 %, and 8 % of global glyoxal, methylglyoxal, and formic acid production, respectively, and has mixed effects on formaldehyde. It drives small decreases in global tropospheric OH (−2.2 %), NOx (≡NO+NO2; −3.7 %), and ozone (−0.8 %), but a large increase in NO3 (+22 %) from phenoxy–phenylperoxy radical cycling. Regional effects in polluted environments can be substantially larger, especially from the photolysis of carbonyls produced by aromatic oxidation, which drives large wintertime increases in OH and ozone concentrations.
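For readers unfamiliar with the box-model simulations referred to above, the following toy sketch shows how a reduced mechanism is integrated in a zero-dimensional box model; the two reactions, rate constants, and concentrations are invented for illustration and are not part of the GC13 mechanism.

    # Toy zero-dimensional box model: an "aromatic" oxidized by OH to a carbonyl,
    # which is itself oxidized further. Rate constants and concentrations are
    # illustrative placeholders, not GC13 values.
    import numpy as np
    from scipy.integrate import solve_ivp

    k1 = 1.8e-12   # cm3 molec-1 s-1, toy rate for AROM + OH -> CARBONYL
    k2 = 1.1e-11   # cm3 molec-1 s-1, toy rate for CARBONYL + OH -> products
    OH = 2.0e6     # molec cm-3, prescribed oxidant concentration

    def dcdt(t, c):
        arom, carbonyl = c
        return [-k1 * OH * arom,
                k1 * OH * arom - k2 * OH * carbonyl]

    c0 = [2.5e10, 0.0]                                     # initial concentrations, molec cm-3
    sol = solve_ivp(dcdt, (0.0, 12 * 3600.0), c0, method="LSODA", max_step=600.0)
    print(f"After 12 h: aromatic {sol.y[0, -1]:.2e}, carbonyl {sol.y[1, -1]:.2e} molec cm-3")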
Effective mitigation of surface ozone pollution entails detailed knowledge of the contributing precursors’ sources. We use the GEOS-Chem adjoint model to analyze the precursors contributing to surface ozone in the Beijing–Tianjin–Hebei area (BTH) of China on days of different ozone pollution severities in June 2019. We find that BTH ozone on heavily polluted days is sensitive to local emissions, as well as to precursors emitted from the provinces south of BTH (Shandong, Henan, and Jiangsu, collectively the SHJ area). Heavy ozone pollution in BTH can be mitigated effectively by reducing NOx (from industrial processes and transportation), ≥C3 alkenes (from on-road gasoline vehicles and industrial processes), and xylenes (from paint use) emitted from both BTH and SHJ, as well as by reducing CO (from industrial processes, transportation, and power generation) and ≥C4 alkanes (from industrial processes, paint and solvent use, and on-road gasoline vehicles) emissions from SHJ. In addition, reduction of NOx, xylene, and ≥C3 alkene emissions within BTH would effectively decrease the number of BTH ozone-exceedance days. Our analysis pinpoints the key areas and activities for locally and regionally coordinated emission control efforts to improve surface ozone air quality in BTH.
We present a methodology that uses gradient-boosted regression trees (a
machine learning technique) and a full-chemistry simulation (i.e., training
dataset) from a chemistry–climate model (CCM) to ...efficiently generate a
parameterization of tropospheric hydroxyl radical (OH) that is a function of
chemical, dynamical, and solar irradiance variables. This surrogate model of
OH is designed to be integrated into a CCM and allow for
computationally efficient simulation of nonlinear feedbacks between OH and
tropospheric constituents that have loss by reaction with OH as their
primary sinks (e.g., carbon monoxide (CO), methane (CH4), volatile
organic compounds (VOCs)). Such a model framework is advantageous for
studies that require multi-decadal simulations of CH4 or multi-year
sensitivity simulations to understand the causes of trends and variations of
CO and CH4. To allow the user to easily target the training dataset
towards a desired application, we outline a methodology for generating a
parameterization of OH rather than presenting an “off-the-shelf” version to
be incorporated into a CCM. This allows for the
relatively easy creation of a new parameterization in response to, for
example, changes in research goals or the underlying CCM chemistry and/or
dynamics schemes. We show that a sample parameterization of OH generated
from a CCM simulation is able to reproduce OH concentrations with a
normalized root-mean-square error of approximately 5 % and
capture the global mean methane lifetime to within approximately 1 %. The
calculated accuracy of the parameterization assumes that inputs lie within the
bounds of the training dataset; large excursions from these bounds will
likely decrease the overall accuracy. However, we show that the sample
parameterization predicts large deviations in OH for an El Niño event
that was not part of the training dataset and that the spatial distribution
and strength of these deviations are consistent with the event. This result
gives confidence in the fidelity of a parameterization developed with our
methodology to simulate the spatial and temporal responses of OH to
perturbations from large variations in the chemical, dynamical, and solar
irradiance drivers of OH. In addition, we discuss how two machine learning
metrics, gain feature importance and Shapley additive explanations (SHAP)
values, indicate that the behavior
of a parameterization of OH generally accords with our understanding of OH
chemistry, even though there are no physics- or chemistry-based constraints
on the parameterization.
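The following hedged sketch illustrates the workflow described in this abstract: a gradient-boosted regressor is trained to emulate OH (here log(OH)) from a set of predictor variables, and its behavior is then examined with gain feature importance and SHAP values. The predictor names, synthetic data, and functional form are placeholders and do not correspond to the CCM training dataset used in the study.

    # Sketch of the surrogate-model idea: emulate log(OH) with boosted trees,
    # then query gain importance and SHAP values. All inputs are synthetic.
    import numpy as np
    import xgboost as xgb
    import shap

    rng = np.random.default_rng(1)
    features = ["O3", "NO2", "H2O", "CO", "jO1D", "temperature"]   # illustrative names
    X = rng.lognormal(size=(20000, len(features)))
    # Synthetic stand-in for CCM OH: increases with jO1D, O3, H2O; decreases with CO.
    log_oh = (0.8 * np.log(X[:, 4]) + 0.5 * np.log(X[:, 0]) + 0.3 * np.log(X[:, 2])
              - 0.4 * np.log(X[:, 3]) + rng.normal(scale=0.05, size=len(X)))

    model = xgb.XGBRegressor(n_estimators=400, max_depth=8, learning_rate=0.05)
    model.fit(X, log_oh)

    # "Gain" importance: total loss reduction attributed to each predictor
    # (keys f0..f5 follow the order of the features list above).
    print(model.get_booster().get_score(importance_type="gain"))

    # SHAP values: per-sample attribution of the prediction to each predictor.
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X[:500])
    print(dict(zip(features, np.abs(shap_values).mean(axis=0).round(3))))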
In assessments of cancer risk from atmospheric polycyclic aromatic hydrocarbons (PAHs), scientists and regulators rarely consider the complex mixture of emitted compounds and degradation products, and they often represent the entire mixture using a single emitted compound, benzo(a)pyrene. Here, we show that benzo(a)pyrene is a poor indicator of PAH risk distribution and management: nearly 90% of cancer risk worldwide results from other PAHs, including unregulated degradation products of emitted PAHs. We develop and apply a global‐scale atmospheric model and conduct health impact analyses to estimate human cancer risk from 16 PAHs and several of their N‐PAH degradation products. We find that benzo(a)pyrene is a minor contributor to the total cancer risk of PAHs (11%); the remaining risk comes from other directly emitted PAHs (72%) and N‐PAHs (17%). We show that assessment and policy‐making that relies solely on benzo(a)pyrene exposure provides misleading estimates of risk distribution, the importance of chemical processes, and the prospects for risk mitigation. We conclude that researchers and decision‐makers should consider additional PAHs as well as degradation products.
Plain Language Summary
Nearly 90% of global human lung cancer risk from polycyclic aromatic hydrocarbons (PAHs) comes from compounds omitted by prior analyses and not regulated directly. PAHs in the atmosphere are a complex mixture, but regulators and researchers often represent them using a single compound, namely benzo(a)pyrene. We show that benzo(a)pyrene is a poor indicator of global PAH cancer risk; its use as a proxy leads to erroneous conclusions about high‐risk populations and atmospheric chemical processes. We find that approximately 17% of risk comes from PAHs that are produced in atmospheric reactions and are not regulated or routinely monitored. Regulators and researchers should focus on the entire mixture of PAHs in the atmosphere, and we recommend that benzo(a)pyrene not be used as a sole reference compound.
Key Points
Benzo(a)pyrene is a small contributor to the human cancer risk of polycyclic aromatic hydrocarbons (PAHs) worldwide (11%)
Using benzo(a)pyrene as a surrogate compound leads to erroneous conclusions about high‐risk populations and the importance of uncertain chemical processes
Science and policy could be improved by considering a wider group of emitted PAHs as well as their degradation products
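A common way to aggregate cancer risk across a PAH mixture, consistent with the point above that benzo(a)pyrene alone understates total risk, is to weight each compound's concentration by a relative potency factor and apply a benzo(a)pyrene unit risk. The sketch below uses invented concentrations and potency factors purely for illustration; they are not the exposures or risk values from this study.

    # Illustrative benzo(a)pyrene-equivalence calculation for a PAH mixture.
    # Concentrations and relative potency factors are made-up placeholders.
    unit_risk_bap = 8.7e-5          # (ng m-3)-1, commonly cited WHO lifetime unit risk for BaP

    mixture = {                     # compound: (ambient conc in ng m-3, relative potency vs BaP)
        "benzo(a)pyrene":        (0.4, 1.0),
        "dibenz(a,h)anthracene": (0.1, 1.0),
        "benzo(b)fluoranthene":  (0.6, 0.1),
        "2-nitrofluoranthene":   (0.3, 0.5),   # example N-PAH degradation product
    }

    bap_equivalent = sum(conc * rpf for conc, rpf in mixture.values())
    total_risk = unit_risk_bap * bap_equivalent
    bap_only_risk = unit_risk_bap * mixture["benzo(a)pyrene"][0]
    print(f"BaP-only risk: {bap_only_risk:.1e}, mixture risk: {total_risk:.1e}, "
          f"BaP share: {bap_only_risk / total_risk:.0%}")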
The impact of emissions of volatile organic compounds (VOCs) to the
atmosphere on the production of secondary pollutants, such as ozone and
secondary organic aerosol (SOA), is mediated by the concentration of nitric
oxide (NO). Polluted urban atmospheres are typically considered to be
“high-NO” environments, while remote regions such as rainforests, with
minimal anthropogenic influences, are considered to be “low NO”. However,
our observations from central Beijing show that this simplistic separation
of regimes is flawed. Despite being in one of the largest megacities in the
world, we observe formation of gas- and aerosol-phase oxidation products
usually associated with low-NO “rainforest-like” atmospheric oxidation
pathways during the afternoon, caused by extreme suppression of NO
concentrations at this time. Box model calculations suggest that during
the morning high-NO chemistry predominates (95 %), but in the afternoon
low-NO chemistry plays a greater role (30 %). Simulations with the GEOS-Chem
model using current emissions inventories show that such models, when run at
the regional scale, fail to capture this extreme diurnal cycle in the NO
concentration. With increasing global emphasis on reducing air pollution, it
is crucial that the modelling tools used to develop urban air quality policy
are able to represent such extreme diurnal variations in NO if they are to
accurately predict the formation of pollutants such as SOA and ozone.
Low-cost sensors (LCSs) are an appealing solution to the problem of spatial
resolution in air quality measurement, but they currently do not have the
same analytical performance as regulatory reference methods. Individual
sensors can be susceptible to analytical cross-interferences; have random
signal variability; and experience drift over short, medium and long
timescales. To overcome some of the performance limitations of individual
sensors, we use a clustering approach, taking the instantaneous median signal
from six identical electrochemical sensors to minimize random drifts and
inter-sensor differences. We report here on a low-power analytical device
(< 200 W) that comprises clusters of sensors for
NO2, Ox, CO and total volatile organic compounds
(VOCs) and that measures supporting parameters such as water vapour and temperature.
This was tested in the field against reference monitors, collecting ambient
air pollution data in Beijing, China. Comparisons were made of NO2
and Ox clustered sensor data against reference methods for
calibrations derived from factory settings, in-field simple linear regression
(SLR) and then against three machine learning (ML) algorithms. The parametric
supervised ML algorithms, boosted regression trees (BRTs) and boosted linear
regression (BLR), and the non-parametric technique, Gaussian process (GP),
used all available sensor data to improve the measurement estimate of
NO2 and Ox. In all cases ML produced an
observational value that was closer to reference measurements than SLR alone.
In combination, sensor clustering and ML generated sensor data of a quality
that was close to that of regulatory measurements (using the RMSE metric) yet
retained a very substantial cost and power advantage.
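The following is a minimal sketch of the two steps described in this abstract, using synthetic data: the instantaneous median across six simulated electrochemical sensors forms the clustered signal, which is then calibrated against a reference time series with a boosted-regression-tree model (scikit-learn's GradientBoostingRegressor standing in for the BRT calibration), using temperature and relative humidity as co-predictors. Sensor behavior, drift rates, and cross-interferences are invented placeholders.

    # Step 1: cluster six noisy sensors via the instantaneous median.
    # Step 2: calibrate the clustered signal against a reference with boosted trees.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(2)
    n = 3000
    true_no2 = 20 + 15 * np.sin(np.linspace(0, 20, n)) + rng.normal(scale=2, size=n)
    temp = 15 + 10 * np.sin(np.linspace(0, 20, n) / 3)
    rh = 50 + 20 * np.cos(np.linspace(0, 20, n) / 5)

    # Six sensors with random noise, individual offsets, slow drift, and a
    # shared temperature cross-interference (all values illustrative).
    sensors = np.stack([
        true_no2
        + rng.normal(scale=4, size=n)               # random signal variability
        + rng.normal(scale=3)                       # per-sensor offset
        + rng.uniform(0, 0.004) * np.arange(n)      # slow drift
        + 0.3 * (temp - 15)                         # cross-interference
        for _ in range(6)
    ])
    clustered = np.median(sensors, axis=0)          # instantaneous median across the cluster

    X = np.column_stack([clustered, temp, rh])
    train, test = slice(0, 2000), slice(2000, None)
    brt = GradientBoostingRegressor(n_estimators=300, max_depth=3, learning_rate=0.05)
    brt.fit(X[train], true_no2[train])

    rmse = lambda a, b: float(np.sqrt(np.mean((a - b) ** 2)))
    print(f"RMSE median-only: {rmse(clustered[test], true_no2[test]):.2f} ppb, "
          f"median + BRT: {rmse(brt.predict(X[test]), true_no2[test]):.2f} ppb")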