Haloketones (HKs) are a class of disinfection by-products (DBPs) that are genotoxic and mutagenic. Monitoring HKs in drinking water is important for drinking water safety, yet it is time-consuming and laborious. Developing predictive models that estimate HK occurrence in drinking water is a good alternative, but to date no study has modeled HKs. This study explored the feasibility of linear and log-linear regression models and of back propagation (BP) and radial basis function (RBF) artificial neural networks (ANNs) for predicting HK occurrence (including dichloropropanone, trichloropropanone, and total HKs) in real water supply systems. The results showed that the overall prediction ability of the RBF and BP ANNs was better than that of the linear/log-linear models. Although the BP ANN showed excellent prediction performance in internal validation (N25 = 98–100%, R2 = 0.99–1.00), it could not predict HK occurrence well in external validation (N25 = 62–69%, R2 = 0.202–0.848). The prediction ability of the RBF ANN in external validation (N25 = 85%, R2 = 0.692–0.909) was quite good and comparable to that in internal validation (N25 = 74–88%, R2 = 0.799–0.870). These results demonstrate that the RBF ANN can recognize the complex nonlinear relationship between HK occurrence and the related water quality, and they pave a new way for HK prediction and monitoring in practice.
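The abstract reports performance as N25 and R2 for an RBF ANN. The sketch below shows, under stated assumptions, how a minimal RBF network (Gaussian hidden units with fixed centers and a linear output layer fitted by ridge-regularized least squares) and these two metrics might be computed with NumPy. N25 is assumed here to mean the percentage of predictions within ±25% of the observed value, a convention common in DBP modeling. The center selection, width, class and function names, and the synthetic data are hypothetical, not the study's configuration.

```python
import numpy as np


def gaussian_design(X, centers, sigma):
    """Hidden-layer activations exp(-||x - c||^2 / (2 sigma^2)) plus a bias column."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    H = np.exp(-d2 / (2.0 * sigma ** 2))
    return np.column_stack([H, np.ones(len(X))])


class RBFNet:
    """Minimal RBF network: fixed Gaussian centers, linear output layer."""

    def __init__(self, n_centers=50, sigma=1.0, ridge=1e-3, seed=0):
        self.n_centers, self.sigma, self.ridge, self.seed = n_centers, sigma, ridge, seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        # Assumption: centers are a random subset of the training samples.
        idx = rng.choice(len(X), size=min(self.n_centers, len(X)), replace=False)
        self.centers = X[idx]
        H = gaussian_design(X, self.centers, self.sigma)
        # Ridge-regularized least squares for the output-layer weights.
        A = H.T @ H + self.ridge * np.eye(H.shape[1])
        self.weights = np.linalg.solve(A, H.T @ y)
        return self

    def predict(self, X):
        return gaussian_design(X, self.centers, self.sigma) @ self.weights


def n25(y_obs, y_pred):
    """Percentage of predictions within +/-25% of the observed value (assumed definition)."""
    return 100.0 * np.mean(np.abs(y_pred - y_obs) <= 0.25 * np.abs(y_obs))


def r2(y_obs, y_pred):
    """Coefficient of determination, 1 - SSE / SST."""
    return 1.0 - np.sum((y_obs - y_pred) ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)


# Synthetic stand-in for water-quality predictors and an HK concentration.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 2))
y = 5 + np.sin(X[:, 0]) + X[:, 1] ** 2
model = RBFNet().fit(X[:200], y[:200])
y_ext = model.predict(X[200:])  # hold-out ("external") set
```

Fixing the centers and solving only for the output weights keeps training a linear least-squares problem, which is one common way to build RBF networks; other choices (e.g., k-means centers or tuned widths) are equally plausible.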
• Linear/log-linear regression models cannot predict haloketone (HK) levels well.
• The back propagation (BP) ANN predicts HKs well in internal validation but poorly in external validation.
• The radial basis function artificial neural network (RBF ANN) predicts HK levels well.
• The RBF ANN can recognize complex relationships between HKs and water quality.
• More accurate and explanatory models are proposed for building energy benchmarking.
• A gradient boosted model shows a 13.7% decrease in error compared with Energy Star.
• This is the first use of explainable AI (XAI) methods in energy use benchmarking.
• The methods are applied to over 15,000 buildings, and the code is released as open source.
Building energy performance benchmarking has been widely adopted in the USA and Canada through the Energy Star Portfolio Manager platform. Building operations and energy management professionals have long used its simple 1–100 score to understand how their building compares with its peers. This single number is easy to use, but it is produced by potentially inaccurate multiple linear regression (MLR) models and provides little further information about why a building achieves that score. This paper proposes a methodology that enhances the existing Energy Star calculation method by increasing accuracy and by adding model output processing that helps explain why a building achieves a particular score. Two new prediction models were proposed and tested: multiple linear regression with feature interactions (MLRi) and gradient boosted trees (GBT). Both models performed better than a baseline Energy Star MLR model as well as four baseline models from previous benchmarking studies. For six building types, the third-order MLRi models achieved on average a 4.9% increase in adjusted R2 and a 7.0% decrease in normalized root mean squared error (NRMSE) over the baseline MLR model. More substantially, the most accurate GBT models achieved on average a 24.9% increase in adjusted R2 and a 13.7% decrease in NRMSE against the baseline MLR model. In addition, a set of techniques was developed that uses SHapley Additive exPlanation (SHAP) values to determine which factors most influence a building’s energy use relative to its peers. The SHAP force visualization, in particular, offers an accessible overview of the building aspects that influenced the score, which even non-technical users can interpret. This methodology was tested on the 2012 Commercial Buildings Energy Consumption Survey (CBECS) (1,812 buildings) and on public data sets from the energy disclosure programs of New York City (11,131 buildings) and Seattle (2,073 buildings).
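To make the MLRi idea concrete, the sketch below builds interaction terms up to a chosen order, fits ordinary least squares with and without them, and compares adjusted R2 and NRMSE. It is an in-sample illustration on synthetic data, not the paper's pipeline: the feature names and function names are hypothetical, and NRMSE is assumed to be normalized by the mean of the observed energy use intensity (other normalizations, such as by range, are also used).

```python
import itertools
import numpy as np


def interaction_features(X: np.ndarray, order: int = 2) -> np.ndarray:
    """Main effects plus all products of distinct features up to `order`."""
    cols = [X[:, j] for j in range(X.shape[1])]
    for k in range(2, order + 1):
        for combo in itertools.combinations(range(X.shape[1]), k):
            cols.append(np.prod(X[:, list(combo)], axis=1))
    return np.column_stack(cols)


def fit_predict_ols(F: np.ndarray, y: np.ndarray) -> np.ndarray:
    """In-sample OLS predictions with an intercept."""
    A = np.column_stack([np.ones(len(F)), F])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef


def adjusted_r2(y, y_hat, n_features):
    """R^2 penalized for the number of predictors."""
    n = len(y)
    r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)


def nrmse(y, y_hat):
    """RMSE normalized by the mean of y (assumed convention)."""
    return np.sqrt(np.mean((y - y_hat) ** 2)) / np.mean(y)


# Hypothetical building features (e.g., floor area, operating hours, workers).
rng = np.random.default_rng(6)
X = rng.uniform(0.5, 2.0, size=(400, 3))
eui = 100 + 20 * X[:, 0] + 15 * X[:, 1] + 25 * X[:, 0] * X[:, 1] + rng.normal(0, 2, 400)

F_int = interaction_features(X, order=3)  # 3 main + 3 pairwise + 1 three-way term
results = {}
for name, F in [("MLR", X), ("MLRi", F_int)]:
    y_hat = fit_predict_ols(F, eui)
    results[name] = (adjusted_r2(eui, y_hat, F.shape[1]), nrmse(eui, y_hat))
```

Adjusted R2 is used rather than plain R2 because adding interaction columns always raises in-sample R2; the adjustment penalizes the extra terms.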
Despite considerable reductions in primary and secondary air pollutants in China, surface ozone levels have increased in recent years. We report a trend of 3.3 ± 4.7 μg m−3 year−1 in the annual mean of the maximum daily 8-h average ozone (MDA8 ozone) across China between 2015 and 2019. Using the Kolmogorov–Zurbenko filter method, we find that meteorology enhanced ozone levels in Beijing–Tianjin–Hebei (BTH), the Yangtze River Delta (YRD), and the Pearl River Delta (PRD), while reductions in solar radiation and planetary boundary layer height accelerated ozone decreases in the Sichuan Basin (SCB) after 2017. Increases in solar radiation and temperature, together with reduced sea level pressure, were the main contributors to enhanced ozone in the YRD; these factors also accounted for 32% of the ozone increase in BTH. Weaker meridional wind, lower relative humidity, and higher temperature escalated ozone enhancement in the PRD between 2016 and 2018. Regarding precursor emissions, the long-term components of NO2 declined noticeably in all regions after 2017, partially due to China’s most recent action plan to reduce air pollutants, introduced in 2018. In contrast, satellite-retrieved data suggest that VOC concentrations did not change substantially in the target regions during the study period; after 2017, however, VOCs increased slightly in BTH, the YRD, and the PRD, possibly driven by rising temperatures. Overall, meteorology dominated ozone variations in the YRD, the PRD, and the SCB from 2015 to 2019, whereas precursor emissions played the leading role in ozone enhancement over BTH. We also found that BTH and the YRD were in a transitional ozone formation regime, while the PRD and the SCB tended to be more NOx-sensitive.
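The Kolmogorov–Zurbenko (KZ) filter mentioned above is an iterated moving average: KZ(m, k) applies a centered m-point moving average k times. The minimal sketch below assumes the window/iteration choices KZ(15, 5) for the baseline and KZ(365, 3) for the long-term component, which are common in the ozone literature but are not stated in this abstract. With these choices the short-term, seasonal, and long-term components add back up to the original daily series.

```python
import numpy as np


def moving_average(x: np.ndarray, m: int) -> np.ndarray:
    """Centered moving average over an odd window m, skipping NaNs.

    Near the series ends the window is truncated, so each output is the
    mean of whatever valid points fall inside the window.
    """
    if m % 2 == 0:
        raise ValueError("window m must be odd")
    if len(x) < m:
        raise ValueError("series must be at least as long as the window")
    valid = ~np.isnan(x)
    kernel = np.ones(m)
    sums = np.convolve(np.where(valid, x, 0.0), kernel, mode="same")
    counts = np.convolve(valid.astype(float), kernel, mode="same")
    return np.where(counts > 0, sums / np.maximum(counts, 1.0), np.nan)


def kz_filter(x: np.ndarray, m: int, k: int) -> np.ndarray:
    """KZ(m, k): apply the m-point moving average k times."""
    out = np.asarray(x, dtype=float)
    for _ in range(k):
        out = moving_average(out, m)
    return out


def decompose_ozone(mda8: np.ndarray):
    """Split a daily MDA8 series into short-term, seasonal, and long-term parts.

    Assumed parameters: baseline = KZ(15, 5), long-term = KZ(365, 3).
    """
    baseline = kz_filter(mda8, 15, 5)
    long_term = kz_filter(mda8, 365, 3)
    short_term = mda8 - baseline
    seasonal = baseline - long_term
    return short_term, seasonal, long_term


# Synthetic five-year daily series: trend + annual cycle + noise.
rng = np.random.default_rng(0)
days = np.arange(5 * 365)
mda8 = 80 + 0.01 * days + 30 * np.sin(2 * np.pi * days / 365) + rng.normal(0, 10, days.size)
short_term, seasonal, long_term = decompose_ozone(mda8)
```

The long-term component is what a trend analysis (e.g., the NO2 long-term components mentioned above) would examine; its values near the series ends are less reliable because the windows are truncated there.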
• Variations of surface ozone over the most polluted regions of China were investigated.
• Meteorology exacerbated ozone pollution in BTH, the YRD, and the PRD.
• Precursor emissions were the main contributors to ozone increases over BTH.
• VOC concentrations changed only slightly, while NO2 decreased after 2017 in all regions.
• BTH and the YRD were in a mixed regime, while the PRD and the SCB were NOx-limited.
In this study, a multilevel linear regression technique based on neural network tailored association is proposed to predict human mental depression. The proposed technique uses a neural network configured for association-based multiple linear regression to make predictions on a mental depression dataset. The spectrum of depression is predicted using a variety of statistical techniques, including multiple linear regression and linear regression with neural network tuning. When predicting the severity of depression, the tuned algorithms perform less well, showing significant differences in the accuracy, timing, and speed of depression predictions. To address these difficulties, a multiple linear regression solution based on neural network tailored association is proposed. Multiple linear regression using a neural network tuned for association achieves a higher prediction accuracy, roughly 91%, than the other statistical approaches.
The world’s longest trans-basin water diversion project, the Middle-Route (MR) of the South-to-North Water Diversion Project of China (SNWDPC), has been in official operation since December 2014, i.e., for more than five years. Its water quality status has always attracted special attention because it affects the health and safety of more than 58 million people and the integrity of an ecosystem covering more than 155,000 km2. This study presented and analysed the spatio-temporal variations and trends of 16 water quality parameters, including pH, water temperature (WT), dissolved oxygen (DO), permanganate index (PI), five-day biochemical oxygen demand (BOD5), fecal coliform (F. coli), total phosphorus (TP), total nitrogen (TN), ammonia nitrogen (NH3−N), sulphate (SO42−), fluoride (F−), mercury (Hg), arsenic (As), selenium (Se), copper (Cu), and zinc (Zn), which were determined monthly from samples collected at 27 water quality monitoring stations in the MR of the SNWDPC from March 2016 to February 2019. The water quality index (WQI) was used to evaluate seasonal and spatial water quality changes during the monitoring period, and a new WQImin model consisting of five crucial parameters, i.e., TP, F. coli, Hg, WT, and DO, was built using stepwise multiple linear regression analysis. The results demonstrated that the water quality of the MR of the SNWDPC was steadily maintained at an “excellent” level during the monitoring period, with an overall average WQI value of 90.39 and twelve seasonal mean WQI values ranging from 87.67 to 91.82.
The proposed WQImin model, which uses the five selected key parameters and their weights, performed excellently in assessing the project’s water quality, with coefficient of determination (R2), root mean square error (RMSE), and percentage error (PE) values of 0.901, 2.21, and 1.93%, respectively, showing that it is a useful and efficient tool for evaluating and managing water quality. The management department should carefully inspect and strictly manage the risk sources near stations with abnormally high values to maintain excellent water quality. The potential risk of algal proliferation in this project should be addressed in future research.
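The WQI and WQImin described above can be sketched as weighted averages of normalized parameter scores. The sketch below assumes the common form WQI = Σ CᵢPᵢ / Σ Pᵢ, with Cᵢ a 0–100 normalized score and Pᵢ a relative weight. It uses greedy forward selection as a simple stand-in for the stepwise multiple linear regression. The function names, the synthetic weights, and the PE definition (mean absolute difference relative to WQI, in percent) are hypothetical assumptions, not the study's exact procedure.

```python
import numpy as np


def wqi(scores: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted-average WQI: sum(C_i * P_i) / sum(P_i), one value per sample."""
    return scores @ weights / weights.sum()


def ols_r2(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 of an ordinary least-squares fit of y on X with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))


def forward_select(scores: np.ndarray, y: np.ndarray, n_keep: int) -> list:
    """Greedy forward selection: repeatedly add the parameter that most raises R^2."""
    selected: list = []
    remaining = list(range(scores.shape[1]))
    while len(selected) < n_keep:
        best = max(remaining, key=lambda j: ols_r2(scores[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected


def evaluate(wqi_full: np.ndarray, wqi_min: np.ndarray) -> dict:
    """R^2, RMSE, and a percentage error (assumed: mean |diff| / WQI * 100)."""
    diff = wqi_full - wqi_min
    ss_tot = ((wqi_full - wqi_full.mean()) ** 2).sum()
    return {
        "R2": 1.0 - (diff @ diff) / ss_tot,
        "RMSE": float(np.sqrt(np.mean(diff ** 2))),
        "PE": float(np.mean(np.abs(diff) / wqi_full) * 100),
    }


# Synthetic example: 5 parameters, two of them heavily weighted.
rng = np.random.default_rng(1)
scores = rng.uniform(0, 100, size=(500, 5))
weights = np.array([4.0, 4.0, 1.0, 1.0, 1.0])
full = wqi(scores, weights)
keep = forward_select(scores, full, n_keep=2)
reduced = wqi(scores[:, keep], weights[keep])  # WQImin keeps the original weights
metrics = evaluate(full, reduced)
```

In this synthetic setup the heavily weighted parameters explain most of the WQI variance, so the greedy search keeps them first. That mirrors the rationale for a minimal WQI built from a few key parameters.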
• Comprehensive analyses and assessments of water quality were conducted.
• The project’s water quality status has been steadily maintained at an “excellent” level.
• TP, F. coli, Hg, WT, and DO are the most important parameters of the project.
• The potential risk of algal proliferation in the main canal warrants attention.
• Erosion pins were used to measure the annual soil erosion rate.
• Annual soil erosion rates were modeled using three machine learning techniques.
• Boosted regression trees yielded the most favorable results for soil erosion modeling.
• Slope degree was the most important factor affecting soil erosion.
Assessing water-induced soil erosion, a crucial part of soil conservation plans, is costly and time-consuming when applied to an extensive area. In this study, we propose a methodology that records annual soil erosion in a portion of the study area using erosion pins and then assesses the spatial distribution of soil erosion over the entire area using machine learning techniques. First, soil erosion pins were installed, and the amount of soil loss at each pin was recorded. The controlling factors of soil erosion (percentage of vegetation canopy, curvature, slope degree, slope length, and percentages of sand, silt, and clay) were determined, and the dataset was divided into training (75% of the data) and testing (25% of the data) subsets. Three machine learning algorithms, namely boosted regression trees (BRT), deep learning (DL), and multiple linear regression (MLR), were employed to identify the relationship between soil erosion and its controlling factors. The methods were then evaluated by comparing predicted and observed values on the testing subset using statistical coefficients, including the coefficient of determination (R-squared), normalized root mean squared error (NRMSE), and Nash–Sutcliffe efficiency (NSE). The results show that BRT outperformed the other algorithms in assessing annual soil erosion (R-squared: 0.92, NSE: 0.9, NRMSE: 0.32). Finally, the optimal algorithm (BRT) was selected to estimate the spatial distribution of soil erosion across the entire study area, and the final erosion map was verified using additional verification pins.
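The BRT workflow above (a 75/25 split, boosted trees, and evaluation by R-squared, NRMSE, and NSE) can be sketched in plain NumPy. This is a minimal gradient-boosting sketch with depth-1 trees (stumps) and squared-error loss, not the BRT configuration used in the study. The synthetic data, learning rate, number of rounds, candidate thresholds, and metric conventions are assumptions for illustration: R-squared is taken as the squared Pearson correlation and NRMSE as RMSE divided by the observed mean.

```python
import numpy as np


def fit_stump(X, r):
    """Best single split (feature, threshold, left mean, right mean) for residuals r."""
    best_sse, best = np.inf, (0, np.inf, r.mean(), r.mean())  # fallback: no split
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], np.linspace(0.05, 0.95, 19)):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if sse < best_sse:
                best_sse, best = sse, (j, t, lv, rv)
    return best


def stump_predict(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)


def fit_boosted(X, y, n_rounds=200, lr=0.1):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    init = y.mean()
    pred = np.full(len(y), init)
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y - pred)
        pred += lr * stump_predict(stump, X)
        stumps.append(stump)
    return init, lr, stumps


def predict_boosted(model, X):
    init, lr, stumps = model
    return init + lr * sum(stump_predict(s, X) for s in stumps)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations from the observed mean."""
    return 1 - ((obs - sim) ** 2).sum() / ((obs - obs.mean()) ** 2).sum()


def nrmse(obs, sim):
    return np.sqrt(np.mean((obs - sim) ** 2)) / obs.mean()


def r_squared(obs, sim):
    return np.corrcoef(obs, sim)[0, 1] ** 2


# Synthetic data: 3 hypothetical controlling factors scaled to [0, 1].
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(200, 3))
y = 1.0 + 5 * X[:, 0] + 2 * (X[:, 1] > 0.5) + rng.normal(0, 0.1, 200)
idx = rng.permutation(200)
train, test = idx[:150], idx[150:]  # 75% / 25% split
model = fit_boosted(X[train], y[train])
y_hat = predict_boosted(model, X[test])
scores = {"R2": r_squared(y[test], y_hat), "NSE": nse(y[test], y_hat), "NRMSE": nrmse(y[test], y_hat)}
```

Reporting NSE alongside R-squared is useful because R-squared ignores systematic bias, whereas NSE penalizes any deviation from the 1:1 line.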
The Statistical Hurricane Intensity Prediction Scheme (SHIPS) is a multiple linear regression model for predicting tropical cyclone (TC) intensity. It has been widely used in operational centers because of its forecast stability, high accuracy, easy interpretation, and low computational cost. The Japan Meteorological Agency version of SHIPS, called the Typhoon Intensity Forecasting scheme based on SHIPS (TIFS), predicts both maximum wind speed and central pressure. Although adding new predictors to SHIPS and TIFS has improved their accuracy, predicting TC intensity with a single regression model has limitations. In this study, a new TIFS-based forecasting scheme is developed using data from 2000 to 2021. Three TIFS regression models corresponding to the intensifying, steady-state, and weakening stages of TCs are introduced, and the weighted mean of the three TIFS forecasts, with weights based on random forest (RF) decision trees, is computed as the final intensity forecast. Compared with the conventional TIFS model, the new scheme (TIFS-RF) is more accurate, with improvement rates of up to 12% at forecast times from 1 to 4 days. The improvement is particularly significant for steady-state TCs, tropical depressions, and TCs undergoing extratropical transition within five days. TIFS-RF forecasts are generally more accurate than conventional TIFS forecasts for rapidly intensifying TCs, but much less accurate for rapidly weakening TCs. This study also confirms that a consensus forecast of the TIFS-RF and Hurricane Weather Research and Forecasting (HWRF) models can overcome the weaknesses of each model used alone.
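The final TIFS-RF step, a weighted mean of three stage-specific regression forecasts, can be sketched as below. The abstract does not say how the RF output becomes weights. This sketch assumes the weights are the RF's per-stage class probabilities (any non-negative weights are normalized per row), and it assumes an equal-weight consensus with HWRF. Function names and forecast values are hypothetical.

```python
import numpy as np


def tifs_rf_forecast(stage_forecasts: np.ndarray, stage_probs: np.ndarray) -> np.ndarray:
    """Weighted mean of stage-specific regression forecasts.

    stage_forecasts: shape (n_times, 3), forecasts from the intensifying,
        steady-state, and weakening regression models.
    stage_probs: shape (n_times, 3), non-negative weights per stage
        (assumed here to be RF class probabilities); normalized per row.
    """
    w = stage_probs / stage_probs.sum(axis=1, keepdims=True)
    return (w * stage_forecasts).sum(axis=1)


def consensus(*forecasts: np.ndarray) -> np.ndarray:
    """Equal-weight consensus of several models' forecasts (e.g., TIFS-RF and HWRF)."""
    return np.mean(np.stack(forecasts), axis=0)


# Hypothetical maximum-wind forecasts (kt) at two lead times.
stage_forecasts = np.array([[60.0, 50.0, 40.0],
                            [70.0, 55.0, 45.0]])
stage_probs = np.array([[0.5, 0.3, 0.2],
                        [0.2, 0.6, 0.2]])
tifs_rf = tifs_rf_forecast(stage_forecasts, stage_probs)  # [53.0, 56.0]
hwrf = np.array([57.0, 60.0])
final = consensus(tifs_rf, hwrf)  # [55.0, 58.0]
```

Probability weighting hedges between stages when the classifier is uncertain, instead of committing to a single regression model.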
Arsenic (As) is a widespread environmental contaminant that poses a significant threat to ecosystems and human health. Although previous studies have qualitatively revealed the effects of individual soil properties on the transport and fate of As in the vadose zone, their integrated impacts remain obscure. Moreover, studies of the As retardation factor, a key parameter for understanding As transport in the vadose zone, are extremely limited. In this study, we investigated how soil properties interact with As transport and retention in the vadose zone, focusing on the retardation factor of As. We combined steady-state unsaturated water-flow soil column experiments with a mobile–immobile model and multiple linear regression analysis to elucidate how As retardation factors depend on soil properties. In the mobile water zone, iron and organic matter contents emerged as the two most influential properties impeding As mobility. In the immobile water zone, by contrast, the coefficient of uniformity and bulk density were the most influential factors enhancing As retention. Finally, we derived an empirical equation for calculating the As retardation factor in each zone, offering a valuable tool for describing and predicting As behavior to protect the underlying groundwater resources.
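For context, the retardation factor for linear equilibrium sorption is commonly written R = 1 + ρb·Kd/θ. In the mobile–immobile model, the sorption sites are split between the two zones by a fraction f. The sketch below assumes these standard textbook forms, since the abstract does not state the study's empirical equation. It also shows how standardized multiple-regression coefficients can rank soil properties by their influence on R. The property names, function names, and data are hypothetical.

```python
import numpy as np


def retardation_factor(bulk_density, kd, theta):
    """Linear-sorption retardation factor R = 1 + rho_b * Kd / theta."""
    return 1.0 + bulk_density * kd / theta


def mim_retardation(bulk_density, kd, theta_m, theta_im, f):
    """Mobile and immobile retardation factors; f = fraction of sorption sites in the mobile zone."""
    r_m = 1.0 + f * bulk_density * kd / theta_m
    r_im = 1.0 + (1.0 - f) * bulk_density * kd / theta_im
    return r_m, r_im


def standardized_coefficients(X, y):
    """OLS on z-scored predictors and response; |coef| ranks influence."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    coef, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return coef


# Hypothetical soil properties: iron, organic matter, bulk density.
rng = np.random.default_rng(3)
props = rng.normal(size=(60, 3))
r_obs = 5 + 3.0 * props[:, 0] + 1.0 * props[:, 1] + rng.normal(0, 0.2, 60)
coef = standardized_coefficients(props, r_obs)
ranking = np.argsort(-np.abs(coef))  # most influential property first
```

Standardizing puts properties with different units (e.g., g/kg versus g/cm³) on a common scale, so coefficient magnitudes can be compared directly.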
• Dual-porosity characteristics resulted in anomalous transport of As in the vadose zone.
• Typical soil properties were used to estimate As retardation in the vadose zone.
• Iron and organic matter contents were the major factors in the mobile water zone.
• The uniformity coefficient and bulk density were the key factors in the immobile water zone.
This study uses gray system theory to refine the multiple linear regression model, yielding a gray multiple linear regression method. Taking School A as the primary subject, the approach applies gray scaling to the intercultural communication skills of college instructors (independent variable) and the effectiveness of Chinese culture dissemination (dependent variable). This transformation helps resolve the sequence of whitened background values for each variable. The regression coefficients are then calculated using the Cholesky method to determine the linear correlation between the variables and to assess how intercultural communication skills influence cultural dissemination outcomes. The analysis of how college teachers’ intercultural communicative competence influences cultural transmission found that the language competence of college foreign language teachers is significantly correlated with both “attitude toward learning Chinese excellent culture” (0.432) and “whether it is necessary to teach the idea of Chinese excellent culture in the classroom” (0.503). These results provide a reference for promoting the wide dissemination of excellent Chinese culture and suggest a direction for improving the cross-cultural communicative competence of foreign language teachers in colleges and universities.
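The gray preprocessing and Cholesky step described above can be illustrated with a short sketch. It assumes standard gray-model conventions: the accumulated generating operation (AGO, a running sum) and mean background values z(k) = 0.5·(x¹(k) + x¹(k−1)). The abstract does not spell out the exact gray-scaling procedure, so the function names and preprocessing should be treated as hypothetical. The coefficients come from solving the normal equations BᵀBβ = Bᵀz_y through a Cholesky factorization.

```python
import numpy as np


def background_values(x: np.ndarray) -> np.ndarray:
    """AGO (cumulative sum) followed by mean background values z(k), k = 2..n."""
    x1 = np.cumsum(x)
    return 0.5 * (x1[1:] + x1[:-1])


def gray_mlr(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Regress background values of y on those of each column of X (with intercept).

    Returns [intercept, b_1, ..., b_p], solved via Cholesky on the normal equations.
    """
    Z = np.column_stack([background_values(X[:, j]) for j in range(X.shape[1])])
    B = np.column_stack([np.ones(len(Z)), Z])
    zy = background_values(y)
    L = np.linalg.cholesky(B.T @ B)   # B^T B = L L^T, L lower triangular
    w = np.linalg.solve(L, B.T @ zy)  # solve L w = B^T z_y
    return np.linalg.solve(L.T, w)    # solve L^T beta = w


# Hypothetical survey-score series for two predictors and one response.
rng = np.random.default_rng(4)
X = rng.uniform(1, 10, size=(30, 2))
y = 2.0 * X[:, 0] + 3.0 * X[:, 1]
beta = gray_mlr(X, y)
```

Cholesky applies because BᵀB is symmetric positive definite whenever the columns of B are linearly independent. It is a cheaper route to the least-squares coefficients than a general matrix inverse.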
• Rice yield predictions were markedly improved using machine learning methods.
• Phenological variables substantially affected rice yields by altering carbon allocation processes.
• The relative importance of phenological variables was large compared with that of climatic variables.
• Integrating phenological and climatic variables with geographical information was the best combination for yield prediction.
Rice (Oryza sativa L.) is a staple cereal crop, and demand for it is increasing substantially as the global population grows. Precisely predicting rice yields is vital to ensuring food security in countries like China, where rice accounts for one-fifth of total agricultural production. Previous studies found that rice yields have been significantly affected by climate change. In addition, phenological variables have been identified as important factors for rice yields because of their fundamental role in carbon allocation between plant organs, but their impacts on rice yields have seldom been evaluated. In this study, eleven combinations of phenology, climate, and geography data were tested to predict site-based rice yields using a traditional regression-based method (MLR, multiple linear regression) and three more advanced machine learning (ML) methods: backpropagation neural network (BP), support vector machine (SVM), and random forest (RF). The results showed that the ML methods were more precise than the MLR method. Across the ML methods, the combination of phenology, growing-season climate, and geographical information predicted yields better than the other combinations; e.g., the RMSEs between predicted and observed rice yields were 800, 737, and 744 kg/ha (R2 = 0.24, 0.33, and 0.31) for BP, SVM, and RF, respectively. SVM achieved the highest precision in yield prediction, the phenological variables substantially improved the accuracy of yield predictions, and the relative importance of phenological variables was similar to that of climatic variables. We highlight that phenology and climate need to be accurately represented in crop models to improve the accuracy of rice yield prediction under climate change using integrated ML methods.
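The comparison of predictor combinations described above can be sketched as follows: enumerate every non-empty combination of variable groups and score each with k-fold cross-validated RMSE. This minimal NumPy sketch uses ordinary least squares (the MLR baseline) as a stand-in for BP, SVM, or RF. The three groups, their column indices, the function names, and the synthetic data are hypothetical. With three groups it yields seven combinations rather than the study's eleven.

```python
import itertools
import numpy as np


def cv_rmse(X, y, k=5, seed=0):
    """k-fold cross-validated RMSE of an OLS model with intercept."""
    idx = np.random.default_rng(seed).permutation(len(y))
    sq_errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        A_tr = np.column_stack([np.ones(len(train)), X[train]])
        coef, *_ = np.linalg.lstsq(A_tr, y[train], rcond=None)
        A_te = np.column_stack([np.ones(len(fold)), X[fold]])
        sq_errors.append((y[fold] - A_te @ coef) ** 2)
    return float(np.sqrt(np.mean(np.concatenate(sq_errors))))


def compare_combinations(X, y, groups):
    """CV RMSE for every non-empty combination of named column groups."""
    results = {}
    names = list(groups)
    for r in range(1, len(names) + 1):
        for combo in itertools.combinations(names, r):
            cols = [c for name in combo for c in groups[name]]
            results[combo] = cv_rmse(X[:, cols], y)
    return results


# Synthetic yields (kg/ha) driven by phenology and climate only.
rng = np.random.default_rng(5)
X = rng.normal(size=(300, 5))
groups = {"phenology": [0, 1], "climate": [2, 3], "geography": [4]}
y = 7000 + 400 * X[:, 0] + 300 * X[:, 2] + 200 * X[:, 3] + rng.normal(0, 50, 300)
results = compare_combinations(X, y, groups)
best = min(results, key=results.get)
```

Cross-validated rather than in-sample RMSE matters here, because larger combinations always fit the training data at least as well.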