The factors influencing residents' health have become complex and intertwined as the economy and society have developed. Traditional research that examines a single factor cannot provide an accurate picture of the situation. This paper collects data on economic, environmental and social factors to estimate their impact on regional health. Because the data are multi-source and complex, this paper proposes a combined feature importance algorithm that weights the feature importances of RF, XGB and SOIL. The algorithm does not depend on the data and adaptively approximates the true results. The results show that economic factors have a significant and direct impact on health, environmental factors have a lagged correlation with health level, and social factors have a more complicated effect on health. Finally, we provide policy suggestions for health on economic, environmental, and social development.
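The abstract does not spell out how the three importance vectors are weighted, so the following is only a minimal sketch of one plausible reading: each model's scores are normalized to sum to 1 and then averaged with per-model weights. The function names, feature names, scores and weights are all hypothetical and are not taken from the paper.

```python
def normalize(scores):
    """Scale non-negative importance scores so they sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("importance scores must have a positive sum")
    return {name: value / total for name, value in scores.items()}


def combine_importances(per_model, weights):
    """Weighted average of normalized per-model feature importances."""
    weight_sum = sum(weights[model] for model in per_model)
    combined = {}
    for model, scores in per_model.items():
        share = weights[model] / weight_sum
        for feature, value in normalize(scores).items():
            combined[feature] = combined.get(feature, 0.0) + share * value
    return combined


# Hypothetical importance scores on different scales (assumption, not paper data).
per_model = {
    "RF": {"gdp": 0.5, "pm25": 0.3, "education": 0.2},
    "XGB": {"gdp": 6.0, "pm25": 2.0, "education": 2.0},
    "SOIL": {"gdp": 0.9, "pm25": 0.6, "education": 0.5},
}
weights = {"RF": 1.0, "XGB": 1.0, "SOIL": 2.0}  # assumed model weights

combined = combine_importances(per_model, weights)
ranking = sorted(combined, key=combined.get, reverse=True)
```

Normalizing before averaging keeps a model with a larger raw score scale (here the hypothetical XGB gains) from dominating the combined ranking.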
•Compressive and flexural strengths of SFRC are successfully predicted by machine learning algorithms.
•Tree-based and boosting models are recommended for SFRC predictions.
•W/C ratio and silica fume are the most important parameters for predicting compressive strength.
•Fiber volume fraction and silica fume are the most important for predicting flexural strength.
•XGBoost and gradient boost regressors are selected as the most appropriate machine learning algorithms for SFRC.
Steel fiber-reinforced concrete (SFRC) has a performance superior to that of normal concrete because of the addition of discontinuous fibers. The development of strength prediction techniques for SFRC is, however, still in its infancy compared to that of normal concrete because of its complexity and limited available data. To overcome this limitation, research was conducted to develop an optimum machine learning algorithm for predicting the compressive and flexural strengths of SFRC. The resulting feature impact was also analyzed to confirm the reliability of the models. To achieve this, compressive and flexural strength data for SFRC were collected through extensive literature reviews, and a database was created. Eleven machine learning algorithms were then established based on the dataset. K-fold validation was conducted to prevent overfitting, and the algorithms were regulated. The boosting- and tree-based models had the optimal performance, whereas the K-nearest neighbor, linear, ridge, lasso regressor, support vector regressor, and multilayer perceptron models had the worst performance. The water-to-cement ratio and silica fume content were the most influential factors in the prediction of compressive strength of SFRC, whereas the silica fume and fiber volume fraction most strongly influenced the flexural strength. Finally, it was found that, in general, the compressive strength prediction performance was better than the flexural strength prediction performance, regardless of the machine learning algorithm.
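As an illustration of the K-fold validation step, here is a minimal NumPy sketch that scores a closed-form ridge regressor (one of the model families named above) on held-out folds for several penalty strengths. The data are synthetic and the helper names (`kfold_indices`, `fit_ridge`, `cv_rmse`) are hypothetical; the study's own database, fold count and tuning procedure are not given in the abstract.

```python
import numpy as np


def kfold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)


def fit_ridge(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


def cv_rmse(X, y, alpha, k=5):
    """Average held-out RMSE of ridge regression over k folds."""
    folds = kfold_indices(len(y), k)
    errors = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coef, intercept = fit_ridge(X[train_idx], y[train_idx], alpha)
        pred = X[test_idx] @ coef + intercept
        errors.append(np.sqrt(np.mean((pred - y[test_idx]) ** 2)))
    return float(np.mean(errors))


# Synthetic stand-in for a mix-design table (assumption, not the SFRC database).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = X @ np.array([3.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)

# Compare penalty strengths using only held-out error.
scores = {alpha: cv_rmse(X, y, alpha) for alpha in (0.01, 1.0, 1000.0)}
best_alpha = min(scores, key=scores.get)
```

Because every sample is scored only by a model that never saw it, an over-flexible or over-penalized setting shows up as a higher held-out error rather than a flattering training fit.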
The renewal of green home appliances is a crucial measure for households to save energy and reduce emissions. However, how online reviews, especially those related to energy saving, affect green home appliance purchase behavior (GHAPB) remains underexplored. In this paper, we investigate over 1 million online reviews on about 3,116 types of air conditioner from JD. By applying word2vec, we divide energy-saving related information into three types: norm information, environmental health information and price information, and construct dictionaries for each. Then, the effect value of energy-saving information is quantified from the perspectives of breadth, depth and intensity through sentiment analysis. The influence of energy-saving information in online reviews on GHAPB is finally analyzed by linear regression and machine learning models. The results show that all energy-saving information has a positive impact on GHAPB, and environmental health information is the most important type. In addition, the attributes of online reviews have a greater influence on GHAPB than those of products. The in-depth exploration of energy-saving information in online reviews provides targeted recommendations for manufacturers and retailers to promote the adoption of green home appliances.
The rapid increase in both the quantity and complexity of data that are being generated daily in the field of environmental science and engineering (ESE) demands accompanying advances in data analytics. Advanced data analysis approaches, such as machine learning (ML), have become indispensable tools for revealing hidden patterns or deducing correlations for which conventional analytical methods face limitations or challenges. However, ML concepts and practices have not been widely utilized by researchers in ESE. This feature explores the potential of ML to revolutionize data analysis and modeling in the ESE field, and covers the essential knowledge needed for such applications. First, we use five examples to illustrate how ML addresses complex ESE problems. We then summarize four major types of applications of ML in ESE: making predictions; extracting feature importance; detecting anomalies; and discovering new materials or chemicals. Next, we introduce the essential knowledge required and current shortcomings in ML applications in ESE, with a focus on three important but often overlooked components when applying ML: correct model development, proper model interpretation, and sound applicability analysis. Finally, we discuss challenges and future opportunities in the application of ML tools in ESE to highlight the potential of ML in this field.
•Exploration of Sentinel-2 time series for tree species mapping.
•Land surface phenology and composite imagery outperform regular multitemporal imagery for mapping tree species.
•Our approach provides high mapping accuracy in areas of frequent cloud cover.
•Feature importance reveals the importance of Sentinel-2 SWIR bands.
Optical satellite imagery with high temporal and spatial resolution, such as that acquired by Sentinel-2, is increasingly becoming available and is used to derive maps of tree species. Such mapping products are required in the scope of operational and sustainable forest management. Existing studies that employ Sentinel-2 imagery have already evaluated different classification algorithms but are often confined to areas smaller than a single Sentinel-2 scene. In this study, the area of interest (a large part of the Province of Tyrol (Austria)) is covered by two Sentinel-2 tiles, of which approximately 5000 km² are forested. In order to deal with seasonal metrics under recurrent cloud cover conditions, we exploit one year of Sentinel-2 imagery by using land surface phenology (LSP) and seasonal cloud-free composites for mapping five different tree species groups (Broadleaved-, Larch- (Larix), Pine- (Pinus), Dwarf Pine- (Pinus mugo) and Spruce/Fir (Abies alba/Picea abies) stands). Although a regular multitemporal classification setup based on three cloud-free images reached an overall accuracy of around 84.4 % and outperformed monotemporal setups by around 10 percentage points, the availability of single cloud-free images was limited in the mountainous region. Thus, alternative approaches using combined measures for the entire time series of Sentinel-2 imagery, i.e. three-monthly temporal reflectance composites and phenological metrics, were tested and improved overall accuracy by a further 1–2 percentage points. In conclusion, we agree with previous studies that multitemporal imagery can help improve mapping accuracy. However, leveraging satellite image time series for large-scale mapping of tree species should not rely only on high-quality cloud-free single images and should be strongly supported by, e.g., seasonal composites or multi-image metrics. Therefore, development and provisioning of such datasets should be fostered.
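To make the idea of a seasonal cloud-free composite concrete, the sketch below takes the per-pixel median of all clear observations that fall within a chosen set of months. The median rule, the function name `seasonal_composite` and the tiny arrays are assumptions for illustration; the abstract does not state which compositing rule the study used.

```python
import numpy as np


def seasonal_composite(reflectance, cloud_mask, months, season):
    """Per-pixel median of cloud-free observations within a season.

    reflectance: array (time, rows, cols); cloud_mask: bool, True = cloudy;
    months: acquisition month (1-12) of each time step; season: set of months.
    Pixels with no clear observation in the season are returned as NaN.
    """
    in_season = np.isin(months, list(season))
    stack = reflectance[in_season].astype(float)
    stack[cloud_mask[in_season]] = np.nan        # drop cloudy observations
    all_cloudy = np.isnan(stack).all(axis=0)
    stack[:, all_cloudy] = 0.0                   # placeholder, avoids all-NaN warning
    composite = np.nanmedian(stack, axis=0)
    composite[all_cloudy] = np.nan
    return composite


# Toy single-band time series for a 2 x 2 pixel window (hypothetical values).
reflectance = np.array([
    [[0.10, 0.20], [0.30, 0.40]],   # June
    [[0.12, 0.90], [0.32, 0.95]],   # July
    [[0.14, 0.22], [0.34, 0.99]],   # August
    [[0.50, 0.50], [0.50, 0.50]],   # October, outside the season
])
cloud_mask = np.array([
    [[False, False], [False, True]],
    [[False, True], [False, True]],
    [[False, False], [False, True]],
    [[False, False], [False, False]],
])
months = np.array([6, 7, 8, 10])

summer = seasonal_composite(reflectance, cloud_mask, months, {6, 7, 8})
```

A pixel that is cloudy in one scene still receives a value as long as any other scene in the season is clear, which is why composites remain usable where single cloud-free images are scarce.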
A stacked ensemble model is developed for forecasting and analyzing the daily average concentrations of fine particulate matter (PM2.5) in Beijing, China. Special feature extraction procedures, including those of simplification, polynomial, transformation and combination, are conducted before modeling to identify potentially significant features based on an exploratory data analysis. Stability feature selection and tree-based feature selection methods are applied to select important variables and evaluate the degrees of feature importance. Single models including LASSO, Adaboost, XGBoost and multi-layer perceptron optimized by the genetic algorithm (GA-MLP) are established in the level 0 space and are then integrated by support vector regression (SVR) in the level 1 space via stacked generalization. A feature importance analysis reveals that nitrogen dioxide (NO2) and carbon monoxide (CO) concentrations measured from the city of Zhangjiakou are taken as the most important elements of pollution factors for forecasting PM2.5 concentrations. Local extreme wind speeds and maximal wind speeds are considered to extend the most effects of meteorological factors to the cross-regional transportation of contaminants. Pollutants found in the cities of Zhangjiakou and Chengde have a stronger impact on air quality in Beijing than other surrounding factors. Our model evaluation shows that the ensemble model generally performs better than a single nonlinear forecasting model when applied to new data with a coefficient of determination (R²) of 0.90 and a root mean squared error (RMSE) of 23.69 μg/m³. For single pollutant grade recognition, the proposed model performs better when applied to days characterized by good air quality than when applied to days registering high levels of pollution. The overall classification accuracy level is 73.93%, with most misclassifications made among adjacent categories. The results demonstrate the interpretability and generalizability of the stacked ensemble model.
•Exploratory data analysis and feature extraction are conducted for comprehensive understanding of air quality forecasting.
•Stability feature selection and tree based feature selection methods are applied to select important variables.
•Stacked ensemble model is established to improve the generalizability and robustness.
•Feature importance analysis is given to account for the interpretation of the ensemble model.
•The proposed model outperforms other considered single models.
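The level 0 / level 1 arrangement described in the abstract can be sketched as follows. This is not the paper's pipeline: to stay dependency-free, a least-squares linear model and a k-nearest-neighbour regressor stand in for LASSO, Adaboost, XGBoost and GA-MLP, and a linear blend stands in for the SVR meta-learner. The data are synthetic and all function names are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares linear model; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef


def fit_knn(X, y, k=5):
    """k-nearest-neighbour regressor; returns a predict function."""
    def predict(Z):
        dist = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(dist, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict


def fit_stack(X, y, base_fitters, n_folds=5, seed=0):
    """Stacked generalization: base models at level 0, linear blend at level 1."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    oof = np.zeros((len(y), len(base_fitters)))    # out-of-fold level 0 predictions
    for i, held_out in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        for m, fit in enumerate(base_fitters):
            oof[held_out, m] = fit(X[train], y[train])(X[held_out])
    meta = fit_linear(oof, y)                      # level 1 model on level 0 outputs
    bases = [fit(X, y) for fit in base_fitters]    # refit level 0 on all data
    return lambda Z: meta(np.column_stack([b(Z) for b in bases]))


def rmse(pred, target):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))


# Synthetic data with one linear and one nonlinear effect (assumption).
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(400, 2))
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + rng.normal(scale=0.1, size=400)
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]

stack = fit_stack(X_train, y_train, [fit_linear, fit_knn])
rmse_stack = rmse(stack(X_test), y_test)
rmse_linear = rmse(fit_linear(X_train, y_train)(X_test), y_test)
```

The level 1 model is trained only on out-of-fold predictions, so it learns how much to trust each base model on data that model has not seen.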
•A categorical boosting model is employed to intelligently forecast building energy consumption.
•It raises accuracy and mitigates uncertainty in understanding building energy performance.
•Feature importance can be measured to quantify features’ impacts on energy consumption.
•Outlier detection can distinguish normal and abnormal energy usage to make early warnings.
•Results will provide references to make data-driven decisions in optimizing energy utilization.
For better energy evaluation and management, a categorical boosting (CatBoost)-based predictive method is presented to accurately estimate building energy consumption by learning from large volumes of multi-source heterogeneous data collected from buildings. Specifically, the recently developed CatBoost model, which belongs to the ensemble learning family, is well suited to handling categorical variables and producing reliable results. As a case study, our proposed method is validated on a multi-dimensional dataset about Seattle's building energy performance provided by the city’s government, aiming to estimate the weather normalized site energy use intensity of buildings and characterize its non-linear relationship with 12 other possible influential features. Results from the 5-fold cross-validation demonstrate that the model predicts energy intensity precisely, with an R² of 0.897, and can even outperform popular machine learning algorithms including random forest and gradient boosting decision tree. Based on a defined threshold, these predicted values can be classified as normal or abnormal energy consumption, reaching an accuracy of 99.32% for outlier detection, which is helpful for flagging potential risks at an early stage and developing strategies to enhance energy efficiency. Moreover, results from the established model can be interpreted objectively, suggesting that features concerning the physical and energy characteristics contribute more to energy estimation than environmental features. Since such results characterize building energy consumption and efficiency in a data-driven manner, they can eventually serve as guidance for building owners and designers in designing and renovating buildings to achieve better energy-conserving performance.
•A machine learning model is developed to predict the quantity and quality of hydrochar.
•A database covering diverse biomass types and reaction conditions is compiled.
•The decision tree regression model successfully characterizes hydrochar (R² > 0.88).
•Genetic algorithm optimizes the process by machine learning-developed cost functions.
•Biomass ash/carbon content and temperature influence hydrochar production the most.
Hydrothermal carbonization (HTC) is a process that converts biomass into versatile hydrochar without the need for prior drying. The physicochemical properties of hydrochar are influenced by biomass properties and processing parameters, making it challenging to optimize for specific applications through trial-and-error experiments. To save time and money, machine learning can be used to develop a model that characterizes hydrochar produced from different biomass sources under varying reaction processing parameters. Thus, this study aims to develop an inclusive model to characterize hydrochar using a database covering a range of biomass types and reaction processing parameters. The quality and quantity of hydrochar are predicted using two models (decision tree regression and support vector regression). The decision tree regression model outperforms the support vector regression model in terms of forecast accuracy (R² > 0.88, RMSE < 6.848, and MAE < 4.718). Using an evolutionary algorithm, optimum inputs are identified based on cost functions provided by the selected model to optimize hydrochar for energy production, soil amendment, and pollutant adsorption, resulting in hydrochar yields of 84.31%, 84.91%, and 80.40%, respectively. The feature importance analysis reveals that biomass ash/carbon content and operating temperature are the primary factors affecting hydrochar production in the HTC process.
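The three accuracy figures quoted above (R², RMSE and MAE) follow standard definitions, which the short sketch below computes from paired observations and predictions. The yield values are hypothetical and are not taken from the study's database.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE and MAE for paired observations and predictions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)           # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Hypothetical hydrochar yields in percent (assumption, for illustration only).
observed = [50.0, 60.0, 70.0, 80.0]
predicted = [52.0, 58.0, 71.0, 79.0]
metrics = regression_metrics(observed, predicted)
```

RMSE and MAE are in the units of the target, so thresholds such as "RMSE < 6.848" are only meaningful relative to the scale of the predicted property, whereas R² is unitless.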
When using machine learning techniques in decision-making processes, the interpretability of the models is important. In the present paper, we adopted the Shapley additive explanation (SHAP), which is based on fair profit allocation among many stakeholders depending on their contribution, for interpreting a gradient-boosting decision tree model using hospital data.
For better interpretability, we propose two novel techniques as follows: (1) a new metric of feature importance using SHAP and (2) a technique termed feature packing, which packs multiple similar features into one grouped feature to allow an easier understanding of the model without reconstruction of the model. We then compared the explanation results between the SHAP framework and existing methods using cerebral infarction data from our hospital.
The interpretation by SHAP was mostly consistent with that by the existing methods. We showed how the A/G ratio works as an important prognostic factor for cerebral infarction using proposed techniques.
Our techniques are useful for interpreting machine learning models and can uncover the underlying relationships between features and outcome.
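Feature packing relies on the additivity of SHAP values: the contributions of several features can be summed into one grouped contribution without retraining the model. The sketch below illustrates this on a linear model, for which the SHAP value of feature j is its weight times the feature's deviation from its mean, assuming independent features. The feature names, weights and the mean-absolute-SHAP importance used here are assumptions for illustration; the paper's own importance metric is not specified in the abstract.

```python
import numpy as np


def linear_shap(X, weights):
    """SHAP values of a linear model under feature independence:
    phi[i, j] = weights[j] * (X[i, j] - mean of column j)."""
    return (X - X.mean(axis=0)) * weights


def pack_features(shap_values, names, groups):
    """Sum the SHAP columns of each group into a single packed column."""
    grouped = {f for members in groups.values() for f in members}
    packed, packed_names = [], []
    for group, members in groups.items():
        cols = [names.index(f) for f in members]
        packed.append(shap_values[:, cols].sum(axis=1))
        packed_names.append(group)
    for j, name in enumerate(names):
        if name not in grouped:              # ungrouped features pass through
            packed.append(shap_values[:, j])
            packed_names.append(name)
    return np.column_stack(packed), packed_names


def mean_abs_importance(shap_values):
    """Global importance as the mean absolute SHAP value per column."""
    return np.abs(shap_values).mean(axis=0)


names = ["albumin", "globulin", "age", "crp"]    # hypothetical feature names
weights = np.array([1.5, -1.0, 0.8, 0.3])        # hypothetical linear model
X = np.random.default_rng(0).normal(size=(100, 4))

phi = linear_shap(X, weights)
phi_packed, packed_names = pack_features(
    phi, names, {"protein": ["albumin", "globulin"]}
)
importance = dict(zip(packed_names, mean_abs_importance(phi_packed)))
```

Each row of the packed matrix still sums to the same total as the original row, so the explanation of every individual prediction is preserved while the number of features a reader has to inspect shrinks.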
RC shear walls are commonly used as lateral load-resisting elements in seismic regions, and the estimation of their shear strengths can become simultaneously design-critical and complex when they have so-called squat geometries, i.e., height-to-length ratios less than two. This paper presents a study on the training and interpretation of an advanced machine-learning model that strategically combines two algorithms for the said purpose. To train the model, a comprehensive shear strength database of 434 samples of squat RC walls is utilized. First, the eXtreme Gradient Boosting (XGBoost) algorithm is used to establish a predictive model for estimating the shear strength, wherein 70% and 30% of the data are respectively used for training and validation. This effort resulted in an approximately 97% validation accuracy, which well exceeds current mechanics-based/semiempirical models. Second, the SHapley Additive exPlanations (SHAP) algorithm is used to estimate the relative importance of the factors affecting XGBoost’s shear strength estimates. This step thus enabled physical and quantitative interpretations of the input-output dependencies, which are nominally hidden in conventional machine-learning approaches. Through this setup, several squat wall attributes are identified as being critical in shear strength estimates.