Accurately predicting reference evapotranspiration (ET0) with limited climatic data is crucial for irrigation scheduling design and agricultural water management. This study evaluated eight machine learning models in four categories, i.e. neuron-based (MLP, GRNN and ANFIS), kernel-based (SVM, KNEA), tree-based (M5Tree, XGBoost) and curve-based (MARS) models, for predicting daily ET0 with maximum/minimum temperature and precipitation data during 2001–2015 from 14 stations in various climatic regions of China, i.e., arid desert of northwest China (NWC), semi-arid steppe of Inner Mongolia (IM), Qinghai-Tibetan Plateau (QTP), (semi-)humid cold-temperate northeast China (NEC), semi-humid warm-temperate north China (NC), humid subtropical central China (CC) and humid tropical south China (SC). The results showed that machine learning models using only temperature data obtained satisfactory daily ET0 estimates (on average R2 = 0.829, RMSE = 0.718 mm day-1, NRMSE = 0.250 and MAE = 0.508 mm day-1). The prediction accuracy was improved by 7.6% across China when precipitation information was also considered, particularly in (sub)tropical humid regions (by 9.7% in CC and 12.4% in SC). The kernel-based SVM and KNEA models and the curve-based MARS model generally outperformed the others in terms of prediction accuracy, with the best performance by KNEA in NWC and IM, by SVM in QTP, CC and SC, and very similar performance by the two in NEC and NC. SVM (1.9%), MLP (2.0%), MARS (2.6%) and KNEA (6.4%) showed relatively small average increases in testing RMSE over training RMSE. SVM is highly recommended for predicting daily ET0 across China in light of its best accuracy and stability, while KNEA and MARS are also promising models.
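The accuracy and stability figures quoted above (R2, RMSE, NRMSE, MAE and the percentage increase in testing RMSE over training RMSE) can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so two choices here are assumptions: NRMSE is taken as RMSE divided by the mean of the observations, and R2 as 1 - SSE/SST. All function names are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root mean square error, in the units of the observations."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error, in the units of the observations."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nrmse(obs, pred):
    """RMSE normalised by the mean observation (assumed normalisation)."""
    return rmse(obs, pred) / (sum(obs) / len(obs))


def r2(obs, pred):
    """Coefficient of determination as 1 - SSE/SST (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def rmse_increase_pct(train_rmse, test_rmse):
    """Stability measure: percentage increase of testing over training RMSE."""
    return 100.0 * (test_rmse - train_rmse) / train_rmse
```

A smaller value of `rmse_increase_pct` indicates a model whose testing error stays close to its training error, which is the sense in which the abstract calls a model stable.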
Accurate estimation of global solar radiation (Rs) is essential to the design and assessment of solar energy utilization systems. Existing empirical and machine learning models for estimating Rs from sunshine duration were comprehensively reviewed. The performances of 12 empirical model forms and 12 machine learning algorithms for estimating daily Rs were further evaluated in different climatic zones of China as a case study, i.e. the temperate continental zone (TCZ), temperate monsoon zone (TMZ), mountain plateau zone (MPZ) and (sub)tropical monsoon zone (SMZ). The best-performing model at each station and the overall best model for each climatic zone were selected based on six statistical indicators, a global performance index (GPI) and computational costs (computational time and memory usage). The results revealed that the machine learning models (RMSE: 2.055–2.751 MJ m−2 d−1; NRMSE: 12.8–21.3%; R2: 0.839–0.936) generally outperformed the empirical models (RMSE: 2.118–3.540 MJ m−2 d−1; NRMSE: 12.1–27.5%; R2: 0.834–0.935) in terms of prediction accuracy. The cubic model (M3), modified linear-logarithmic model (M5) and power model (M10) attained generally better ranks among empirical models based on GPI. M3 was the top-ranked model in TMZ and MPZ, while the best overall performance was obtained by M5 and M2 in SMZ and TCZ, respectively. ANFIS, ELM, LSSVM and MARS obtained generally better performance among machine learning models, with the overall best ranking by ANFIS in TCZ and SMZ and by ELM in MPZ and SMZ. XGBoost (8.1 s and 74.2 MB), M5Tree (11.3 s and 29.7 MB), GRNN (12.3 s and 295.3 MB), MARS (14.4 s and 42.6 MB), MLP (22.4 s and 41.3 MB) and ANFIS (29.8 s and 23.1 MB) showed relatively small computational time and memory usage. Considering both prediction accuracy and computational costs, ANFIS is highly recommended, while MARS and XGBoost are also promising models for daily Rs estimation.
•Sunshine-based empirical and machine learning models for predicting Rs were comprehensively reviewed.
•Performances of 12 empirical and 12 machine learning models were evaluated across China.
•Accuracy and ranking of models were evaluated using six statistical indicators and the Global Performance Index.
•Computational costs of 12 types of sunshine-based machine learning models were compared.
•The best-performing model at each station and an appropriate model for each climatic zone were recommended.
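As a rough illustration of what a sunshine-based empirical model looks like, the sketch below fits the classic linear Ångström–Prescott form, Rs/Ra = a + b (n/N), by ordinary least squares. This form is chosen here as an assumption for illustration: the abstract does not list the equations of its 12 empirical models, and the cubic, linear-logarithmic and power forms it ranks highest are not reproduced. Ra (extra-terrestrial radiation), n (sunshine hours), N (day length) and all function names are introduced for this sketch only.

```python
def fit_angstrom_prescott(sunshine_fraction, clearness_index):
    """Least-squares fit of clearness_index = a + b * sunshine_fraction.

    sunshine_fraction: list of n/N values (actual over maximum sunshine hours).
    clearness_index:   list of Rs/Ra values for the same days.
    Returns the coefficients (a, b).
    """
    count = len(sunshine_fraction)
    mean_x = sum(sunshine_fraction) / count
    mean_y = sum(clearness_index) / count
    sxx = sum((x - mean_x) ** 2 for x in sunshine_fraction)
    if sxx == 0.0:
        raise ValueError("sunshine_fraction must contain at least two distinct values")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(sunshine_fraction, clearness_index))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def estimate_rs(ra, sunshine_hours, day_length, a, b):
    """Estimate daily Rs (same units as ra) from the fitted coefficients."""
    return ra * (a + b * sunshine_hours / day_length)
```

The coefficients would be calibrated per station on measured Rs and then applied to days with sunshine records only; the higher-order empirical forms differ in how n/N enters the right-hand side.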
•Temperatures increased, but relative humidity, wind speed and sunshine hours decreased.
•Decreasing trends in annual ET0 were more common than increasing trends.
•Abrupt changes were detected in the 1990s in the MPZ, but in the 1980s in the other zones.
•Relative humidity was the most sensitive climatic variable except in the MPZ.
•Increasing ET0 after 1985 increases crop water demand and aggravates water shortage.
Global climate change is an increasing challenge to agricultural ecosystems and will significantly affect the reference crop evapotranspiration (ET0) and, subsequently, crop water requirements. In this study, the temporal trends and magnitudes of key climatic variables and the accompanying effects on ET0 during 1956–2015 were evaluated at 200 meteorological stations across the temperate continental zone (TCZ), temperate monsoon zone (TMZ), mountain plateau zone (MPZ), and subtropical monsoon zone (SMZ) of China. Results show that maximum and minimum temperatures have increased significantly over the past 60 years, whilst relative humidity, wind speed and sunshine hours exhibited significant decreasing trends across all climatic zones. The overall decreasing trends in annual ET0 were more pronounced than the increasing trends, whereas more increasing trends were found in spring and winter. Abrupt changes in the climatic variables and ET0 series were detected in the 1990s in the MPZ but in the 1980s in the other climatic zones, mainly due to the aggregated emission of greenhouse gases and air pollution from energy consumption in recent decades. Relative humidity was the most sensitive climatic variable in all climatic zones except for the MPZ, where ET0 was most sensitive to sunshine hours. However, ET0 responded differently to changing climatic variables in different regions and climatic conditions. Wind speed contributed more to the decrease in ET0 than the other climatic variables in the TCZ and the TMZ, whilst the significant increase in minimum temperature and the decrease in sunshine hours contributed most to increasing ET0 in the MPZ and to decreasing ET0 in the SMZ, respectively. Although ET0 displayed a generally decreasing trend during 1956–2015, there was a significantly increasing trend from 1985 to 2015 across China except for the SMZ, especially in the arid and semi-arid zones of China during dry seasons (spring and winter).
This may lead to the increase in crop water requirements and aggravate the water shortage in these areas in view of the increase in ET0 in response to ongoing climate change.
•CatBoost, Random Forest (RF) and Support Vector Machine (SVM) are proposed for reference evapotranspiration estimation.
•The SVM model offered the best prediction accuracy when some input data were not available.
•The CatBoost model has a much lower computational cost than the other models.
•Generalized models are developed and evaluated with data from twelve stations.
•CatBoost performs better than the RF and SVM models when the complete input data are available.
Accurate estimation of reference evapotranspiration (ET0) is critical for water resource management and irrigation scheduling. This study evaluated the potential of a new machine learning algorithm using gradient boosting on decision trees with categorical features support (i.e., CatBoost) for accurately estimating daily ET0 with limited meteorological data in humid regions of China. Two other commonly used machine learning algorithms, Random Forests (RF) and Support Vector Machine (SVM), were also assessed for comparison. Eight input combinations of daily meteorological data including both complete and incomplete combinations of solar radiation (Rs), maximum and minimum temperatures (Tmax and Tmin), relative humidity (Hr) and wind speed (U) from five weather stations during 2001–2015 in South China were applied for model training and testing. The results showed that all the three algorithms could achieve satisfactory accuracy for ET0 estimation in subtropical China using Rs, Tmax and Tmin, or U, Hr, Tmax and Tmin as inputs, under the circumstances of lacking complete meteorological parameters. The increases in testing RMSE and MAPE over training RMSE and MAPE showed positive correlations with the number of input parameters to the machine learning models. For the local models, among the three algorithms, SVM offered the best prediction accuracy and stability with incomplete combinations of meteorological parameters as inputs, while CatBoost performed best with the complete combination of parameters. Patterns of the generalized models were almost the same as the local models, but the former ones showed less than 10% decreases in RMSE or MAPE in comparison with the latter ones. In addition, the computing time and memory usage for data processing of CatBoost were much less than those of RF and SVM. Overall, as a tree-based algorithm, CatBoost made significant improvements in accuracy, stability and computational cost when compared to RF. 
Therefore, the CatBoost algorithm has a very high potential for ET0 estimation in humid regions of China, and even possibly in other parts of the world with similar humid climates.
•SVM and XGBoost models are developed for modeling global solar radiation.
•Proposed machine learning models are compared with four empirical models.
•XGBoost and SVM algorithms show comparable prediction accuracy.
•XGBoost models are more stable and efficient than SVM algorithms.
•XGBoost models are highly recommended to predict global solar radiation.
Knowledge of global solar radiation (H) is a prerequisite for the use of renewable solar energy, but H measurements are not always available due to high costs and technical complexities. The present study proposes two machine learning algorithms, i.e. Support Vector Machine (SVM) and a novel simple tree-based ensemble method named Extreme Gradient Boosting (XGBoost), for accurate prediction of daily H using limited meteorological data. Daily H, maximum and minimum air temperatures (Tmax and Tmin), transformed precipitation (Pt, 1 for rainfall > 0 and 0 for rainfall = 0) and extra-terrestrial solar radiation (H0) during 1966–2000 and 2001–2015 from three radiation stations in humid subtropical China were used to train and test the models, respectively. Two combinations of input parameters, i.e. (i) only Tmax, Tmin and H0, and (ii) complete data, were considered for simulations. The proposed machine learning models were also compared with four well-known empirical models to evaluate their performances. The results suggest that the SVM and XGBoost models outperformed the selected empirical models. The performance of the machine learning models was improved by 5.9–12.2% in the training phase and by 8.0–11.5% in the testing phase in terms of RMSE when precipitation information was also included. Compared with the SVM model, the XGBoost model generally showed better accuracy in the training phase, and slightly weaker but comparable accuracy in the testing phase. However, the XGBoost model was more stable, with an average increase of 6.3% in RMSE from training to testing, compared to 10.5% for the SVM algorithm. Also, the XGBoost model (3.02 s and 0.05 s for the training and testing phases, respectively) showed much higher computation speed than the SVM model (27.48 s and 4.13 s for the training and testing phases, respectively).
By jointly considering the prediction accuracy, model stability and computational efficiency, the XGBoost model is highly recommended to estimate daily H using commonly available temperature and precipitation data with excellent performance in humid subtropical climates.
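The transformed precipitation input described above (Pt = 1 for rainfall > 0, Pt = 0 for rainfall = 0) and the two input combinations can be sketched as follows. This is only an illustration of how the feature rows might be assembled before being passed to a regressor; the function names and the row layout are hypothetical and not taken from the study.

```python
def transform_precipitation(rain_mm):
    """Binary wet-day indicator: 1 if any rainfall was recorded, else 0."""
    return 1 if rain_mm > 0 else 0


def build_features(tmax, tmin, h0, rain_mm=None):
    """Assemble one daily input row.

    Combination (i):  [Tmax, Tmin, H0]      when rain_mm is None.
    Combination (ii): [Tmax, Tmin, H0, Pt]  when rainfall is supplied.
    """
    row = [tmax, tmin, h0]
    if rain_mm is not None:
        row.append(transform_precipitation(rain_mm))
    return row
```

Reducing rainfall to a wet/dry flag keeps the input to commonly available records while still signalling cloudy days, which is plausibly why adding it lowers RMSE in a humid climate.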
•Potential of tree-based ensemble models for daily ET0 estimation with limited climatic data is explored.
•Proposed ensemble models are compared with their corresponding SVM and ELM models.
•ELM and SVM models offered the best combination of prediction accuracy and stability.
•XGBoost and GBDT models have comparable accuracy and stability to those of ELM and SVM models.
•XGBoost and GBDT models have lower computational costs than the other models.
Accurate estimation of reference evapotranspiration (ET0) is of great importance for regional water resources planning and irrigation scheduling design. The FAO-56 Penman-Monteith model is recommended as the reference model to predict ET0, but its application is commonly restricted by the lack of complete meteorological data at many locations worldwide. This study evaluated the potential of machine learning models, particularly four relatively simple tree-based ensemble algorithms (i.e. random forest (RF), M5 model tree (M5Tree), gradient boosting decision tree (GBDT) and extreme gradient boosting (XGBoost)), for estimating daily ET0 with limited meteorological data using a K-fold cross-validation method. To assess the tree-based models in terms of prediction accuracy, stability and computational costs, these models were further compared with their corresponding support vector machine (SVM) and extreme learning machine (ELM) models. Four input combinations of daily maximum and minimum temperature (Tmax and Tmin), relative humidity (Hr), wind speed (U2), and global and extra-terrestrial solar radiation (Rs and Ra), with Tmax, Tmin and Ra as the base dataset, were considered using meteorological data during 1961–2010 from eight representative weather stations in different climates of China. The results showed that, when complete meteorological data were lacking, the machine learning models using Tmax, Tmin, Hr, U2 and Ra obtained satisfactory ET0 estimates in the temperate continental, mountain plateau and temperate monsoon zones of China (RMSE < 0.5 mm d−1). However, models with the three input parameters Tmax, Tmin and Rs were superior for daily ET0 prediction in the tropical and subtropical zones. The ELM and SVM models offered the best combination of prediction accuracy and stability. The simple tree-based XGBoost and GBDT models showed comparable accuracy and stability to the SVM and ELM models, but had much lower computational costs.
Considering the complexity level, prediction accuracy, stability and computational costs of the studied models, the XGBoost and GBDT models have been recommended for daily ET0 estimation in different climatic zones of China and maybe elsewhere with similar climates around the world.
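The K-fold cross-validation mentioned above can be sketched in a few lines of Python. The abstract does not state the value of K or whether the folds are contiguous or shuffled, so the contiguous split below is an assumption, and `k_fold_indices` is a hypothetical helper name.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Every sample appears in exactly one test fold; fold sizes differ by at
    most one when n_samples is not divisible by k.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop
```

Each model would be trained on `train_idx` and scored on `test_idx` for every fold, with the per-fold errors averaged. For a long daily record, contiguous folds correspond to holding out consecutive blocks of years.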
Abstract
The estimation of reference evapotranspiration (ET0) is important in hydrology research, irrigation scheduling design and water resources management. This study explored the capability of eight machine learning models, i.e., Artificial Neural Network (ANN), Random Forest (RF), Gradient Boosting Decision Tree (GBDT), Extreme Gradient Boosting (XGBoost), Multivariate Adaptive Regression Spline (MARS), Support Vector Machine (SVM), Extreme Learning Machine (ELM) and a novel Kernel-based Nonlinear Extension of Arps Decline (KNEA) model, for modeling monthly mean daily ET0 using only temperature data from local or cross stations. These machine learning models were also compared with the temperature-based Hargreaves–Samani (HS) equation. The results indicated that the estimation accuracy of these machine learning models differed between scenarios. The tree-based models (RF, GBDT and XGBoost) exhibited higher estimation accuracy than the other models in the local application. When the station had only temperature data, the MARS and SVM models were slightly superior to the other models, while the ANN and HS models performed worse than the others. When there were no temperature data at the target station and data from adjacent stations were used instead, MARS, SVM and KNEA were the suitable models. The results can provide a solution for ET0 estimation in the absence of complete meteorological data.
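For reference, the temperature-based Hargreaves–Samani baseline can be sketched as below, using the form given in FAO-56: ET0 = 0.0023 × 0.408 × Ra × (Tmean + 17.8) × (Tmax − Tmin)^0.5, with Ra in MJ m−2 d−1 and Tmean taken as the average of Tmax and Tmin. Extra-terrestrial radiation Ra is computed from latitude and day of year with the standard FAO-56 expressions. The abstract does not say whether the study used calibrated coefficients, so the default constants here are an assumption, and the function names are hypothetical.

```python
import math

SOLAR_CONSTANT = 0.0820  # MJ m-2 min-1


def extraterrestrial_radiation(latitude_deg, day_of_year):
    """Daily extra-terrestrial radiation Ra (MJ m-2 d-1), FAO-56 formulation."""
    phi = math.radians(latitude_deg)
    angle = 2.0 * math.pi * day_of_year / 365.0
    inverse_distance = 1.0 + 0.033 * math.cos(angle)
    declination = 0.409 * math.sin(angle - 1.39)
    # Clamp to [-1, 1] so polar day and polar night do not break acos.
    cos_ws = max(-1.0, min(1.0, -math.tan(phi) * math.tan(declination)))
    sunset_angle = math.acos(cos_ws)
    return (24.0 * 60.0 / math.pi) * SOLAR_CONSTANT * inverse_distance * (
        sunset_angle * math.sin(phi) * math.sin(declination)
        + math.cos(phi) * math.cos(declination) * math.sin(sunset_angle)
    )


def hargreaves_samani(tmax, tmin, ra):
    """Reference evapotranspiration (mm d-1) from temperatures (deg C) and Ra."""
    if tmax < tmin:
        raise ValueError("tmax must not be lower than tmin")
    tmean = (tmax + tmin) / 2.0
    # 0.408 converts MJ m-2 d-1 to the equivalent evaporation depth in mm d-1.
    return 0.0023 * 0.408 * ra * (tmean + 17.8) * math.sqrt(tmax - tmin)
```

Because Ra depends only on location and date, the equation needs just Tmax and Tmin as measured inputs, which makes it a natural baseline for the temperature-only machine learning models.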
•We estimated groundwater recharge and ETg under three vegetation covers.
•Depth-dependent specific yields (Sy) were determined under rising and falling water table conditions.
•Lower forest recharge was due to higher interception and reduced recharge capacity.
•ETg was controlled by meteorological drivers but mediated by depth to water table.
To evaluate potential hydrological impacts of changes in vegetation over a shallow sandy aquifer in subtropical Australia, we estimated groundwater recharge and discharge by evapotranspiration (ETg) under three vegetation covers. Estimates were obtained over two years (November 2011–October 2013) using the water table fluctuation method and the White method, respectively. Depth-dependent specific yields were determined for estimation of recharge and ETg. Our results show that the average annual gross recharge was largest at the sparse grassland (∼52% of net rainfall), followed by the exotic pine plantation (∼39% of net rainfall) and then the native banksia woodland (∼27% of net rainfall). Lower recharge values at forested sites resulted from higher rainfall interception and reduced storage capacity of the vadose zone due to lower elevations when the water table approaches the soil surface. During 169 rain-free days when the White method was applied, pine trees extracted nearly twice as much groundwater through ETg as the banksia, whereas no groundwater use by grasses was detected. Groundwater use is largely controlled by meteorological drivers but further mediated by depth to water table. The resulting annual net recharge (gross recharge minus ETg) at the pine plantation was comparable to that of the banksia woodland but only half of the corresponding value at the grassland. Vegetation cover impacts potential groundwater recharge and discharge, but in these subtropical shallow water table environments estimates of potential recharge based on rainfall data need to take into account the often limited recharge capacity in the wet season.
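The two estimation methods named above can be sketched in their textbook forms. In the water table fluctuation method, recharge is the specific yield times the water table rise attributed to a rain event. In the White method, daily groundwater evapotranspiration is ETg = Sy × (24 r + s), where r is the hourly rate of night-time water table recovery and s is the net decline of the water table over the 24 h period. The study used depth-dependent specific yields; this sketch takes a single Sy value per call as a simplifying assumption, and the function names are hypothetical.

```python
def wtf_recharge(specific_yield, water_table_rise_mm):
    """Gross recharge (mm) from the water table fluctuation method."""
    # A falling water table contributes no recharge.
    return specific_yield * max(water_table_rise_mm, 0.0)


def white_etg(specific_yield, night_rise_rate_mm_per_h, net_daily_decline_mm):
    """Daily groundwater evapotranspiration ETg (mm) from the White method.

    night_rise_rate_mm_per_h: water table recovery rate during the night,
        when transpiration is assumed negligible.
    net_daily_decline_mm: net fall of the water table over 24 h (positive
        when the table ends lower than it started).
    """
    return specific_yield * (24.0 * night_rise_rate_mm_per_h + net_daily_decline_mm)


def net_recharge(gross_recharge_mm, etg_mm):
    """Net recharge as defined in the abstract: gross recharge minus ETg."""
    return gross_recharge_mm - etg_mm
```

With Sy = 0.2, a 50 mm rise gives 10 mm of gross recharge, and a night recovery of 0.5 mm per hour with a 3 mm net daily decline gives 3 mm of ETg, leaving 7 mm of net recharge.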
Ridge-furrow planting, supplementary irrigation and density regulation are effective measures to improve crop yields in (semi-)arid and drought-prone regions around the world, but their interactive effects on crop evapotranspiration, plant nitrogen uptake, grain yield and its components, water productivity, nitrogen use efficiency and economic benefit of winter wheat are not fully understood. Field experiments were carried out on winter wheat in a semi-humid but drought-prone region of northwest China during 2020–2021 and 2021–2022. There were two planting patterns (P): ridge-furrow planting with plastic film mulching on ridges (RF) and conventional flat planting (F); two supplementary irrigation amounts (I): 30 mm supplementary irrigation at both overwintering and returning-green stages (I60) and no supplementary irrigation (I0); and three planting densities (D): 240 × 10⁴ plants ha−1 (LD), 360 × 10⁴ plants ha−1 (MD) and 480 × 10⁴ plants ha−1 (HD). The results showed significant interactive effects of P × I, P × D, I × D and P × I × D on grain yield and water productivity of grain yield (WPg), and significant interactive effects of P × I, I × D and P × I × D on partial factor productivity of nitrogen (NPFP) and net income (NI). Compared with F, RF increased grain yield by 10.4%, WPg by 11.9%, water productivity of biomass yield (WPb) by 6.2%, precipitation and irrigation water use efficiency (PIUE) by 10.4%, irrigation water use efficiency (IUE) by 12.0%, agronomic nitrogen use efficiency (ANUE) by 5.3%, NPFP by 10.6% and NI by 17.1%. Compared with I0, I60 increased grain yield by 13.5%, WPg by 7.8%, ANUE by 5.7%, NPFP by 13.7% and NI by 27.9%. Increasing planting density improved soil water consumption, population nitrogen uptake, effective number of panicles per unit area, WPb, PIUE, ANUE, NUtE and total income.
Grain yield, WPg, NPFP and NI reached their maximum values at MD under RF, because the plastic film in RF reduced the number of planting rows and resulted in excessive planting density within the plant rows at HD compared to F. In summary, the optimized planting density (360 × 10⁴ plants ha−1) combined with ridge-furrow planting and supplementary irrigation is more desirable for balancing grain yield, water productivity, nitrogen use efficiency and economic benefit of winter wheat in the semi-humid and drought-prone region of northwest China.
●Planting pattern, supplementary irrigation and planting density had interactive effects on grain yield.
●Ridge-furrow planting and supplementary irrigation increased grain yield, water-nitrogen use efficiency and net income.
●Grain yield, water-nitrogen use efficiency and net income were maximal at medium density under ridge-furrow planting.
●Grain yield, water-nitrogen use efficiency and net benefit were balanced under RFMDI60.
The rapid and nondestructive determination of wheat aboveground biomass (AGB) is important for accurate and efficient agricultural management. In this study, we established a novel hybrid model, extreme gradient boosting (XGBoost) optimized using the grasshopper optimization algorithm (GOA-XGB), which could accurately determine an ideal combination of vegetation indices (VIs) for simulating wheat AGB. Five multispectral bands from the unmanned aerial vehicle platform and 56 types of VIs derived from the five bands were used to drive the new model. The GOA-XGB model was compared with many state-of-the-art models, for example, multiple linear regression (MLR), multilayer perceptron (MLP), gradient boosting decision tree (GBDT), Gaussian process regression (GPR), random forest (RF), support vector machine (SVM), XGBoost, SVM optimized by particle swarm optimization (PSO), SVM optimized by the whale optimization algorithm (WOA), SVM optimized by the GOA (GOA-SVM), XGBoost optimized by PSO, and XGBoost optimized by the WOA. The results demonstrated that the MLR and GOA-MLR models had poor prediction accuracy for AGB, and the accuracy did not significantly improve when more than three input factors were used. Among single-factor-driven machine learning (ML) models, the GPR model had the highest accuracy, followed by the XGBoost model. When the input combinations of multispectral bands and VIs were used, the GOA-XGB model (having 37 input factors) had the highest accuracy, with RMSE = 0.232 kg m−2, R2 = 0.847, MAE = 0.178 kg m−2, and NRMSE = 0.127. When the XGBoost feature selection was used to reduce the input factors to 16, the model accuracy improved further to RMSE = 0.226 kg m−2, R2 = 0.855, MAE = 0.172 kg m−2, and NRMSE = 0.123. Based on the developed model, the average AGB of the plot was 1.49 ± 0.34 kg.
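Two steps in the workflow above lend themselves to a short sketch: deriving a vegetation index from band reflectances, and keeping only the highest-ranked input factors. The abstract does not list its 56 VIs, so the normalized difference vegetation index, NDVI = (NIR − Red) / (NIR + Red), is used here purely as a well-known example. The importance scores are assumed to come from an already trained model; `select_top_k` and the feature names in the usage note are hypothetical and do not reproduce the study's XGBoost-based selection.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        raise ValueError("NIR + red reflectance must be non-zero")
    return (nir - red) / total


def select_top_k(importances, k):
    """Return the k feature names with the largest importance scores.

    importances: dict mapping feature name to a non-negative score, for
        example the feature importances reported by a fitted tree ensemble.
    """
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]
```

For example, `select_top_k({"ndvi": 0.5, "gndvi": 0.3, "red": 0.2}, 2)` keeps `"ndvi"` and `"gndvi"`. The reduced set would then be used to retrain the model, as the abstract describes for the reduction from 37 to 16 input factors.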