Accurate prediction of crop yield and dry matter, together with optimized water and nitrogen management, supports rational decision-making in farming systems. By combining high-performance computing with innovative big-data processing technologies, machine learning (ML) advances data-intensive science and provides an important supporting framework for crop yield prediction. This paper evaluated the performance of five ML algorithms, namely linear regression (LR), decision tree (DT), support vector machine (SVM), ensemble learning (EL), and Gaussian process regression (GPR), for predicting winter wheat (Triticum aestivum L.) yield and dry matter using data collected from studies conducted within the last twenty years in the North China Plain (NCP). In addition, winter wheat yield and dry matter were explored using the best algorithm, and polynomial functions were proposed to describe the relationship of water and nitrogen application with winter wheat yield and dry matter. Results confirmed that the GPR model outperformed all other models for predicting the yield (R2 = 0.87) and dry matter (R2 = 0.86) of winter wheat. The prediction errors of the GPR model for maximum yield and dry matter of winter wheat were 5.8% and 1.1%, respectively. The yield and dry matter of winter wheat in the NCP could be predicted by the GPR model and polynomial functions, and the optimal water and nitrogen application for maximum yield and dry matter could be obtained. The results provide insight into site-specific crop management.
•Application of five machine learning algorithms to predict winter wheat yield and dry matter.
•The Gaussian process regression algorithm outperformed the other algorithms.
•Water and nitrogen coupling functions for winter wheat yield and dry matter were developed.
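The abstract above does not state the form of the polynomial water and nitrogen functions. As a minimal sketch, assuming a second-order response surface in irrigation amount W and nitrogen rate N, the optimum application can be found by setting both partial derivatives to zero and solving the resulting 2x2 linear system. The names `yield_surface`, `optimum`, and `COEFFS`, and all coefficient values, are hypothetical and chosen only for illustration; they are not fitted values from the study.

```python
def yield_surface(w, n, coeffs):
    """Assumed quadratic response: Y = b0 + bw*W + bn*N + bww*W^2 + bnn*N^2 + bwn*W*N."""
    b0, bw, bn, bww, bnn, bwn = coeffs
    return b0 + bw * w + bn * n + bww * w * w + bnn * n * n + bwn * w * n


def optimum(coeffs):
    """Return (W, N) at the maximum of the assumed quadratic surface.

    Setting dY/dW = 0 and dY/dN = 0 gives the linear system
        2*bww*W +   bwn*N = -bw
          bwn*W + 2*bnn*N = -bn
    which is solved here with Cramer's rule.
    """
    _, bw, bn, bww, bnn, bwn = coeffs
    det = 4 * bww * bnn - bwn * bwn
    # A maximum requires a negative-definite Hessian.
    if not (bww < 0 and det > 0):
        raise ValueError("surface has no interior maximum")
    w = (-2 * bnn * bw + bwn * bn) / det
    n = (-2 * bww * bn + bwn * bw) / det
    return w, n


# Hypothetical coefficients (yield in kg/ha, W in mm, N in kg/ha).
COEFFS = (1000.0, 26.0, 34.0, -0.05, -0.10, 0.02)

w_opt, n_opt = optimum(COEFFS)
y_max = yield_surface(w_opt, n_opt, COEFFS)
print(f"W* = {w_opt:.1f} mm, N* = {n_opt:.1f} kg/ha, Ymax = {y_max:.0f} kg/ha")
```

With a fitted surface, the same calculation gives the water and nitrogen combination at which predicted yield or dry matter peaks. The definiteness check guards against saddle-shaped fits, for which no such optimum exists.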
Accurate in-season yield forecasts for field-scale crops are crucial for both farmers and decision-makers. Common yield prediction methods are limited either by their dependence on weather data that are not yet known (process-based crop models) or by their failure to consider yield formation processes (statistical models based on unmanned aerial vehicle (UAV) images). Furthermore, previous studies focused only on crops without mulching, yet mulching is an important agronomic approach to increase grain yield in the arid areas of northwest China. We aim to develop a hybrid approach coupling a crop model and UAV data through ensemble learning to achieve in-season yield forecasts for film-mulched wheat. A four-year field experiment was conducted (2018–2020 and 2021–2023). We first calibrated AquaCrop using data from 2018 to 2020, and historical weather data were employed to drive AquaCrop for predicting yields in 2021–2023. Next, statistical models were constructed to predict yields based on spectral and textural indices calculated from UAV images. Finally, a hybrid approach coupling the AquaCrop model and remote-sensing data was developed using an ensemble learning technique. The relative contribution of each feature was quantified using SHapley Additive exPlanations (SHAP) values. The results indicated that AquaCrop yield forecasts exhibited considerable uncertainties (R2: 0.53–0.63; NRMSE: 16.54%–14.83%). Yield estimation from remote-sensing data was influenced by background and saturation effects, reaching its highest accuracy at the heading stage (R2 was 0.80, NRMSE was 11.88%). Ensemble learning demonstrated strong performance compared to machine learning algorithms. The coupled model combined the advantages of crop and statistical models through the ensemble learning algorithm, achieving accurate yield predictions more than 40 days before harvest (heading stage) based on AdaBoost regression (R2 was 0.88, NRMSE was 8.40%).
The most important forecasting factors affecting yield prediction were the textural indices, followed by the AquaCrop simulated values. Overall, the coupled model showed good performance in predicting the in-season yield of film-mulched wheat, which provided new insights into farm-scale yield prediction. Further validation of the generalizability of the coupled model in different scenarios is required in the future to improve the applicability of the model in actual production practice.
•AquaCrop model and UAV data were coupled through ensemble learning.
•AdaBoost regression in Boosting was the optimal yield prediction algorithm.
•Textural indices (TIs) and AquaCrop simulated values were both important yield forecasting factors.
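The accuracy figures quoted above are R2 and NRMSE. The following minimal sketch shows how these two metrics are commonly computed, assuming R2 is the coefficient of determination and NRMSE is the RMSE divided by the mean of the observations (the abstract does not state which normalization was used). The function names and the sample yields are hypothetical.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def nrmse_percent(observed, predicted):
    """RMSE normalized by the observed mean, in percent (assumed convention)."""
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return 100.0 * math.sqrt(mse) / (sum(observed) / len(observed))


# Made-up plot yields (t/ha) for illustration only.
obs_yield = [5.1, 6.3, 4.8, 7.0, 5.9]
pred_yield = [5.4, 6.0, 5.0, 6.6, 6.1]

print(f"R2 = {r_squared(obs_yield, pred_yield):.2f}")
print(f"NRMSE = {nrmse_percent(obs_yield, pred_yield):.1f}%")
```

Other studies normalize RMSE by the observed range instead of the mean, so NRMSE values are only comparable when the convention is the same.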
Accurate crop yield predictions play a crucial role in enabling informed policy-making to ensure food security. Beyond using advanced methods such as remote sensing and data assimilation (DA), it is essential to comprehend the influence of various sources of uncertainty on the overall prediction uncertainty. This study presents a novel approach for enhancing the accuracy of crop yield predictions by assimilating remotely sensed Leaf Area Index (LAI) and updating weather ensemble data in a crop model (SPASS) while accounting for calibration and weather uncertainty. In addition, we investigated the effect of model calibration prior to DA under four calibration data-type scenarios. These scenarios involve calibrating the crop model to different combinations of yield, phenology, and LAI, ranging from minimum (yield only) to maximum (yield, phenology, and LAI) data availability. To address weather uncertainty, we derived weather forecasts downscaled from climate models utilizing the MarkSim weather generator. Our results demonstrate that assimilating LAI and updating weather data significantly reduces the overall uncertainty in crop yield predictions. Notably, the uncertainty associated with weather ensembles has a more substantial influence than the uncertainty resulting from calibration. This finding highlights the significance of accounting for variations and discrepancies in weather predictions when assessing yield uncertainty. Additionally, given the set of SPASS model parameters used for winter wheat calibration, additional field-based LAI data does not improve the calibration quality.
•Assimilating LAI and weather data improves yield predictions by the SPASS model.
•Calibrating SPASS to phenology and yield data adequately primes it for assimilation.
•SPASS calibration to LAI does not enhance the efficacy of LAI assimilation.
•Bias correction in weather data during assimilation markedly reduces yield prediction bias.
•Weather uncertainty outweighs parameter uncertainty in yield predictions.
Wheat production in Kazakhstan contributes fundamentally to food security in Central Asia and beyond. It gained even more importance after recent spikes in global food prices in 2022. Therefore, timely and reliable estimates of Kazakh wheat production are important for food security planning and management. In this study, we developed a statistical weather-driven crop model that can successfully hindcast wheat yields at the oblast level up to two months before the harvest. The hindcast of wheat yields for 1993–2021 produces a median R2 of 0.69 for the full model run and R2 values of 0.60 and 0.37 for two levels of out-of-sample validations, respectively. Based on these yield estimates we provide a robust hindcast of the total wheat production for Kazakhstan with R2 values between 0.86 and 0.73. We forecast total wheat production in Kazakhstan for 2022 to be 12.4 million tonnes and the average yield to be 0.96 tonnes per hectare, which is 5% above the production and yield of 2021 (assuming equal areas). The statistical model is run with publicly available weather and yield data and requires low computational power, making it easily replicable. The forecast model can be used as a complement to currently applied forecasting methods, supporting countries in Central Asia in meeting their food demand.
•Constructed a statistical yield model for wheat in Kazakhstan to forecast production.
•Precipitation showed high influence on recent wheat yields.
•Skilful wheat production hindcast for Kazakhstan with R2 values between 0.86 and 0.73.
•Wheat yield and production forecast for 2022 on oblast level.
Forecasting crop production a few weeks before harvest is of strategic interest for the cooperatives that collect, store and market grains. The recent development of Sentinel satellites opened new avenues for yield forecasting at field and farm level, thanks to their operational spatial resolution and revisit time. In this study, we combined remote sensing data (in-season green area index, GAI) and statistical modeling to forecast sunflower yield at field level for a range of cultivars and crop practices over different small production areas and years in southwestern France. From 2014 to 2016, 359 sunflower fields were monitored throughout the growing season in the ‘Haute-Garonne’ and ‘Gers’ administrative departments (SW France). From the satellite GAI estimates, two variables were calculated: GAImax (maximum GAI, between F1 stage and F1 + 10 days) and GAD (Green Area Duration). Different statistical modeling procedures were tested, namely linear regression (LR), second-degree polynomial regression (PR), a random forest regressor (RF) and a Gaussian process (GP). In each case, the models were tested using either GAD, GAImax or both variables, and each model was trained first with GAD and GAImax obtained by linear interpolation and then with the same variables computed using double sigmoid interpolation. For yield prediction purposes, GAD was calculated from anthesis to maturity but also from anthesis to 10/07, 20/07, 30/07 and 10/08 using remote sensing data. Sunflower grain yield at maturity was predicted with 10 models differing in their form and the agronomic variables involved. At the individual field level, grain yield (GY) was slightly better predicted by models including GAD + GAImax or GAD, while models based only on GAImax were the least accurate. This was consistent with the major importance of post-anthesis radiation interception and senescence dynamics in the development of grain yield in sunflower.
Predictions were best in 2014, followed by 2015 and then 2016. However, at the grain catchment area level, PR models including GAD were the most accurate ones, with absolute errors ranging from 0.53 to 4.68 q.ha−1 depending on the year. Only the predictions obtained with 2014 data and over the 3 years were sufficiently accurate to be of operational value for a cooperative manager.
•359 sunflower fields were surveyed in the Toulouse region from 2014 to 2016.
•Remote sensing data and statistical modeling were combined to forecast sunflower grain yield at field level.
•Green Area Index (GAI) at anthesis and post-anthesis Green Area Duration (GAD) are two predictive variables of grain yield in sunflower.
•10 models differing by their forms and the agronomic variables involved were tested.
•Polynomial regression with GAD was the most accurate model at grain catchment area.
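The abstract above derives GAD from dated satellite GAI estimates using interpolation. As a minimal sketch, assuming GAD is the time integral of GAI between anthesis and a cutoff date and that GAI is linearly interpolated between acquisitions (the double sigmoid variant is not shown), the calculation could look as follows. The function names, the day-of-year values, and the GAI series are hypothetical.

```python
def interpolate_gai(days, gai, day):
    """Linearly interpolate GAI at `day`; `days` must be strictly increasing."""
    if not days[0] <= day <= days[-1]:
        raise ValueError("day lies outside the observed GAI series")
    for i in range(len(days) - 1):
        d0, d1 = days[i], days[i + 1]
        if d0 <= day <= d1:
            return gai[i] + (gai[i + 1] - gai[i]) * (day - d0) / (d1 - d0)
    return gai[-1]  # only reached for a single-point series


def green_area_duration(days, gai, start, end):
    """Trapezoidal integral of interpolated GAI from `start` to `end` (GAI x days)."""
    knots = [start] + [d for d in days if start < d < end] + [end]
    values = [interpolate_gai(days, gai, d) for d in knots]
    return sum(
        (knots[i + 1] - knots[i]) * (values[i] + values[i + 1]) / 2.0
        for i in range(len(knots) - 1)
    )


# Hypothetical acquisitions: day of year and satellite-derived GAI.
obs_days = [170, 190, 210, 230]
obs_gai = [1.0, 3.0, 2.0, 0.0]
anthesis_day = 180

gad_full = green_area_duration(obs_days, obs_gai, anthesis_day, 220)
gad_early = green_area_duration(obs_days, obs_gai, anthesis_day, 200)
print(gad_full, gad_early)
```

Computing GAD up to successively earlier cutoff dates, as with `gad_early` here, mirrors the in-season forecasting setting in which only part of the post-anthesis period has been observed.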
•Wheat and maize yield anomalies for the Pannonian Basin are forecasted using XGBoost.
•Maize yield anomalies can be forecasted accurately two months before harvest.
•Impact of severe droughts on crop yield losses remains underestimated.
•Forecasting of temporal yield variability is more reliable than spatial variability.
•Soil moisture is the most important predictor for maize yields in drought years.
The increasing frequency and intensity of severe droughts over recent decades have led to substantial crop yield losses in the Pannonian Basin in southeastern Europe. Their socioeconomic consequences can be minimized by accurate crop yield forecasts, but such forecasts often underestimate the impact of severe droughts on crop yields. We developed a gradient-boosting-based crop yield anomaly forecasting system for the Pannonian Basin and examined its performance, with a focus on drought years. Winter wheat and maize yield anomalies are forecasted for 42 regions in the Pannonian Basin using predictor datasets from Earth observation and reanalysis describing vegetation state, weather, and soil moisture conditions.
Our results show that crop yield anomaly estimates in the two months preceding harvest have better performance (maize errors 14–17%, wheat 13–14%) than earlier in the year (maize errors 21%, wheat 17%). The forecast models can satisfactorily capture the interannual yield anomalies, but spatial yield variability is only partially reproduced. In years of severe drought, the wheat model performs better than under average conditions with errors below 12%. The errors of the maize forecasts in drought years are larger than average forecast skill: 31% two months ahead and 20% one month ahead. However, for both crops the yield losses remain underestimated by the forecasts in severe drought years. The feature importance analysis shows that during the last two months before harvest, wheat yield anomalies are controlled by temperature and evaporation and maize by the combined effects of temperature and water availability as expressed by several drought indices. In severe drought years, during the two months before harvest the seasonal temperature forecast becomes the most important predictor for the wheat forecasts and soil moisture for the maize model. Overall, this study provides in-depth insights into the impact of droughts on crop yield forecasts in the Pannonian Basin.
Improving crop yield prediction accuracy is crucial for sustainable agriculture. One approach is to use data assimilation (DA) techniques based on satellite remote sensing, which can help improve predictions at the regional to national scale. However, the interaction between uncertain crop model inputs and DA, as well as the impact of crop model structure on DA results, have received little attention to date. In this work, we assimilated leaf area index (LAI) data into three single crop models (CERES, GECROS, and SPASS) as well as into their multi-model ensemble (MME) using a particle filtering (PF) algorithm. Mimicking the common lack of information at a large scale, we considered nitrogen fertilization, sowing date, soil hydraulic parameters, and weather data as the sources of uncertainties. In a case study, we applied this setup to six winter wheat site years in southwestern Germany. Before applying DA, all models were calibrated and validated using in-situ measured data from a multi-site, multi-year independent data set. The model performance in the calibration was used to assign weights to the models of the MME. Results show that weather data and soil hydraulic parameters had the highest impact on all model predictions. DA substantially improved the accuracy and precision of LAI simulation in all models. Moreover, DA enhanced grain yield prediction by GECROS, SPASS, and the multi-model ensemble, but had no considerable effect on CERES. Specifically, the bias in yield prediction decreased from 25% to 15% in the case of GECROS, from 26% to 15% in SPASS, and from 19% to 7% in the MME. In contrast, even without DA, the yield prediction error in CERES was below 5%. The correlation between LAI errors and yield errors was a key factor indicating how effective DA can be for a specific model. When the correlation analysis is unavailable, the multi-model ensemble is a promising approach for data assimilation.
Further investigations on regional model calibration, input uncertainty, MME size, and model weighting scheme are necessary to improve the performance of data assimilation applications.
•Weather uncertainty has a bigger impact on yield than management or soil uncertainty.
•Weather data uncertainty can bias yield predictions.
•Data assimilation reduces input-driven yield prediction uncertainty.
•Multi-model ensemble enhances yield predictive power in data assimilation.
•Correlation between assimilated variable error and yield error is key in assimilation.
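The abstract above assimilates LAI observations with a particle filter. The sketch below shows one generic bootstrap particle filter analysis step, assuming each particle is reduced to a single simulated LAI value and the observation error is Gaussian. It is not the study's implementation: in practice each particle would be a full crop model run with perturbed inputs. All names and numbers are hypothetical.

```python
import math
import random


def pf_update(particles, weights, observation, obs_sd):
    """Reweight particles by the Gaussian likelihood of the observed LAI."""
    likelihood = [math.exp(-0.5 * ((observation - p) / obs_sd) ** 2) for p in particles]
    unnormalized = [w * l for w, l in zip(weights, likelihood)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("all particles have zero likelihood")
    return [u / total for u in unnormalized]


def resample(particles, weights, rng):
    """Multinomial resampling: draw particles with probability equal to their weight."""
    return rng.choices(particles, weights=weights, k=len(particles))


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_particles = 500

# Hypothetical prior ensemble of simulated LAI (m2/m2) from uncertain inputs.
prior = [rng.gauss(3.0, 1.0) for _ in range(n_particles)]
prior_weights = [1.0 / n_particles] * n_particles

lai_observed, lai_obs_sd = 4.0, 0.3  # hypothetical satellite LAI and its error
posterior_weights = pf_update(prior, prior_weights, lai_observed, lai_obs_sd)

prior_mean = weighted_mean(prior, prior_weights)
posterior_mean = weighted_mean(prior, posterior_weights)
resampled = resample(prior, posterior_weights, rng)

print(f"prior mean LAI = {prior_mean:.2f}, posterior mean LAI = {posterior_mean:.2f}")
```

After resampling, the particles near the observation are duplicated and the unlikely ones are dropped, so the model runs that continue forward carry states consistent with the observed LAI. Whether this also improves yield depends, as the abstract notes, on how strongly LAI errors and yield errors are correlated in a given model.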
•Forecasting Brazilian wheat yield two months before harvest with < 8% RMSE.
•Using monthly temperature and rainfall features from four locations from Aug-Oct.
•Employing forecasted features from seasonal climate models.
•Comparing the approach with features from climatology and multi-model-ensembles.
•Providing a transparent, reproducible, and data-inexpensive approach.
National wheat yield depends on climate conditions and usually remains unknown until harvest. In-season knowledge can be provided by wheat yield forecast systems, supporting the decision-making of farmers, food traders, or policymakers. In this study, we improved a previously developed statistical wheat yield model to forecast trend-corrected wheat yield in Brazil with monthly temperature and precipitation data from seasonal climate models (SCM) from the last three months before harvest. We chose SCM from the European Center for Medium-Range Weather Forecasts (ECMWF), the National Centers for Environmental Prediction (NCEP), and the UK-based Met-Office (UKMO). A multi-model ensemble (MME) approach built from the three individual models as well as a climatology (CLIMATE) approach were also tested. Wheat yield forecasts were issued at the beginning of each month from planting in April to harvest in November. Each month, features from future months are forecasted by SCM, and past features are supplemented with observations from weather stations. Our approach shows a 12% RMSE in forecasting yield early in the season, from April to June. Forecasts start to improve from July onwards, with shorter lead times and including observed features from September onwards. At the beginning of October, about two months before harvest is completed, wheat yield can be forecasted with 7.6%, 7.9%, 7.9%, 9.1%, and 9.3% RMSE using climate data from UKMO, ECMWF, MME, NCEP, and CLIMATE, respectively. Seasonal climate models can be useful tools to forecast national wheat yield, even shortly before harvest, to prepare for possible food shortages. Our approach could be applied to other staple crops and regions.
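The abstract above forecasts trend-corrected yield but does not say how the trend was removed. A minimal sketch, assuming a linear technology trend fitted by ordinary least squares and subtracted from the yield series, is given below. The function names and the yield series are hypothetical and are not data from the study.

```python
def linear_trend(years, yields):
    """Ordinary least-squares slope and intercept of yield against year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(yields) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, yields))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def detrend(years, yields):
    """Yield anomalies: observed yield minus the fitted linear trend."""
    slope, intercept = linear_trend(years, yields)
    return [y - (intercept + slope * x) for x, y in zip(years, yields)]


# Made-up national yields (t/ha) for illustration only.
sample_years = [2000, 2001, 2002, 2003, 2004]
sample_yields = [2.0, 2.2, 2.1, 2.5, 2.7]

trend_slope, trend_intercept = linear_trend(sample_years, sample_yields)
anomalies = detrend(sample_years, sample_yields)
print(f"trend = {trend_slope:.2f} t/ha per year")
print([round(a, 2) for a in anomalies])
```

A statistical model would then be fitted to the anomalies using the monthly temperature and precipitation features, and the trend added back to obtain a yield forecast. Other detrending choices, such as a moving average or a quadratic trend, follow the same pattern.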
•The strengths and weaknesses of DSSAT-CERES-Maize and AquaCrop were identified.
•DSSAT-CERES-Maize with the two evapotranspiration (ET) options was tested.
•DSSAT-CERES-Maize was inferior to AquaCrop in simulating ET under drought.
•DSSAT-CERES-Maize showed greater potential to simulate maize yield than AquaCrop.
•Both crop models had greater uncertainties under severe drought conditions.
As water scarcity becomes more acute in many parts of the world, crop modeling tools that effectively simulate crop response to deficit irrigation strategies are needed to help investigate management improvements. Identifying the strengths and weaknesses of crop models with different growth engines is therefore of great importance. The objective of this study was to investigate the capability and improvements of the new version of a solar energy-driven crop model (DSSAT-CERES-Maize, v4.7.5.0) in simulating water consumption and yield of hybrid seed maize under different soil water conditions, and to compare it with a water-driven crop model (AquaCrop, v4.0). Data obtained from a 4-year (2012–2015) field trial on maize grown under different irrigation treatments at Wuwei, Northwest China, were used for this assessment. The models were calibrated and validated using measured daily evapotranspiration (ET), leaf area index (LAI), aboveground biomass, yield (Y), harvest index (HI) and soil water content (SWC). Daily ET was measured using a combination of an eddy covariance (EC) system, sap flow sensors, and micro-lysimeter cylinders. The performance of DSSAT-CERES-Maize using the two different ET options, i.e., Priestley-Taylor/Ritchie (PT) and FAO-56 Penman-Monteith (PM), was analyzed. The results showed that DSSAT-CERES-Maize with the PT approach had fair agreement with measured daily ET of maize under non-water stress conditions (R2=0.85; NRMSE=26.7%), but poor agreement with ET under water stress conditions (R2=0.51; NRMSE=43.8%). DSSAT-CERES-Maize with the PM approach systematically underestimated ET by up to 13% under non-water stress conditions, which was mainly attributed to the fact that the maximum static CERES-Maize crop coefficient (EORATIO) is currently hard-coded to 1.0. Using the PT or FAO-56 PM approach as ET input in DSSAT-CERES-Maize made no difference to the final biomass (B) and Y simulation under full irrigation.
Under water stress conditions, however, DSSAT-CERES-Maize with the FAO-56 PM approach overestimated measured B and Y more than it did with the PT approach. The LAI, biomass and SWC simulated by DSSAT-CERES-Maize using the PT approach generally followed the trend of the measured values well for most irrigation treatments. The model with the PT approach showed acceptable prediction of B and Y for the different irrigation treatments across years, with NRMSE of 15.5% and 26.2%, respectively, but the accuracy decreased as water stress became more severe. Furthermore, the strengths and weaknesses of DSSAT-CERES-Maize and AquaCrop, and the different cores of their growth engines, i.e., radiation use efficiency (RUE) and normalized water productivity (WP*), were discussed. It was concluded that DSSAT-CERES-Maize estimated maize yield better than AquaCrop, especially when the climate varied dramatically between years. However, DSSAT-CERES-Maize was inferior to AquaCrop for simulating maize water consumption in an arid region where drought often occurs. These results help in recommending the appropriate crop model for specific modeling goals.
•Soil health parameter estimation using simple regression techniques.
•Soil parameter-based wheat crop yield estimation.
•Cost-effective study using freely available satellite data.
•Different algorithms compared.
In recent years, Deep Learning Multi-Layer Perceptron (DLMLP) neural networks have shown remarkable success in addressing crop yield forecasting problems. The methodologies used so far for crop yield forecasting with remotely sensed data have focused on vegetation indices generated from optical data. Accurately predicting crop yield with robust machine learning models based on soil health parameters is crucial, since it helps keep track of soil health as well as its impact on overall yield. This study aims to utilize remotely sensed microwave satellite data from Sentinel-1, optical data from Sentinel-2, and field data to estimate three important soil health parameters: soil moisture, soil salinity, and soil organic carbon (SOC). The study was carried out in the Rupnagar district of Punjab in India. The estimated soil health parameters, SAR backscatter, and optical remote sensing satellite data parameters were utilized to estimate wheat crop yield. The soil health-based DLMLP model performed best in crop yield estimation and gave R2 values of 0.723 and 0.684 in the training and testing phases, respectively, with a Mean Absolute Error (MAE) of 0.98 and a Root Mean Square Error (RMSE) of 1.24 for the 2019–20 season. The DLMLP test R2 was 42.2% higher than that of the Ordinary Least Squares (OLS) regressor, while the MAE and RMSE were 37.97% and 38.61% lower than those of the OLS regressor for wheat crop yield estimation. The soil health-based DLMLP model gave satisfactory yield estimation accuracy even though the soil health parameter values could not be validated for the preceding wheat seasons, 2015–16 to 2018–19. This study's novel feature is that it estimates soil health parameters for the early stages of wheat crop growth, when the soil lies mostly exposed, and utilises them for crop yield prediction.