•Soybean yield at municipality-level was forecasted using satellite and weather data.•LSTM neural networks outperformed conventional machine learning algorithms in soybean yield prediction.•The model ...accuracy decreased as we anticipated earlier dates of the predictions.•Soybean yield can be forecasted with MAE of 0.42 Mg ha−1 ~70 days before harvesting.
Soybean yield predictions in Brazil are of great interest for market behavior, to drive governmental policies and to increase global food security. In Brazil, soybean yield data generally require several revisions in the months following harvest, suggesting that there is room for improving the accuracy and timeliness of yield predictions. This study presents a novel model to perform in-season (“near real-time”) soybean yield forecasts in southern Brazil using Long Short-Term Memory (LSTM) neural networks, satellite imagery and weather data. The objectives of this study were to: (i) compare the performance of three different algorithms (multivariate OLS linear regression, random forest and LSTM neural networks) for forecasting soybean yield using NDVI, EVI, land surface temperature and precipitation as independent variables, and (ii) evaluate how early (during the soybean growing season) this method is able to forecast yield with reasonable accuracy. Satellite and weather data were masked using a non-crop-specific layer with field boundaries obtained from the Rural Environment Registry that is mandatory for all farmers in Brazil. Main outcomes from this study were: (i) soybean yield forecasts at municipality-scale with a mean absolute error (MAE) of 0.24 Mg ha−1 at DOY 64 (March 5), (ii) a superior performance of the LSTM neural networks relative to the other algorithms for all the forecast dates except DOY 16, where multivariate OLS linear regression provided the best performance, and (iii) model performance (e.g., MAE) for yield forecast decreased when predictions were performed earlier in the season, with MAE increasing from 0.24 Mg ha−1 to 0.42 Mg ha−1 (the latter value from OLS regression) when forecast timing changed from DOY 64 (March 5) to DOY 16 (January 16). This research portrays the benefits of integrating statistical techniques, remote sensing, weather and field survey data in order to perform more reliable in-season soybean yield forecasts.
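As an illustration of the two quantities this abstract relies on, the minimal sketch below computes a mean absolute error (MAE) and converts a day of year (DOY) into a calendar date. The function names, the sample yields and the choice of 2019 as a non-leap reference year are assumptions made for the example; none of them come from the study.

```python
from datetime import date, timedelta


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted yields."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def doy_to_date(doy, year=2019):
    """Convert a 1-based day of year to a calendar date (year is an assumption)."""
    return date(year, 1, 1) + timedelta(days=doy - 1)


# Hypothetical municipality yields in Mg ha-1, for illustration only.
observed = [3.1, 2.8, 3.5]
predicted = [3.3, 2.5, 3.6]
print(round(mean_absolute_error(observed, predicted), 2))
print(doy_to_date(64), doy_to_date(16))
```

In a non-leap year, DOY 64 falls on March 5 and DOY 16 on January 16.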
•multiple linear regression models were constructed to simulate the yield of the four major crop types in Hungary using environmental and remote sensing information.•positive anomaly of minimum ...temperature in May has a substantial negative effect on the crop yield for all four crops.•the results can be used for early yield forecast and for projection of crop yield to the near future.
In the present study, multiple linear regression models were constructed to simulate the yield of winter wheat, rapeseed, maize and sunflower in Hungary for the 2000–2016 time period. We used meteorological data and soil water content from meteorological reanalysis as predictors of the models in monthly resolution. We included annual fertilizer amount in the analysis to remove the trend from the census data. We also used a remote sensing based vegetation index to extend the approach for early crop yield forecast purposes and to study the added value of proxy data on the predictive power of the statistical models. Using a stepwise linear regression-like method, the most appropriate models were selected based on the statistical evaluation of the model fitting. We provided simple equations with well interpretable coefficients that can estimate crop yield with high accuracy. Cross-validated explained variance was 67% for winter wheat, 76% for rapeseed, 81% for maize and 68.5% for sunflower. The modelling exercise showed that a positive anomaly of minimum temperature in May has a substantial negative effect on the final crop yield for all four crops. For winter wheat, increasing maximum temperature in May has a beneficial effect, while higher-than-usual vapour pressure deficit in May decreases yield. For maize, soil water content in July and August is crucial in terms of the final yield. Incorporation of the vegetation index improved the predictive power of the models at country scale, by 10%, 2% and 4% for winter wheat, rapeseed and maize, respectively. At the county level, remote sensing data improved the overall predictive power of the models only for winter wheat. The results provide simple yet robust models for spatially explicit yield forecast as well as yield projection for the near future.
•Rules to attribute APSIM's yield forecast skill to seasonal climate forecasts (SCF).•Use ECPP and Schaake shuffle to downscale four climate variables suitable for APSIM.•Simulate yield forecasts for ...50 stations in 23 years with 1 to 6 months lead times.•ECPP outperforms QM, raw SCFs and climatology especially for early-season forecasts.•As an early-season yield forecast alternative to bring SCFs closer to agriculture.
Seasonal climate forecasts (SCF) are evolving rapidly alongside improvements in climate modelling and downscaling research, and have great potential for weather-sensitive sectors, especially agriculture, by reducing weather-related risks and increasing productivity. Skilful yield forecasts at the beginning of, or before, a cropping season can provide farmers and other stakeholders in agribusiness with the necessary information for early planning and actions. Only a few yield forecast studies have a forecast lead time of four months or longer due to the complexity of the problem. To enable SCFs from Global Climate Models (GCMs) to be used for early-season yield forecasts, this paper uses a statistical downscaling technique, Extended Copula Post-Processing (ECPP) and the Schaake shuffle, to downscale four climate variables to generate weather-like daily data that are suitable for agricultural applications. Climate forecasts drive a process-based crop model, APSIM (Agricultural Production Systems sIMulator), to simulate crop forecasts at 50 stations, well-distributed across the Australian grain zone. To focus on yield forecast skill attributable to SCF, we propose best practice management rules to predict water-limited winter wheat yield. Yield forecasts from ECPP show a significant improvement over quantile mapping downscaling and raw SCF from the recent Australian seasonal forecast model ACCESS-S1 in terms of bias, accuracy, reliability, and overall forecast skill. In addition, even at the beginning of a cropping season with a forecast lead time of four or more months, yield forecasts driven by ECPP show higher skill than climatology, a benchmark for yield forecast. Early-season yield forecasts driven by SCFs provide a promising alternative to regression/machine-learning-based forecasts. Performance sensitivity and issues, and gaps in using skilful SCFs to help growers with their farming decision-making are discussed.
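The Schaake shuffle named in this abstract can be sketched for a single variable and time step as follows. This is a minimal illustration of the general reordering idea (ensemble members are rearranged so that their ranks follow those of a historical template), not the study's ECPP pipeline; the function name and the sample numbers are assumptions.

```python
def schaake_shuffle(forecast_members, historical_template):
    """Reorder ensemble members so their ranks match the historical template.

    Both inputs are equal-length lists for one variable at one time step.
    The member with the k-th smallest value is placed where the template
    holds its k-th smallest value (ties broken by position).
    """
    n = len(forecast_members)
    if n != len(historical_template):
        raise ValueError("ensemble and template must have the same size")
    sorted_forecast = sorted(forecast_members)
    # Indices of the template ordered from smallest to largest value.
    template_order = sorted(range(n), key=lambda i: historical_template[i])
    shuffled = [None] * n
    for rank, position in enumerate(template_order):
        shuffled[position] = sorted_forecast[rank]
    return shuffled


# Hypothetical rainfall ensemble (mm) and historical observations (mm).
print(schaake_shuffle([3.0, 1.0, 2.0], [10.0, 30.0, 20.0]))
```

Applying the same template years across variables and days is what lets the reordered ensemble inherit realistic dependence between them, which is why the technique is paired with a downscaling step that treats each variable separately.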
•Uncertainty predicting zero-N maize yield varied between 2.0 to 2.5 Mg ha−1.•Previous crop, irrigation, and soil organic matter were the most relevant predictors.•Spring weather provided key ...information for refining the yield prediction.
Maize (Zea mays L.) yield responsiveness to nitrogen (N) fertilization depends on the yield under non-limiting N supply as well as on the inherent productivity under zero N fertilizer (Y0). Understanding the driving factors and developing predictive algorithms for Y0 will enhance the optimization of N fertilization in maize. Using a random forest algorithm, we analyzed data from 679 maize N fertilization studies (1031 Y0 observations) conducted between 1999 and 2019 in the United States and Canada. Predictability of Y0 was assessed while identifying determinant factors such as soil, crop management, and weather. The inclusion of weather variables as predictors improved the model efficiency (ME) from 51 to 64%, and reduced the root mean square error (RMSE) from 2.5 to 2.0 Mg ha−1, or from 34 to 27% in relative terms (RRMSE). The most relevant predictors of Y0 were previous crop, irrigation, and soil organic matter (SOM), while the most influential weather variables were the radiation per unit of thermal time (Q quotient) around flowering and spring precipitation. Regarding the crop rotation effect, alfalfa (Medicago sativa L.) was the previous crop with the highest Y0 level (IQR = 11.5–15.0 Mg ha−1), compared with annual legumes (IQR = 5.6–10.0 Mg ha−1) and other previous crops (IQR = 3.6–7.8 Mg ha−1). The Q quotient around flowering positively affected Y0, while spring precipitation and extreme temperature events during grain filling showed a negative association with Y0. Overall, these results reinforce the concept that yields are controlled not only by soil N supply but also by factors modifying plant demand and ability to capture N. Lastly, we foresee a promising future for the use of machine learning to address both prediction and interpretation of maize yield to obtain more reliable N guidelines.
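The three goodness-of-fit measures quoted in this abstract can be sketched as below. The sketch assumes that model efficiency follows the common Nash-Sutcliffe form and that RRMSE is RMSE divided by the mean observed value; the abstract does not spell out either definition, and the sample data are invented.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean square error, in the units of the observations."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def rrmse(observed, predicted):
    """RMSE as a percentage of the mean observed value (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_obs


def model_efficiency(observed, predicted):
    """Nash-Sutcliffe style efficiency: 1 - SSE / SST (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# Hypothetical Y0 values in Mg ha-1, for illustration only.
obs = [4.0, 6.0, 8.0, 10.0]
pred = [5.0, 5.0, 9.0, 9.0]
print(rmse(obs, pred), rrmse(obs, pred), model_efficiency(obs, pred))
```

Under these assumed definitions, the reported pairs of RMSE and RRMSE (2.5 Mg ha−1 at 34% and 2.0 Mg ha−1 at 27%) would both correspond to a mean observed Y0 of roughly 7.4 Mg ha−1, a back-calculated figure that the abstract itself does not state.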
•Achieve large-scale and small-scale cotton yield prediction.•Using deep learning to extract cotton bolls, which can improve the prediction accuracy.•Generate high-resolution yield map ...according to the model.
Crop yield prediction is of great practical significance for farmers to make reasonable decisions, such as decisions on crop insurance, storage demand, cash flow budget, fertilizer, water and other input factors. The traditional yield measurement method is sampling surveys, which require a large area of destructive sampling of cotton fields and consume considerable time and labor costs. This study established a cotton yield estimation model based on time series Unmanned Aerial Vehicle (UAV) remote sensing data. The U-Net semantic segmentation network is used to recognize and extract the boll opening pixels in high-resolution visible images, and the boll opening pixel percentage (BOP) is calculated according to the network extraction results. By combining the multispectral images and the pixel coverage of cotton bolls, a Bayesian regularization BP (back propagation) neural network was used to predict cotton yields. In order to simplify the input parameters of the model, the stepwise sensitivity analysis method is used to eliminate redundant variables and obtain the optimal input feature set. The experimental results show that the R2 of the proposed model is 0.853 at the scale of 0.81 m2 (average results of ten-fold cross validation). This study provides a method that can simultaneously meet the requirements of large-area and small-scale forecasting of cotton yields and provides a new idea for cotton yield measurement and breeding screening.
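The boll opening pixel percentage (BOP) described above can be sketched as the share of pixels that a segmentation network labels as open boll. The snippet assumes a binary mask (1 for boll, 0 for background) stored as nested lists; the function name and the tiny mask are illustrative assumptions, and the U-Net inference step is not shown.

```python
def boll_opening_percentage(mask):
    """Percentage of pixels labelled as open boll in a binary mask.

    `mask` is a list of rows, each a list of 0/1 labels, standing in for
    the output of a semantic segmentation network over one plot.
    """
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask must contain at least one pixel")
    boll_pixels = sum(sum(row) for row in mask)
    return 100.0 * boll_pixels / total


# Hypothetical 2 x 3 mask: three of six pixels are boll.
plot_mask = [
    [0, 1, 1],
    [0, 0, 1],
]
print(boll_opening_percentage(plot_mask))
```

A per-plot value of this kind is the sort of scalar that can be combined with multispectral features as an input to a regression network.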
Early and reliable seasonal crop yield forecasts are crucial for both farmers and decision-makers. Commonly-used methods for seasonal yield forecasting are based on process-based crop models or ...statistical regression-based models. Both have limitations, particularly in regard to accounting for growth stage-specific climate extremes (such as drought, heat, and frost). In this study, we first developed a hybrid yield forecasting approach by blending multiple growth stage-specific indicators, i.e. APSIM (a process-based crop model)-simulated biomass, climate extremes, NDVI (Normalized Difference Vegetation Index), and SPEI (Standardized Precipitation and Evapotranspiration Index) before forecasting dates, using a regression model (random forest or multiple linear regression). Plot-scale wheat yield (2008–2017) in the southeastern Australian wheat belt was dynamically forecasted at the end of several targeted growth stages as the growing season progressed to harvest. Results showed that the forecasting accuracy increased significantly for both systems as forecast time approached harvest time. The forecasting system based on random forest outperformed the forecasting system based on multiple linear regression at each forecasting event. Satisfactory yield forecasts occurred at one month (~35 days) prior to harvest (r = 0.85, LCCC = 0.81, MAPE = 17.6%, RMSE = 0.70 t ha−1, and ROC score = 0.90), and at two months before harvest (r = 0.62, LCCC = 0.53, MAPE = 27.1%, RMSE = 1.01 t ha−1, and ROC score = 0.88). In addition, drought events throughout the growing season were identified as the main factor causing yield losses in the wheat belt during the past decade.
With the increasing availability of farming-related data, we expect that the yield forecasting system proposed in our study may be widely extended to other comparable cropping regions to produce sufficiently accurate wheat yield forecasts for stakeholders to develop strategic decisions in their respective roles.
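Two of the scores reported in this abstract, Lin's concordance correlation coefficient (LCCC) and the mean absolute percentage error (MAPE), can be sketched with their standard textbook definitions. The function names and sample values are assumptions for illustration, and the study may have used slightly different estimators.

```python
def lin_ccc(observed, predicted):
    """Lin's concordance correlation coefficient (population moments).

    Equals 1 only when predictions match observations exactly; unlike
    Pearson's r it is reduced by systematic bias or scale differences.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


def mape(observed, predicted):
    """Mean absolute percentage error; observations must be non-zero."""
    n = len(observed)
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / n


# Hypothetical yields (t ha-1): predictions are perfectly correlated with
# the observations but biased upward by one unit.
obs = [1.0, 2.0, 3.0]
pred = [2.0, 3.0, 4.0]
print(lin_ccc(obs, pred), mape(obs, pred))
```

The biased example has a Pearson correlation of 1 but a concordance well below 1, which is one reason the two statistics are often reported side by side.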
•Monthly VPD and precipitation in spline form, combined with EVI, give the best prediction model.•Model’s performance shows regional and interannual variations, which are related to spatial and ...temporal yield variability.•Model’s prediction shows increasingly larger RMSE toward wetter years and extremely dry years.•Inconsistent model evaluation practices undermine the comparability between statistical modeling studies.
Statistical crop models have been a major tool in identifying critical drivers of crop yield, forecasting short-term crop yield, and assessing long-term climate change impacts on agricultural productivity. However, few studies focus specifically on fundamental issues encountered in developing a high-performance statistical crop model for yield prediction. Such issues include: how to select predictors and fitting functions, how to effectively address the spatiotemporal scale issue, whether it is beneficial to include satellite data as explanatory variables, and how to reconcile different model evaluation procedures. In this study, we present our statistical modeling practices for predicting rainfed corn yield in the Midwest U.S. and address the aforementioned issues through comprehensive diagnostic analysis. Our results show that vapor pressure deficit and precipitation at a monthly scale, in spline form with customized knots, define the “Best Climate-only” model among alternative climate variables (e.g., air temperature) and fitting functions (e.g., linear or polynomial), with an out-of-sample (leave-one-year-out) median R2 of 0.79 and RMSE of 1.04 t/ha (16.6 bu/acre) from 2003 to 2016. Satellite variables, such as MODIS land surface temperature and Enhanced Vegetation Index (EVI), when used as predictors alone, reduce the model’s RMSE to 0.93 t/ha (14.8 bu/acre). Adding satellite variables (i.e., EVI in polynomial form) to the “Best Climate-only” model gives the “Best Climate + EVI” model, which has the highest prediction performance of this study, with a median R2 of 0.85 and RMSE of 0.90 t/ha (14.3 bu/acre). Such a model trained using all data (so-called “global model”) in most cases leads to better predictions than the state-specific trained models. However, the global model’s prediction performance exhibits considerable regional and interannual variations.
The regionally varying performance is related to states’ spatiotemporal variability in yield, where states with larger spatial yield variability show higher R2, and states with smaller temporal yield variability show lower RMSE. Interannual variations in prediction performance are linked to yield variability and degree of wetness, with higher R2 in years with larger yield variability but increasingly larger RMSE toward wetter years and extremely dry years. These identified spatial and temporal variations in the model’s performance, together with inconsistent evaluation practices, undermine the comparability between statistical modeling studies. Alleviating such comparability issues requires more transparency and open data practices. The statistical model presented in this study provides a benchmark for further development and can be applied to future research related to yield prediction or assessment of climate change impact.
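The paired units in this abstract (t/ha alongside bu/acre) can be reproduced with a short conversion. The sketch assumes the standard 56 lb corn bushel and the usual pound and hectare conversion factors; the function and constant names are illustrative.

```python
# Conversion constants (standard definitions).
KG_PER_LB = 0.45359237
LB_PER_CORN_BUSHEL = 56.0          # standard test weight for shelled corn
ACRES_PER_HECTARE = 2.4710538


def t_per_ha_to_bu_per_acre(tonnes_per_hectare):
    """Convert corn yield from metric tonnes per hectare to bushels per acre."""
    kg_per_ha = tonnes_per_hectare * 1000.0
    bushels_per_ha = kg_per_ha / (LB_PER_CORN_BUSHEL * KG_PER_LB)
    return bushels_per_ha / ACRES_PER_HECTARE


# RMSE values quoted in the abstract, in t/ha.
for rmse_t_ha in (1.04, 0.93, 0.90):
    print(rmse_t_ha, round(t_per_ha_to_bu_per_acre(rmse_t_ha), 1))
```

The rounded results agree with the 16.6, 14.8 and 14.3 bu/acre figures given in the text, which suggests the abstract used the same 56 lb bushel.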
•Develop a within-growing season yield forecast system with random forest model.•Random forest model performs well in predicting grain yield in China.•We identified the most important stage-specific ...predictors determining crop yield.•The most important variable influencing yields varied with crop types.
Accurate and timely crop yield forecasts can provide essential information for making agricultural policy and investment decisions. Recent studies have used different machine learning techniques to develop such yield forecast systems for single crops at regional scales. However, no study has used multiple sources of environmental predictors (climate, soil, and vegetation) to forecast yields for three major crops in China. In this study, we adopted 7-year observed crop yield data (2013–2019) for three major grain crops (wheat, maize, and rice) across China, and three major data sets including climate, vegetation indices, and soil properties were used to develop a dynamic yield forecasting system based on the random forest (RF) model. The RF model showed good performance for estimating yields of all three crops, with a correlation coefficient (r) higher than 0.75 and a normalized root mean square error (nRMSE) lower than 18.0%. Our results also showed that crop yields can be satisfactorily forecasted one to three months prior to harvest. The optimum lead time for yield forecasting depended on crop type. In addition, we found that the major predictors influencing crop yield varied between crops. In general, solar radiation and vegetation indices (especially during jointing to milk development stages) were identified as the main predictors for winter wheat; vegetation indices (throughout the growing season) and drought (especially during emergence to tasseling stages) were the most important predictors for spring maize; soil moisture (throughout the growing season) was the dominant predictor for summer maize, late rice, and mid rice; precipitation (especially during booting to heading stages) was the main predictor for early rice. Our study provides insights into practical crop yield forecasting and the understanding of yield response to environmental conditions at a large scale across China.
The methods undertaken in this research can be easily implemented in other countries with available information on climate, soil, and vegetation conditions.
•Maize yield was forecasted on each day by considering actual weather data.•High accuracy of yield prediction could be achieved after maize tasseling.•Decline in daily forecasted yields resulted from ...more serious water stress.•This algorithm could quantify the impacts of real-time weather on seasonal yield.•Simulated IUEs were improved with irrigation schemes scheduled by this method.
Current water consumption is unsustainable in many regions, which requires more efficient agricultural water management strategies. This study coupled the DSSAT-CERES-Maize model with a new algorithm for dynamic within-season irrigation scheduling for maize (Zea mays L.) based on trends in daily forecasted yields. Field experiments were undertaken at four arid and semiarid sites in Northwest China, including Changwu (2010 and 2011, rainfed), Yangling (2014 and 2015, irrigated), Jingyang (2015, irrigated), and Shiyanghe (2015, irrigated). Historical 50-year (1968–2017) weather data were available for each site. In daily yield forecasts, weather data before forecast dates were observed from local weather stations, while the unknown data between forecast and harvest dates were filled in with the local 50-year continuous weather series for the same periods. In this way, 50 maize yields were obtained on each forecast day, and their median was taken as the prediction for that day. As the growing season advanced, historical weather data were gradually replaced by actual weather data. Further, the dynamics of daily forecasted yields were used to schedule irrigation based on a new algorithm. The new algorithm schedules irrigation by considering the feedback of maize grain yield to interactions of actual weather, environment, and management. The results showed that forecasted maize yield had considerable uncertainty before tasseling but rapidly converged to the actual yield about one month before harvest. The mean absolute relative errors (MAREs) of daily forecasted yields were 11.7% and 7.3% at Changwu in 2010 and 2011, respectively. Simulated irrigation use efficiency (IUE) was improved for almost all sites and years. The new irrigation scheduling algorithm will help to improve irrigation scheduling in arid and semiarid areas where precipitation is the main limiting factor for maize yield.
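The daily forecasting procedure described in this abstract (observed weather up to the forecast date, each historical year filling the remainder, median of the resulting yields) can be sketched as follows. The `toy_crop_model` function is a hypothetical stand-in for DSSAT-CERES-Maize, weather is reduced to a single daily rainfall series, and three invented years replace the 50-year record; all of these are assumptions made for the example.

```python
from statistics import median


def toy_crop_model(daily_rain_mm):
    """Hypothetical stand-in for a crop model: yield rises with seasonal
    rainfall and is capped at 12 Mg ha-1. Not DSSAT-CERES-Maize."""
    return min(12.0, 0.02 * sum(daily_rain_mm))


def forecast_yield(observed, historical_years, forecast_day, model=toy_crop_model):
    """Median yield over weather series spliced at `forecast_day`.

    Days before `forecast_day` come from this season's observations; the
    remaining days come from each historical year in turn.
    """
    yields = []
    for history in historical_years:
        spliced = observed[:forecast_day] + history[forecast_day:]
        yields.append(model(spliced))
    return median(yields)


# Hypothetical 10-day season with three historical years.
observed = [5.0] * 10
historical = [[0.0] * 10, [10.0] * 10, [20.0] * 10]
for day in (0, 5, 10):
    print(day, forecast_yield(observed, historical, day))
```

As the forecast day advances, more of each spliced series comes from observations, so the ensemble members converge on a single value, mirroring the convergence toward actual yield reported in the abstract.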
Accurate in-season yield forecasts for field-scale crops are crucial for both farmers and decision-makers. Common methods for yield prediction are limited by the availability of unknown weather data ...(process-based crop models) and the failure to consider yield formation processes (statistical models based on unmanned aerial vehicle (UAV) images), respectively. Furthermore, previous studies focused only on crops without mulching, yet mulching is an important agronomic approach to increase grain yield in the arid areas of northwest China. We aim to develop a hybrid approach coupling a crop model and UAV data through ensemble learning to achieve in-season yield forecasts for film-mulched wheat. A four-year field experiment was conducted (2018–2020 and 2021–2023). We first calibrated AquaCrop using data from 2018 to 2020, and historical weather data were employed to drive AquaCrop for predicting yields in 2021–2023. Next, statistical models were constructed to predict yields based on spectral and textural indices calculated from UAV images. Finally, a hybrid approach coupling the AquaCrop model and remote-sensing data was developed using an ensemble learning technique. The relative contribution of features was quantified using SHapley Additive exPlanations (SHAP) values. The results indicated that AquaCrop yield forecasts exhibited considerable uncertainties (R2: 0.53–0.63; NRMSE: 16.54%–14.83%). Yield estimation from remote-sensing data was influenced by background and saturation effects, reaching its highest accuracy at the heading stage (R2 was 0.80, NRMSE was 11.88%). Ensemble learning demonstrated strong performance compared to machine learning algorithms. The coupling model combined the advantages of crop and statistical models through the ensemble learning algorithm, achieving accurate yield predictions more than 40 days before harvest (heading stage) based on AdaBoost regression (R2 was 0.88, NRMSE was 8.40%).
The most important forecasting factors affecting yield prediction were the textural indices, followed by the AquaCrop simulated values. Overall, the coupled model showed good performance in predicting the in-season yield of film-mulched wheat, which provided new insights into farm-scale yield prediction. Further validation of the generalizability of the coupled model in different scenarios is required in the future to improve the applicability of the model in actual production practice.
•AquaCrop model and UAV data were coupled by ensemble learning.•AdaBoost regression in Boosting was the optimal yield prediction algorithm.•TIs and AquaCrop simulated values were all important yield forecasting factors.