Crop yield prediction is extremely challenging due to its dependence on multiple factors such as crop genotype, environmental factors, management practices, and their interactions. This paper presents a deep learning framework using convolutional neural networks (CNNs) and recurrent neural networks (RNNs) for crop yield prediction based on environmental data and management practices. The proposed CNN-RNN model, along with other popular methods such as random forest (RF), deep fully connected neural networks (DFNN), and LASSO, was used to forecast corn and soybean yield across the entire Corn Belt (including 13 states) in the United States for years 2016, 2017, and 2018 using historical data. The new model achieved a root-mean-square error (RMSE) of 9% and 8% of the respective average corn and soybean yields, substantially outperforming all other methods that were tested. The CNN-RNN has three salient features that make it a potentially useful method for other crop yield prediction studies. (1) The CNN-RNN model was designed to capture the time dependencies of environmental factors and the genetic improvement of seeds over time without having their genotype information. (2) The model demonstrated the capability to generalize the yield prediction to untested environments without a significant drop in prediction accuracy. (3) Coupled with the backpropagation method, the model could reveal the extent to which weather conditions, accuracy of weather predictions, soil conditions, and management practices were able to explain the variation in the crop yields.
This study investigates whether coupling crop modeling and machine learning (ML) improves corn yield predictions in the US Corn Belt. The main objectives are to explore whether a hybrid approach (crop modeling + ML) would result in better predictions, investigate which combinations of hybrid models provide the most accurate predictions, and determine the features from the crop modeling that are most effective to be integrated with ML for corn yield prediction. Five ML models (linear regression, LASSO, LightGBM, random forest, and XGBoost) and six ensemble models have been designed to address the research question. The results suggest that adding simulation crop model variables (APSIM) as input features to ML models can decrease yield prediction root mean squared error (RMSE) by 7 to 20%. Furthermore, we investigated partial inclusion of APSIM features in the ML prediction models and found that soil moisture related APSIM variables are most influential on the ML predictions, followed by crop-related and phenology-related variables. Finally, based on a feature importance measure, simulated APSIM average drought stress and average water table depth during the growing season were observed to be the most important APSIM inputs to ML. This result indicates that weather information alone is not sufficient and ML models need more hydrological inputs to make improved yield predictions.
The emergence of new technologies to synthesize and analyze big data with high-performance computing has increased our capacity to more accurately predict crop yields. Recent research has shown that machine learning (ML) can provide reasonable predictions faster and with higher flexibility compared to simulation crop modeling. However, a single machine learning model can be outperformed by a “committee” of models (machine learning ensembles) that can reduce prediction bias, variance, or both and is able to better capture the underlying distribution of the data. Yet, there are many aspects to be investigated with regard to prediction accuracy, time of the prediction, and scale. The earlier the prediction during the growing season the better, but this has not been thoroughly investigated as previous studies considered all data available to predict yields. This paper provides a machine learning based framework to forecast corn yields in three US Corn Belt states (Illinois, Indiana, and Iowa) considering complete and partial in-season weather knowledge. Several ensemble models are designed using a blocked sequential procedure to generate out-of-bag predictions. The forecasts are made at the county level and aggregated to the agricultural district and state levels. Results show that the proposed optimized weighted ensemble and the average ensemble are the most precise models with RRMSE of 9.5%. Stacked LASSO makes the least biased predictions (MBE of 53 kg/ha), while other ensemble models also outperformed the base learners in terms of bias. On the contrary, although random k-fold cross-validation is replaced by the blocked sequential procedure, it is shown that stacked ensembles do not perform as well as weighted ensemble models for time series data sets, as they require the data to be IID to perform favorably. Comparing our proposed model forecasts with the literature demonstrates the acceptable performance of forecasts made by our proposed ensemble model.
Results from the scenario of having partial in-season weather knowledge reveal that decent yield forecasts with RRMSE of 9.2% can be made as early as June 1st. Moreover, it was shown that the proposed model performed better than individual models and benchmark ensembles at agricultural district and state-level scales as well as county-level scale. To find the marginal effect of each input feature on the forecasts made by the proposed ensemble model, a methodology is suggested that is the basis for finding feature importance for the ensemble model. The findings suggest that weather features corresponding to weather in weeks 18–24 (May 1st to June 1st) are the most important input features.
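The error metrics and validation scheme named in this abstract can be illustrated with a minimal sketch. It assumes the common definitions (RRMSE as RMSE divided by the mean observed yield, MBE as the mean of predicted minus observed) and reads the blocked sequential procedure as forward-chaining splits by year. The function names, the MBE sign convention, and the `n_train_min` parameter are illustrative assumptions, not taken from the paper.

```python
import math


def rrmse(observed, predicted):
    """Relative RMSE in percent: RMSE divided by the mean observed value."""
    n = len(observed)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    return 100.0 * math.sqrt(mse) / (sum(observed) / n)


def mbe(observed, predicted):
    """Mean bias error, assumed here to be the mean of (predicted - observed)."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


def blocked_sequential_splits(years, n_train_min):
    """Yield (training_years, validation_year) pairs that never train on the future.

    Each split validates on one year and trains on all earlier years,
    starting once at least n_train_min training years are available.
    """
    ordered = sorted(set(years))
    for i in range(n_train_min, len(ordered)):
        yield ordered[:i], ordered[i]
```

Unlike random k-fold cross-validation, every split here keeps the validation year strictly after the training years, which is the property that matters for yield time series.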
Crop yield prediction is crucial for global food security yet notoriously challenging due to multitudinous factors that jointly determine the yield, including genotype, environment, management, and their complex interactions. Integrating the power of optimization, machine learning, and agronomic insight, we present a new predictive model (referred to as the interaction regression model) for crop yield prediction, which has three salient properties. First, it achieved a relative root mean square error of 8% or less in three Midwest states (Illinois, Indiana, and Iowa) in the US for both corn and soybean yield prediction, outperforming state-of-the-art machine learning algorithms. Second, it identified about a dozen environment by management interactions for corn and soybean yield, some of which are consistent with conventional agronomic knowledge whereas others require additional analysis or experiment to prove or disprove. Third, it quantitatively dissected crop yield into contributions from weather, soil, management, and their interactions, allowing agronomists to pinpoint the factors that favorably or unfavorably affect the yield of a given location under a given weather and management scenario. The most significant contribution of the new prediction model is its capability to produce accurate prediction and explainable insights simultaneously. This was achieved by training the algorithm to select features and interactions that are spatially and temporally robust to balance prediction accuracy for the training data and generalizability to the test data.
We investigate the predictive performance of two novel CNN-DNN machine learning ensemble models in predicting county-level corn yields across the US Corn Belt (12 states). The developed data set is a combination of management, environment, and historical corn yields from 1980 to 2019. Two scenarios for ensemble creation are considered: homogenous and heterogenous ensembles. In homogenous ensembles, the base CNN-DNN models are all the same, but they are generated with a bagging procedure to ensure they exhibit a certain level of diversity. Heterogenous ensembles are created from different base CNN-DNN models which share the same architecture but have different hyperparameters. Three types of ensemble creation methods were used to create several ensembles for each of the scenarios: Basic Ensemble Method (BEM), Generalized Ensemble Method (GEM), and stacked generalized ensembles. Results indicated that both designed ensemble types (heterogenous and homogenous) outperform the ensembles created from five individual ML models (linear regression, LASSO, random forest, XGBoost, and LightGBM). Furthermore, by introducing improvements over the heterogenous ensembles, the homogenous ensembles provide the most accurate yield predictions across US Corn Belt states. This model could make 2019 yield predictions with a root mean square error of 866 kg/ha, equivalent to 8.5% relative root mean square error, and could successfully explain about 77% of the spatio-temporal variation in the corn grain yields. The significant predictive power of this model can be leveraged for designing a reliable tool for corn yield prediction which will in turn assist agronomic decision makers.
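The difference between an unweighted and a weighted ensemble can be sketched in a few lines. The sketch below assumes that BEM amounts to a simple average of base-model predictions and stands in for GEM with a least-squares weight for just two base models, clipped to [0, 1]. That two-model simplification, the clipping, and all function names are illustrative assumptions, not the paper's implementation.

```python
def basic_ensemble(predictions):
    """BEM-style combination: unweighted mean across base models.

    predictions is a list of per-model prediction lists of equal length.
    """
    n_models = len(predictions)
    return [sum(column) / n_models for column in zip(*predictions)]


def fit_two_model_weight(observed, pred_a, pred_b):
    """Least-squares weight w for w * pred_a + (1 - w) * pred_b, clipped to [0, 1]."""
    num = sum((o - b) * (a - b) for o, a, b in zip(observed, pred_a, pred_b))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    if den == 0:
        return 0.5  # identical predictions: every weight gives the same result
    return min(1.0, max(0.0, num / den))


def weighted_ensemble(predictions, weights):
    """Weighted combination of base-model predictions (weights should sum to 1)."""
    return [
        sum(w * p for w, p in zip(weights, column))
        for column in zip(*predictions)
    ]
```

In practice the weights would be fitted on held-out (out-of-bag) predictions rather than on the data used to train the base models, so that the ensemble does not simply reward the most overfitted learner.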
• The simulated mean optimum planting date for maize in Iowa, USA corresponds to USDA-NASS 18% planting progress.
• The simulated optimum date has advanced by 0.13 days/year from 1980 to 2015.
• Climate change scenarios affected crop yields much more than the optimum planting dates.
• Future investments in planting technologies can buffer climate variability.
Planting date and cultivar selection are major factors in determining the yield potential of any crop and in any region. However, there is a knowledge gap in how climate scenarios affect these choices. To explore this gap, we performed a regional scale analysis (11 planting dates × 8 cultivars × 281 fields × 36 weather years × 6 climate scenarios) using the APSIM model and pSIMS software for Iowa, the leading US maize (Zea mays L.) producing state. Our objectives were to determine how the optimum planting date (optPD) changes with weather scenarios and cultivars and the potential economic implications of planting outside the optimum windows. Results indicated that the mean optPD corresponds to the US Department of Agriculture, National Agricultural Statistics Service (USDA-NASS) 18.4% planting progress (April 28th) in Iowa. The optPD was found to be advancing by –0.13 d yr−1 from 1980 to 2015. A 1 °C increase in mean temperature increased the length of the growing season by 10 days while the optPD changed by –2 to +6 days, depending on cultivar. Under a more realistic scenario of increasing the minimum temperature by 0.5 °C, decreasing the maximum temperature by 0.5 °C, increasing spring rainfall by 10% and decreasing summer rainfall by 10%, the optPD only changed by –2 days compared to current trends; however, yield increased by 6.6%. Analysis of historical USDA-NASS planting durations indicated that on average, the planting duration (1–99% statewide reported planting progress) is 44 days, while it can be as low as 21 days in years with favorable weather. A simple economic analysis illustrated a potential revenue loss of up to $340 million per year from planting maize outside the optimum window. We conclude that future investments in planting technologies to accelerate planting, especially in challenging weather years, as well as improved optPD × cultivar recommendations to farmers, will provide economic benefits and buffer climate variability.
Evidence suggests that global maize yield declines with a warming climate, particularly with extreme heat events. However, the degree to which important maize processes such as biomass growth rate, growing season length (GSL) and grain formation are impacted by an increase in temperature is uncertain. Such knowledge is necessary to understand yield responses and develop crop adaptation strategies under a warmer climate. Here crop models, satellite observations, survey, and field data were integrated to investigate how high temperature stress influences maize yield in the U.S. Midwest. We showed that both observational evidence and crop model ensemble mean (MEM) suggest the nonlinear sensitivity in yield was driven by the intensified sensitivity of harvest index (HI), but MEM underestimated the warming effects through HI and overstated the effects through GSL. Further analysis showed that the intensified sensitivity in HI mainly results from a greater sensitivity of yield to high temperature stress during the grain filling period, which explained more than half of the yield reduction. When warming effects were decomposed into direct heat stress and indirect water stress (WS), observational data suggest that yield is more reduced by direct heat stress (−4.6 ± 1.0%/°C) than by WS (−1.7 ± 0.65%/°C), whereas MEM gives opposite results. This discrepancy implies that yield reduction by heat stress is underestimated, whereas the yield benefit of increasing atmospheric CO2 might be overestimated in crop models, because elevated CO2 brings yield benefit through water conservation effect but produces limited benefit over heat stress. Our analysis through integrating data and crop models suggests that future adaptation strategies should be targeted at the heat stress during grain formation and changes in agricultural management need to be better accounted for to adequately estimate the effects of heat stress.
Most studies analyzing influences of climatic warming on crop yield have ignored that yield response to temperature is stage dependent. Here we integrate field census data, satellite‐derived data, statistical regressions and mechanistic models to investigate how heat stress nonlinearly influences maize yield and its components (biomass accumulation, phenological development and grain formation). Our analysis through integrating data and crop models suggests that future adaptation strategies should be targeted at the heat stress during grain formation and changes in agricultural management need to be better accounted for to adequately estimate the heat stress effects.
Heat and drought are two emerging climatic threats to the US maize and soybean production, yet their impacts on yields are collectively determined by the magnitude of climate change and rising atmospheric CO2 concentrations. This study quantifies the combined and separate impacts of high temperature, heat and drought stresses on the current and future US rainfed maize and soybean production and for the first time characterizes spatial shifts in the relative importance of individual stress. Crop yields are simulated using the Agricultural Production Systems Simulator (APSIM), driven by high‐resolution (12 km) dynamically downscaled climate projections for 1995–2004 and 2085–2094. Results show that maize and soybean yield losses are prominent in the US Midwest by the late 21st century under both Representative Concentration Pathway (RCP) 4.5 and RCP8.5 scenarios, and the magnitude of loss highly depends on the current vulnerability and changes in climate extremes. Elevated atmospheric CO2 partially but not completely offsets the yield gaps caused by climate extremes, and the effect is greater in soybean than in maize. Our simulations suggest that drought will continue to be the largest threat to US rainfed maize production under RCP4.5 and soybean production under both RCP scenarios, whereas high temperature and heat stress overtake drought as the dominant stress on maize under RCP8.5. We also reveal that shifts in the geographic distributions of dominant stresses are characterized by the increase in concurrent stresses, especially for the US Midwest. These findings imply the importance of considering heat and drought stresses simultaneously for future agronomic adaptation and mitigation strategies, particularly for breeding programs and crop management. The modeling framework of partitioning the total effects of climate change into individual stress impacts can be applied to the study of other crops and agriculture systems.
This study quantifies the current and future yield responses of US rainfed maize and soybean to climate extremes and for the first time characterizes spatial shifts in the relative importance of high temperature, heat and drought stress.
Climate change will drive increased frequencies of extreme climatic events. Despite this, there is little scholarly information on the extent to which waterlogging caused by extreme rainfall events will impact on crop physiological behaviour. To improve the ability to reliably model crop growth and development under soil waterlogging stress, we advanced the process-basis of waterlogging in the farming systems model Agricultural Production Systems sIMulator (APSIM). Our new mathematical description of waterlogging adequately represented waterlogging stress effects on the development, biomass and grain yield of many commercial Australian barley genotypes. We then used the improved model to examine how optimal flowering periods (OFPs, the point at which long-term abiotic stresses are minimal) change under historical and future climates in waterlogging-prone environments, and found that climate change will reduce waterlogging stress and shift forward OFP (26 d earlier on average across locations). For the emissions scenario representative concentration pathway 8.5 at 2090, waterlogging stresses diminished but this was not enough to prevent substantial yield reduction due to increasingly severe high temperature stress (−35% average reduction in yield across locations, genotypes and sowing dates). It was shown that seasonal waterlogging stress patterns under future conditions will be similar to those occurring historically. Yield reduction caused by waterlogging stress was 6% and 4% on average across sites under historical and future climates, respectively. To adapt, both genotypic and management adaptations will be required: earlier sowing and planting waterlogging tolerant genotypes mitigate the yield penalty caused by waterlogging by up to 26% and 24% under historical and future climates, respectively.
We conclude that even though the prevalence of waterlogging in future will diminish, climate change and extreme climatic events will have substantial and perverse effects on the productivity and sustainability of Australian farms.
The performance of crop models in simulating various aspects of the cropping system is sensitive to parameter calibration. Parameter estimation is challenging, especially for time-dependent parameters such as cultivar parameters with a lifespan of 2-3 years. Manual calibration of the parameters is time-consuming, requires expertise, and is prone to error. This research develops a new automated framework to estimate time-dependent parameters for crop models using a parallel Bayesian optimization algorithm. This approach integrates the power of optimization and machine learning with prior agronomic knowledge. To test the proposed time-dependent parameter estimation method, we simulated historical yield increase (from 1985 to 2018) in 25 environments in the US Corn Belt with APSIM. Then we compared yield simulation results and nine parameter estimates from our proposed parallel Bayesian framework with those from Bayesian optimization and manual calibration. Results indicated that parameters calibrated using the proposed framework achieved an 11.6% reduction in the prediction error over Bayesian optimization and a 52.1% reduction over manual calibration. We also trained nine machine learning models for yield prediction and found that none of them was able to outperform the proposed method in terms of root mean square error and R². The most significant contribution of the new automated framework for time-dependent parameter estimation is its capability to find close-to-optimal parameters for the crop model. The proposed approach also produced explainable insight into cultivar traits' trends over 34 years (1985-2018).