This study investigates whether coupling crop modeling and machine learning (ML) improves corn yield predictions in the US Corn Belt. The main objectives are to explore whether a hybrid approach (crop modeling + ML) results in better predictions, to investigate which combinations of hybrid models provide the most accurate predictions, and to determine which crop model features are most effective when integrated with ML for corn yield prediction. Five ML models (linear regression, LASSO, LightGBM, random forest, and XGBoost) and six ensemble models were designed to address these questions. The results suggest that adding simulated crop model (APSIM) variables as input features to ML models can decrease the root mean squared error (RMSE) of yield predictions by 7% to 20%. Furthermore, we investigated partial inclusion of APSIM features in the ML prediction models and found that soil-moisture-related APSIM variables are the most influential on the ML predictions, followed by crop-related and phenology-related variables. Finally, based on feature importance measures, we observed that the simulated APSIM average drought stress and average water table depth during the growing season are the most important APSIM inputs to ML. This result indicates that weather information alone is not sufficient and that ML models need additional hydrological inputs to make improved yield predictions.
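To make the hybrid setup concrete, here is a minimal sketch that augments a weather-feature matrix with simulated crop model outputs and compares RMSE with and without them. The data are synthetic and the column names (e.g., `apsim_drought_stress`, `apsim_water_table_depth`) are illustrative assumptions, not the study's actual variables.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500

# Hypothetical features: weather-only vs. weather + simulated APSIM outputs.
X = pd.DataFrame({
    "precip_mm": rng.gamma(4.0, 25.0, n),
    "gdd": rng.normal(1400, 120, n),
    "apsim_drought_stress": rng.uniform(0, 1, n),       # assumed APSIM output
    "apsim_water_table_depth": rng.normal(1.5, 0.4, n), # assumed APSIM output
})
# Synthetic yield (kg/ha), for illustration only.
y = 9000 + 2.0 * X["precip_mm"] - 2500 * X["apsim_drought_stress"] + rng.normal(0, 300, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for cols, label in [(["precip_mm", "gdd"], "weather only"),
                    (list(X.columns), "weather + APSIM")]:
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X_tr[cols], y_tr)
    rmse = mean_squared_error(y_te, model.predict(X_te[cols])) ** 0.5
    print(f"{label}: RMSE = {rmse:.0f} kg/ha")
```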
Manufacturers provide products with distinct green levels (i.e., degrees of environmental friendliness) to satisfy consumer demands with different green preferences. A product with a higher green level generates fewer emissions but has higher costs. To encourage manufacturers to produce environmentally friendly products, a government can implement subsidy policies. This paper focuses on the decision-making problem faced by manufacturers: determining which green levels of products to produce and the production quantities at each green level. We develop an optimization model under oligopolistic competition that considers green preferences and subsidies, with the objective of profit maximization for the manufacturers. We prove the existence and uniqueness of the equilibrium and propose a convergent algorithm based on the theory of finite-dimensional variational inequalities. Numerical results show that an increase in consumer environmental awareness incentivizes manufacturers to produce more green products with higher green levels, but this does not necessarily lead to higher profits for the manufacturers. Moreover, a well-designed subsidy policy can not only generate more profits for manufacturers but also save subsidy investment for the government. In addition, with changes in consumer environmental awareness and/or subsidy policy, manufacturers may obtain more profits even when competition is fiercer.
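The paper's algorithm is not reproduced here; as a hedged sketch, the standard projection method for a finite-dimensional variational inequality VI(F, K) iterates x_{k+1} = Proj_K(x_k - τ F(x_k)), which converges for strongly monotone, Lipschitz F with a small enough step τ. The mapping F and the nonnegative-orthant feasible set below are illustrative assumptions, not the model in the paper.

```python
import numpy as np

def project_nonneg(x):
    """Projection onto the nonnegative orthant (illustrative feasible set K)."""
    return np.maximum(x, 0.0)

def solve_vi(F, x0, tau=0.05, tol=1e-8, max_iter=10000):
    """Basic projection method for VI(F, K): find x in K with F(x)^T (y - x) >= 0 for all y in K."""
    x = x0.copy()
    for _ in range(max_iter):
        x_new = project_nonneg(x - tau * F(x))
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Toy marginal-profit mapping for two competing quantities (assumed, strongly monotone).
A = np.array([[2.0, 0.5], [0.5, 2.0]])  # positive definite, so F is strongly monotone
b = np.array([10.0, 8.0])
F = lambda x: A @ x - b

print(solve_vi(F, x0=np.zeros(2)))  # equilibrium quantities
```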
•We proposed a multi-stage stochastic lot-sizing and scheduling model.
•We identified the number of scenarios that balances solution quality and computation time.
•Large EVPI and VSS values indicate the importance of considering uncertainty.
•The multi-stage solution improved on the two-stage solution by 10%.
A stochastic lot-sizing and scheduling problem with demand uncertainty is studied in this paper. Lot-sizing determines the batch size for each product, and scheduling decides the sequence of production. A multi-stage stochastic programming model is developed to minimize overall system costs, including production, setup, inventory, and backlog costs. We aim to find the optimal production sequence and resource allocation decisions. Demand uncertainty is represented by scenario trees generated with a moment-matching technique, and scenario reduction is used to select the scenarios that best represent the original set. A case study based on a manufacturing company was conducted to illustrate and verify the model. We compared the two-stage stochastic programming model with the multi-stage model. The major motivation for adopting multi-stage stochastic programming is that it extends the two-stage formulation by allowing decisions to be revised at each period based on previous realizations of uncertainty as well as previous decisions. A stability test and a weak out-of-sample test are applied to find an appropriate scenario sample size. By using the multi-stage stochastic programming model, we improved the solution quality by 10–13%.
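To make the EVPI/VSS values mentioned in the highlights concrete, here is a minimal two-stage sketch (a single-product, two-scenario lot-sizing toy, not the paper's model) computing the wait-and-see (WS), here-and-now (RP), and expected-value (EEV) solutions, from which EVPI = RP − WS and VSS = EEV − RP for a minimization problem.

```python
import numpy as np

# Toy data (assumed): unit production cost, inventory holding, backlog penalty.
c_prod, c_inv, c_back = 1.0, 0.5, 4.0
demands = np.array([80.0, 120.0])   # two equally likely demand scenarios
probs = np.array([0.5, 0.5])

def cost(q, d):
    """Total cost of producing q when demand turns out to be d."""
    return c_prod * q + c_inv * max(q - d, 0.0) + c_back * max(d - q, 0.0)

grid = np.arange(0.0, 201.0)  # candidate production quantities

# RP: here-and-now solution minimizing expected cost.
exp_cost = [sum(p * cost(q, d) for p, d in zip(probs, demands)) for q in grid]
RP = min(exp_cost)

# WS: solve each scenario with perfect foresight, then take the expectation.
WS = sum(p * min(cost(q, d) for q in grid) for p, d in zip(probs, demands))

# EEV: fix the decision optimal for the mean demand, evaluate it under uncertainty.
d_mean = probs @ demands
q_ev = grid[int(np.argmin([cost(q, d_mean) for q in grid]))]
EEV = sum(p * cost(q_ev, d) for p, d in zip(probs, demands))

print(f"EVPI = RP - WS = {RP - WS:.1f}")   # value of perfect information
print(f"VSS  = EEV - RP = {EEV - RP:.1f}") # value of the stochastic solution
```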
•A lot-sizing and scheduling model is proposed to study the uncertainties.
•A Hybrid Stochastic and Robust Optimization approach is adopted.
•The Sample Average Approximation technique is applied to solve the stochastic program.
Uncertainty is among the significant concerns in production scheduling, and it has become increasingly important to take uncertainties into consideration in lot-sizing and scheduling. In this paper, we adopt a Hybrid Stochastic and Robust Optimization (HSRO) approach for lot-sizing and scheduling problems in which suppliers have the flexibility to satisfy a fraction of demand based on the market and their policies. Two types of uncertainty are considered simultaneously: demand and overtime processing cost. Robust optimization is adopted for uncertain demand, and the Sample Average Approximation (SAA) technique is applied to solve the stochastic program for uncertain overtime processing cost. Numerical experiments based on a manufacturing company were conducted not only to validate the proposed hybrid model but also to quantitatively demonstrate the merit of our approach. A sample-size stability test and sensitivity analyses on various parameters were also conducted.
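As a hedged illustration of the SAA step, the sketch below approximates an expected overtime cost by averaging over N sampled cost realizations and picks the planned regular-time capacity that minimizes the sample-average objective. The cost structure and distribution are illustrative assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed toy setup: choose regular capacity x; demand beyond x is met with
# overtime at an uncertain unit cost (lognormal here, purely illustrative).
demand = 100.0
c_regular = 1.0

def saa_objective(x, overtime_costs):
    """Sample-average cost: regular-time cost plus averaged overtime cost."""
    overtime = max(demand - x, 0.0)
    return c_regular * x + overtime * overtime_costs.mean()

N = 1000  # SAA sample size
overtime_costs = rng.lognormal(mean=0.5, sigma=0.4, size=N)

grid = np.linspace(0.0, 150.0, 301)
values = [saa_objective(x, overtime_costs) for x in grid]
x_star = grid[int(np.argmin(values))]
print(f"SAA-optimal regular capacity: {x_star:.1f}, cost: {min(values):.1f}")
```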
Traditionally, plant disease recognition has mainly been done visually by humans; it is often biased, time-consuming, and laborious. Machine learning methods based on plant leaf images have been proposed to improve the disease recognition process, and convolutional neural networks (CNNs) have been adopted and proven to be very effective. Despite the good classification accuracy achieved by CNNs, the issue of limited training data remains: the training dataset is often small because of the significant effort required for data collection and annotation, and in this case CNN methods tend to overfit. In this paper, a Wasserstein generative adversarial network with gradient penalty (WGAN-GP) is combined with label smoothing regularization (LSR) to improve the prediction accuracy and address the overfitting problem under limited training data. Experiments show that the proposed WGAN-GP enhanced classification method can improve the overall classification accuracy of plant diseases by 24.4%, compared to 20.2% using classic data augmentation and 22% using synthetic samples without LSR.
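Label smoothing regularization replaces the one-hot target with a mixture of the one-hot vector and a uniform distribution, q_i = (1 − ε)·1[i = y] + ε/K. A minimal sketch of the resulting loss (ε and the logits below are illustrative values, not taken from the paper):

```python
import numpy as np

def label_smoothing_cross_entropy(logits, y, eps=0.1):
    """Cross-entropy against smoothed targets q = (1 - eps) * one_hot + eps / K."""
    K = logits.shape[-1]
    # Numerically stable log-softmax.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    one_hot = np.eye(K)[y]
    q = (1.0 - eps) * one_hot + eps / K
    return -(q * log_probs).sum(axis=-1).mean()

logits = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]])  # illustrative batch of 2, K = 3
labels = np.array([0, 1])
print(label_smoothing_cross_entropy(logits, labels))
```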
The emergence of new technologies to synthesize and analyze big data with high-performance computing has increased our capacity to more accurately predict crop yields. Recent research has shown that machine learning (ML) can provide reasonable predictions faster and with higher flexibility compared to simulation crop modeling. However, a single machine learning model can be outperformed by a "committee" of models (a machine learning ensemble), which can reduce prediction bias, variance, or both, and is able to better capture the underlying distribution of the data. Yet, there are many aspects to be investigated with regard to prediction accuracy, time of the prediction, and scale. The earlier the prediction during the growing season, the better; however, this has not been thoroughly investigated, as previous studies considered all available data to predict yields. This paper provides a machine learning based framework to forecast corn yields in three US Corn Belt states (Illinois, Indiana, and Iowa) considering complete and partial in-season weather knowledge. Several ensemble models are designed using a blocked sequential procedure to generate out-of-bag predictions. The forecasts are made at the county-level scale and aggregated to agricultural district and state-level scales. Results show that the proposed optimized weighted ensemble and the average ensemble are the most precise models, with a relative root mean squared error (RRMSE) of 9.5%. Stacked LASSO makes the least biased predictions (MBE of 53 kg/ha), while the other ensemble models also outperform the base learners in terms of bias. On the other hand, although random k-fold cross-validation is replaced by the blocked sequential procedure, stacked ensembles do not perform as well as weighted ensemble models on time series data sets, as stacking requires the data to be IID to perform favorably. Comparing our proposed model forecasts with the literature demonstrates the acceptable performance of forecasts made by our proposed ensemble model. Results from the scenario of partial in-season weather knowledge reveal that decent yield forecasts with an RRMSE of 9.2% can be made as early as June 1st. Moreover, the proposed model performed better than individual models and benchmark ensembles at the agricultural district and state-level scales as well as the county-level scale. To find the marginal effect of each input feature on the forecasts made by the proposed ensemble model, a methodology is suggested that forms the basis for computing feature importance for the ensemble model. The findings suggest that weather features corresponding to weeks 18–24 (May 1st to June 1st) are the most important input features.
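A minimal sketch of the optimized weighted ensemble idea: given out-of-bag predictions from several base learners, find nonnegative weights summing to one that minimize RMSE. The base predictions below are synthetic stand-ins, and the SLSQP setup is one reasonable choice, not necessarily the paper's solver.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 200

# Synthetic out-of-bag predictions from three hypothetical base learners.
y_true = rng.normal(10000, 800, n)                 # yields, kg/ha
preds = np.column_stack([
    y_true + rng.normal(0, 500, n),    # learner 1: unbiased, noisy
    y_true + rng.normal(200, 300, n),  # learner 2: biased, less noisy
    y_true + rng.normal(-100, 400, n), # learner 3
])

def ensemble_rmse(w):
    """RMSE of the weighted-average ensemble on out-of-bag predictions."""
    return np.sqrt(np.mean((preds @ w - y_true) ** 2))

k = preds.shape[1]
res = minimize(
    ensemble_rmse,
    x0=np.full(k, 1.0 / k),                       # start from the average ensemble
    method="SLSQP",
    bounds=[(0.0, 1.0)] * k,
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
)
print("optimal weights:", np.round(res.x, 3), " RMSE:", round(res.fun, 1))
```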
Pre-growing-season prediction of crop production outcomes, such as grain yields and nitrogen (N) losses, can provide insights that help farmers and agronomists make decisions. Simulation crop models can assist in scenario planning, but their use is limited by data requirements and long runtimes. Thus, there is a need for more computationally expedient approaches to scale up predictions. We evaluated the potential of four machine learning (ML) algorithms (LASSO regression, ridge regression, random forests, and Extreme Gradient Boosting) and their ensembles as meta-models for a cropping systems simulator (APSIM) to inform future decision support tool development. We asked: (1) How well do ML meta-models predict maize yield and N losses using pre-season information? (2) How many data are needed to train ML algorithms to achieve acceptable predictions? (3) Which input data variables are most important for accurate prediction? And (4) do ensembles of ML meta-models improve prediction? The simulated dataset included more than three million data points covering genotype, environment, and management scenarios. XGBoost was the most accurate ML model in predicting yields, with a relative root mean squared error (RRMSE) of 13.5%, and random forests most accurately predicted N loss at planting time, with an RRMSE of 54%. ML meta-models reasonably reproduced simulated maize yields using the information available at planting, but not N loss. They also differed in their sensitivities to the size of the training dataset: across all ML models, yield prediction error decreased by 10%–40% as the training dataset increased from 0.5 to 1.8 million data points, whereas N loss prediction error showed no consistent pattern. ML models also differed in their sensitivities to input variables (weather, soil properties, management, initial conditions); thus, depending on data availability, researchers may prefer different ML models. Modest prediction improvements resulted from ML ensembles. These results can help accelerate progress in coupling simulation models and ML toward developing dynamic decision support tools for pre-season management.
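For reference, RRMSE as reported above is conventionally the RMSE normalized by the mean of the observations, expressed as a percentage; a short helper using that standard definition (assumed, though not stated explicitly, to match the papers' usage):

```python
import numpy as np

def rrmse(y_true, y_pred):
    """Relative RMSE: RMSE divided by the mean observed value, in percent."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rmse / np.mean(y_true)

# Illustrative values only.
print(rrmse([10.0, 12.0, 11.0], [9.5, 12.5, 10.0]))
```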
Aggregating multiple learners through an ensemble of models aims to make better predictions by capturing the underlying distribution of the data more accurately. Different ensembling methods, such as bagging, boosting, and stacking/blending, have been studied and adopted extensively in research and practice. While bagging and boosting focus more on reducing variance and bias, respectively, stacking approaches target both by finding the optimal way to combine base learners. In stacking with the weighted average, ensembles are created from weighted averages of multiple base learners. It is known that tuning the hyperparameters of each base learner inside the ensemble weight optimization process can produce better-performing ensembles. To this end, an optimization-based nested algorithm that tunes hyperparameters while finding the optimal weights to combine base learners, the Generalized Weighted Ensemble with Internally Tuned Hyperparameters (GEM-ITH), is designed. In addition, Bayesian search is used to speed up the optimization process, and a heuristic is implemented to generate diverse and well-performing base learners. The algorithm is shown to generalize to real data sets through analyses with ten publicly available data sets.
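A hedged sketch of the nested idea (not the GEM-ITH algorithm itself): the outer loop proposes hyperparameters for each base learner, the inner step refits the learners and re-optimizes the combination weights, and the configuration with the best ensemble score is kept. Random search stands in here for the Bayesian search used in the paper.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=400, n_features=10, noise=20.0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)

def fit_and_weight(alpha, n_est):
    """Inner step: fit base learners with given hyperparameters, then optimize weights."""
    models = [Ridge(alpha=alpha).fit(X_tr, y_tr),
              RandomForestRegressor(n_estimators=n_est, random_state=0).fit(X_tr, y_tr)]
    P = np.column_stack([m.predict(X_val) for m in models])
    obj = lambda w: np.sqrt(np.mean((P @ w - y_val) ** 2))
    res = minimize(obj, x0=[0.5, 0.5], method="SLSQP", bounds=[(0, 1)] * 2,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
    return res.fun, res.x

# Outer loop: random search over hyperparameters (a stand-in for Bayesian search).
best = min((fit_and_weight(alpha=10 ** rng.uniform(-2, 2), n_est=int(rng.integers(50, 300)))
            for _ in range(10)), key=lambda t: t[0])
print(f"best ensemble RMSE: {best[0]:.2f}, weights: {np.round(best[1], 3)}")
```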
•We modeled and evaluated a fast pyrolysis integrated bio-oil gasification pathway.
•Larger facility capacity is preferred based on Monte-Carlo simulations.
•Fuel yield and biomass feedstock cost are the most important factors.
This paper evaluates the economic feasibility of an integrated production pathway combining fast pyrolysis and bio-oil gasification. The conversion process is simulated with Aspen Plus® for a 2000 metric ton per day facility, and a techno-economic analysis of the integrated pathway is conducted. The total capital investment is estimated at $510 million, and the minimum fuel selling price (MSP) is $5.59 per gallon of gasoline equivalent. The sensitivity analysis shows that the MSP is most sensitive to the internal rate of return, fuel yield, biomass feedstock cost, and fixed capital investment. Monte-Carlo simulation shows that the MSP for bio-oil gasification would exceed $6/gal with a probability of 0.24, which indicates that this pathway is still at high risk under current economic and technical conditions.
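A hedged Monte-Carlo sketch of the risk estimate above: sample the uncertain inputs, push them through an MSP relationship, and estimate P(MSP > $6/gal). The linearized MSP response and the triangular distributions below are illustrative assumptions, not the study's Aspen Plus® model.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 100_000

# Assumed input distributions around the study's base case (illustrative only).
fuel_yield = rng.triangular(0.8, 1.0, 1.1, N)       # relative to base yield
feedstock_cost = rng.triangular(0.8, 1.0, 1.3, N)   # relative to base cost
capital = rng.triangular(0.9, 1.0, 1.2, N)          # relative to base FCI

# Assumed linearized MSP response around the $5.59/gal base case:
# 60% of MSP scales inversely with yield, 25% with feedstock cost, 15% with capital.
msp = 5.59 * (0.6 / fuel_yield + 0.25 * feedstock_cost + 0.15 * capital)

print(f"P(MSP > $6/gal) ~= {(msp > 6.0).mean():.2f}")
```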
Renewable fuel is playing an increasingly important role as a substitute for fossil-based energy. The US Department of Energy (DOE) has identified pyrolysis-based platforms as promising biofuel production pathways. In this paper, we present a general biofuel supply chain model using a Mixed Integer Linear Programming (MILP) methodology to investigate facility location and facility capacity decisions at the strategic level and biofuel production decisions at the operational level. The model accommodates different biomass supplies and biofuel demands through a biofuel supply shortage penalty and storage costs. The model is then applied to the corn stover fast pyrolysis pathway with upgrading to hydrocarbon fuel, since corn stover is the main feedstock for second-generation biofuel production in the US Midwestern states. Numerical results illustrate the unit cost of biofuel production and the allocation of biomass and biofuel. The case study demonstrates the economic feasibility of producing biofuel from biomass at commercial scale in Iowa.
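As a hedged sketch of the MILP structure (a tiny facility-location toy in PuLP, not the paper's full model): binary variables open facilities, continuous variables ship product, and unmet demand incurs a shortage penalty.

```python
import pulp

# Illustrative data: two candidate facility sites, two demand regions.
sites, regions = ["S1", "S2"], ["R1", "R2"]
fixed_cost = {"S1": 500.0, "S2": 650.0}          # annualized opening cost
capacity = {"S1": 120.0, "S2": 180.0}
ship_cost = {("S1", "R1"): 2.0, ("S1", "R2"): 4.0,
             ("S2", "R1"): 3.5, ("S2", "R2"): 1.5}
demand = {"R1": 100.0, "R2": 90.0}
shortage_penalty = 20.0                           # per unit of unmet demand

m = pulp.LpProblem("biofuel_toy", pulp.LpMinimize)
open_ = pulp.LpVariable.dicts("open", sites, cat="Binary")
flow = pulp.LpVariable.dicts("flow", list(ship_cost), lowBound=0)
short = pulp.LpVariable.dicts("short", regions, lowBound=0)

# Objective: opening costs + shipping costs + shortage penalties.
m += (pulp.lpSum(fixed_cost[s] * open_[s] for s in sites)
      + pulp.lpSum(ship_cost[k] * flow[k] for k in ship_cost)
      + pulp.lpSum(shortage_penalty * short[r] for r in regions))

for s in sites:   # flows only from opened facilities, within capacity
    m += pulp.lpSum(flow[(s, r)] for r in regions) <= capacity[s] * open_[s]
for r in regions: # meet demand or pay the shortage penalty
    m += pulp.lpSum(flow[(s, r)] for s in sites) + short[r] >= demand[r]

m.solve(pulp.PULP_CBC_CMD(msg=0))
print({s: open_[s].value() for s in sites}, pulp.value(m.objective))
```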
•Supply chain design and operational planning are studied for biofuel production.
•Mixed Integer Linear Programming methodology is utilized.
•Facility location, capacity, and biofuel production decisions are analyzed.
•Case study in Iowa demonstrates the applicability of the models.