Forecasting Corn Yield with Machine Learning Ensembles

Paper Title

Forecasting Corn Yield with Machine Learning Ensembles

Authors

Shahhosseini, Mohsen; Hu, Guiping; Archontoulis, Sotirios V.

Abstract

The emergence of new technologies to synthesize and analyze big data with high-performance computing has increased our capacity to predict crop yields more accurately. Recent research has shown that machine learning (ML) can provide reasonable predictions faster and with greater flexibility than simulation crop modeling. The earlier a prediction can be made during the growing season, the better, but this has not been thoroughly investigated, because previous studies used all available data to predict yields. This paper provides a machine-learning-based framework to forecast corn yields in three US Corn Belt states (Illinois, Indiana, and Iowa), considering both complete and partial in-season weather knowledge. Several ensemble models are designed using a blocked sequential procedure to generate out-of-bag predictions. The forecasts are made at the county level and aggregated to the agricultural-district and state levels. Results show that ensemble models based on a weighted average of the base learners outperform the individual models. Specifically, the proposed ensemble model achieved the best prediction accuracy (relative root mean square error, RRMSE, of 7.8%) and the smallest mean bias error (-6.06 bu/acre) among the developed models. Comparing the proposed model's forecasts with those reported in the literature demonstrates their superiority. Results from the scenario with partial in-season weather knowledge reveal that decent yield forecasts can be made as early as June 1st. To quantify the marginal effect of each input feature on the forecasts made by the proposed ensemble model, a methodology is proposed that serves as the basis for computing feature importance for the ensemble model. The findings suggest that weather features corresponding to weeks 18-24 (May 1st to June 1st) are the most important input features.
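
The abstract only names the ingredients of the method, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the "blocked sequential procedure" means predicting each season with models trained only on earlier seasons, that the weighted-average ensemble uses nonnegative weights summing to one fitted to minimize squared error on those out-of-bag predictions, and that mean bias error is prediction minus observation. The function names (rrmse, mean_bias_error, blocked_sequential_oob, optimize_weights), the projected-gradient solver, the two toy base learners, and the synthetic data are all hypothetical.

```python
import numpy as np


def rrmse(y_true, y_pred):
    """Relative RMSE in percent: RMSE divided by the mean observed yield."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
    return 100.0 * rmse / np.mean(y_true)


def mean_bias_error(y_true, y_pred):
    """Mean of (prediction - observation); negative = under-prediction (assumed sign)."""
    return float(np.mean(np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)))


def blocked_sequential_oob(X, y, years, fit_predict, first_test_year):
    """Hypothetical blocked sequential scheme: each season is predicted by a model
    trained only on strictly earlier seasons. Seasons before first_test_year stay NaN."""
    X, y, years = np.asarray(X, dtype=float), np.asarray(y, dtype=float), np.asarray(years)
    oob = np.full(len(y), np.nan)
    for t in np.unique(years[years >= first_test_year]):
        train, test = years < t, years == t
        oob[test] = fit_predict(X[train], y[train], X[test])
    return oob


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def optimize_weights(P, y, steps=5000):
    """Convex weights for base-learner predictions (columns of P) that minimize
    mean squared error, found by projected gradient descent (assumed solver)."""
    P, y = np.asarray(P, dtype=float), np.asarray(y, dtype=float)
    n, k = P.shape
    w = np.full(k, 1.0 / k)  # start from the simple average
    # Step size 1/L, where L is the Lipschitz constant of the MSE gradient
    L = max(2.0 * np.linalg.norm(P, 2) ** 2 / n, 1e-12)
    for _ in range(steps):
        grad = 2.0 * P.T @ (P @ w - y) / n
        w = project_to_simplex(w - grad / L)
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    years = np.repeat(np.arange(2000, 2010), 20)  # synthetic: 10 seasons x 20 counties
    X = rng.normal(size=(len(years), 3))
    y = 150 + X @ np.array([10.0, -5.0, 3.0]) + rng.normal(scale=8, size=len(years))

    def ols(Xtr, ytr, Xte):  # toy base learner 1: linear least squares
        A = np.column_stack([np.ones(len(Xtr)), Xtr])
        coef, *_ = np.linalg.lstsq(A, ytr, rcond=None)
        return np.column_stack([np.ones(len(Xte)), Xte]) @ coef

    def mean_model(Xtr, ytr, Xte):  # toy base learner 2: historical mean yield
        return np.full(len(Xte), ytr.mean())

    oob = np.column_stack([blocked_sequential_oob(X, y, years, f, 2005)
                           for f in (ols, mean_model)])
    mask = ~np.isnan(oob).any(axis=1)
    w = optimize_weights(oob[mask], y[mask])
    ens = oob[mask] @ w
    # For brevity, weights are fitted and scored on the same out-of-bag rows here.
    print("weights:", w.round(3), "RRMSE %:", round(rrmse(y[mask], ens), 2),
          "MBE:", round(mean_bias_error(y[mask], ens), 2))
```

Restricting the weights to the simplex keeps the ensemble an interpretable weighted average of the base learners; fitting them only on out-of-bag predictions avoids rewarding learners that merely overfit the training seasons. In a real evaluation the weights would be scored on seasons not used to fit them.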
