Improving predictions by nonlinear regression models from outlying input data

Paper title

Improving predictions by nonlinear regression models from outlying input data

Author

Hsieh, William W.

Abstract

When applying machine learning/statistical methods to the environmental sciences, nonlinear regression (NLR) models often perform only slightly better and occasionally worse than linear regression (LR). The proposed reason for this conundrum is that NLR models can give predictions much worse than LR when given input data which lie outside the domain used in model training. Continuous unbounded variables are widely used in environmental sciences, whence it is not uncommon for new input data to lie far outside the training domain. For six environmental datasets, inputs in the test data were classified as "outliers" and "non-outliers" based on the Mahalanobis distance from the training input data. The prediction scores (mean absolute error, Spearman correlation) showed NLR to outperform LR for the non-outliers, but often underperform LR for the outliers. An approach based on Occam's Razor (OR) was proposed, where linear extrapolation was used instead of nonlinear extrapolation for the outliers. The linear extrapolation to the outlier domain was based on the NLR model within the non-outlier domain. This NLR$_{\mathrm{OR}}$ approach reduced occurrences of very poor extrapolation by NLR, and it tended to outperform NLR and LR for the outliers. In conclusion, input test data should be screened for outliers. For outliers, the unreliable NLR predictions can be replaced by NLR$_{\mathrm{OR}}$ or LR predictions, or by issuing a "no reliable prediction" warning.
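The screening step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The abstract does not state the outlier threshold, so the quantile rule below (a test input is an outlier if its Mahalanobis distance exceeds the 99th percentile of the training distances) is an assumption, as are the function names `mahalanobis_distance`, `flag_outliers`, and `predict_with_fallback`. The sketch covers only the simpler LR fallback mentioned in the abstract, not the NLR$_{\mathrm{OR}}$ linear extrapolation, whose details the abstract does not give.

```python
import numpy as np


def mahalanobis_distance(X_train, X):
    """Mahalanobis distance of each row of X from the training input data."""
    mean = X_train.mean(axis=0)
    cov = np.atleast_2d(np.cov(X_train, rowvar=False))
    # The pseudo-inverse guards against a singular covariance matrix.
    cov_inv = np.linalg.pinv(cov)
    diff = X - mean
    sq = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    # Clip tiny negative values caused by rounding before the square root.
    return np.sqrt(np.maximum(sq, 0.0))


def flag_outliers(X_train, X_test, quantile=0.99):
    """Flag test inputs lying far outside the training domain.

    ASSUMPTION: the threshold is a quantile of the training distances;
    the abstract does not specify the rule actually used.
    """
    d_train = mahalanobis_distance(X_train, X_train)
    threshold = np.quantile(d_train, quantile)
    return mahalanobis_distance(X_train, X_test) > threshold


def predict_with_fallback(nlr_predict, lr_predict, X_train, X_test, quantile=0.99):
    """Use NLR predictions for non-outliers and LR predictions for outliers.

    nlr_predict and lr_predict are hypothetical callables that map an
    (n, p) input array to n predictions, e.g. the predict methods of
    two already fitted models.
    """
    is_outlier = flag_outliers(X_train, X_test, quantile)
    y = np.asarray(nlr_predict(X_test), dtype=float).copy()
    if is_outlier.any():
        y[is_outlier] = lr_predict(X_test[is_outlier])
    return y, is_outlier
```

Returning the outlier flags alongside the predictions lets the caller issue a "no reliable prediction" warning instead of, or in addition to, substituting the LR value.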
