Paper Title

Homeostasis phenomenon in predictive inference when using a wrong learning model: a tale of random split of data into training and test sets

Authors

Min-ge Xie and Zheshi Zheng

Abstract

This note uses a conformal prediction procedure to provide further support for several points discussed by Professor Efron (Efron, 2020) concerning prediction, estimation, and the IID assumption. It aims to convey the following messages: (1) Under the IID assumption (e.g., a random split of the data into training and test sets), prediction is indeed an easier task than estimation, since prediction in this case has a 'homeostasis property': even if the model used for learning is completely wrong, the prediction results remain valid. (2) If the IID assumption is violated (e.g., in a targeted prediction for specific individuals), the homeostasis property is often disrupted, and prediction results under a wrong model are usually invalid. (3) Better model estimation typically leads to more accurate prediction in both the IID and non-IID cases. Good modeling and estimation practices are important and often crucial for obtaining good prediction results. The discussion also offers one explanation of why deep learning methods work so well in academic exercises (with experiments set up by randomly splitting the entire data set into training and test sets) but have yet to deliver many 'killer applications' in the real world.
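The 'homeostasis property' in message (1) can be illustrated with a small split-conformal simulation. The sketch below is illustrative only, not the authors' actual experiment: the data-generating model, the deliberately misspecified linear fit, and all parameter choices are assumptions. A linear model is fit to data generated from a nonlinear truth, yet the conformal prediction intervals still achieve roughly nominal coverage on a random held-out test set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate IID data from a nonlinear truth: y = sin(3x) + noise (an assumption).
n = 2000
x = rng.uniform(-2, 2, size=n)
y = np.sin(3 * x) + 0.3 * rng.normal(size=n)

# Random split into training, calibration, and test sets (the IID setting).
idx = rng.permutation(n)
train, calib = idx[: n // 2], idx[n // 2 : 3 * n // 4]
test = idx[3 * n // 4 :]

# Deliberately wrong learning model: a straight-line fit to a sinusoid.
coef = np.polyfit(x[train], y[train], deg=1)
predict = lambda z: np.polyval(coef, z)

# Split conformal: absolute-residual nonconformity scores on the calibration set,
# then the ceil((m + 1)(1 - alpha))-th smallest score as the interval half-width.
alpha = 0.1
scores = np.abs(y[calib] - predict(x[calib]))
k = int(np.ceil((len(scores) + 1) * (1 - alpha)))
q = np.sort(scores)[k - 1]

# Empirical coverage of the intervals [predict(x) - q, predict(x) + q] on test points.
covered = np.abs(y[test] - predict(x[test])) <= q
print(f"empirical coverage: {covered.mean():.3f}")  # close to 1 - alpha = 0.90
```

The intervals are wide, reflecting the poor model fit, which echoes message (3): validity is preserved under the wrong model, but accuracy still depends on good estimation.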
