Paper Title

Stable Prediction with Model Misspecification and Agnostic Distribution Shift

Paper Authors

Kun Kuang, Ruoxuan Xiong, Peng Cui, Susan Athey, Bo Li

Paper Abstract

For many machine learning algorithms, two main assumptions are required to guarantee performance. One is that the test data are drawn from the same distribution as the training data, and the other is that the model is correctly specified. In real applications, however, we often have little prior knowledge about the test data and the underlying true model. Under model misspecification, agnostic distribution shift between training and test data leads to inaccuracy of parameter estimation and instability of prediction across unknown test data. To address these problems, we propose a novel Decorrelated Weighting Regression (DWR) algorithm which jointly optimizes a variable decorrelation regularizer and a weighted regression model. The variable decorrelation regularizer estimates a weight for each sample such that variables are decorrelated on the weighted training data. Then, these weights are used in the weighted regression to improve the accuracy of the estimated effect of each variable, thus helping to improve the stability of prediction across unknown test data. Extensive experiments clearly demonstrate that our DWR algorithm can significantly improve the accuracy of parameter estimation and stability of prediction with model misspecification and agnostic distribution shift.
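
The abstract only describes the method at a high level. The sketch below illustrates the core idea in Python: learn per-sample weights under which the covariates are (approximately) decorrelated, then fit a weighted least-squares regression with those weights. This is not the authors' implementation; the exact form of the decorrelation loss, the log-weight parameterization, the small uniformity penalty, the two-step (rather than joint) optimization, and the toy data are all simplifying assumptions made here for readability.

```python
import numpy as np
from scipy.optimize import minimize

def decorrelation_loss(log_w, X):
    """Sum of squared weighted covariances between every pair of distinct
    variables, plus a small (assumed) penalty keeping the weights near uniform."""
    w = np.exp(log_w)                       # positive sample weights
    wn = w / w.sum()                        # normalize to a probability measure
    mean = X.T @ wn                         # weighted mean of each variable
    Xc = X - mean                           # center under the weighted measure
    cov = (Xc * wn[:, None]).T @ Xc         # weighted covariance matrix (p x p)
    off_diag = cov - np.diag(np.diag(cov))  # keep only cross-variable terms
    return np.sum(off_diag ** 2) + 1e-3 * np.mean((w - 1.0) ** 2)

def dwr_sketch(X, y, maxiter=100):
    """Two-step approximation: (1) learn sample weights that decorrelate the
    variables on the weighted data; (2) run weighted least squares with them."""
    n, _ = X.shape
    res = minimize(decorrelation_loss, x0=np.zeros(n), args=(X,),
                   method="L-BFGS-B", options={"maxiter": maxiter})
    w = np.exp(res.x)
    Xb = np.column_stack([np.ones(n), X])   # add an intercept column
    # weighted least squares: beta = (X' W X)^{-1} X' W y
    XtW = Xb.T * w
    beta = np.linalg.solve(XtW @ Xb, XtW @ y)
    return beta, w

# Toy usage: correlated covariates and a misspecified (missing-interaction) linear model.
rng = np.random.default_rng(0)
n = 300
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)    # x2 is correlated with x1
y = 1.0 * x1 + 2.0 * x2 + 0.5 * x1 * x2 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2])
beta, w = dwr_sketch(X, y)
print("estimated coefficients (intercept, x1, x2):", beta)
```

On this toy data, the weighted covariance between x1 and x2 shrinks after reweighting, which is the mechanism the paper relies on: once the variables are decorrelated on the weighted training data, the estimated effect of each variable is less distorted by the omitted interaction term, and prediction is more stable when the covariate distribution shifts at test time.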
