Paper Title


Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift

Authors

Zachary Nado, Shreyas Padhy, D. Sculley, Alexander D'Amour, Balaji Lakshminarayanan, Jasper Snoek

Abstract


Covariate shift has been shown to sharply degrade both predictive accuracy and the calibration of uncertainty estimates for deep learning models. This is worrying, because covariate shift is prevalent in a wide range of real-world deployment settings. However, in this paper, we note that frequently there exists the potential to access small unlabeled batches of the shifted data just before prediction time. This interesting observation enables a simple but surprisingly effective method which we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift. Using this one-line code change, we achieve state-of-the-art on recent covariate shift benchmarks and an mCE of 60.28% on the challenging ImageNet-C dataset; to our knowledge, this is the best result for any model that does not incorporate additional data augmentation or modification of the training pipeline. We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness (e.g. deep ensembles), and combining the two further improves performance. Our findings are supported by detailed measurements of the effect of this strategy on model behavior across rigorous ablations on various dataset modalities. However, the method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift, and is therefore worthy of additional study. We include links to the data in our figures to improve reproducibility, including a Python notebook that can be run to easily modify our analysis at https://colab.research.google.com/drive/11N0wDZnMQQuLrRwRoumDCrhSaIhkqjof.
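The core idea in the abstract is that a batch-norm layer can normalize with either the running statistics accumulated during training or the statistics of the current (possibly shifted) test batch. The following is a minimal NumPy sketch of that distinction; the function names, toy data, and shift parameters are illustrative assumptions of mine, not the paper's implementation (in a real framework this is roughly the "one-line change" of keeping BatchNorm layers in batch-statistics mode at inference).

```python
import numpy as np

def batch_norm(x, gamma, beta, mean, var, eps=1e-5):
    """Standard batch-norm transform for a (batch, features) array,
    normalizing with the supplied mean/variance."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def prediction_time_bn(x, gamma, beta, eps=1e-5):
    """Prediction-time BN (illustrative): normalize the test batch with
    its OWN statistics instead of the running averages from training."""
    return batch_norm(x, gamma, beta, x.mean(axis=0), x.var(axis=0), eps)

# Toy demonstration: training data was roughly N(0, 1), but the test
# batch arrives covariate-shifted to N(3, 2^2).
rng = np.random.default_rng(0)
train_mean, train_var = 0.0, 1.0              # running stats from training
shifted = rng.normal(3.0, 2.0, size=(256, 4)) # small unlabeled test batch
gamma, beta = np.ones(4), np.zeros(4)

stale = batch_norm(shifted, gamma, beta, train_mean, train_var)
fresh = prediction_time_bn(shifted, gamma, beta)
# `stale` activations sit far off-distribution (mean near 3), while
# `fresh` ones are re-centered near 0 with unit variance, matching what
# the downstream layers saw during training.
```

The sketch makes the mechanism concrete: nothing about the model is retrained; only the normalization statistics are swapped at prediction time, which is why the method needs a small batch of shifted data before prediction.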
