Paper Title
Improving Robustness with Adaptive Weight Decay
Paper Authors
Paper Abstract
We propose adaptive weight decay, which automatically tunes the hyper-parameter for weight decay during each training iteration. For classification problems, we propose changing the value of the weight decay hyper-parameter on the fly based on the strength of updates from the classification loss (i.e., the gradient of the cross-entropy) and the regularization loss (i.e., the $\ell_2$-norm of the weights). We show that this simple modification can result in large improvements in adversarial robustness -- an area which suffers from robust overfitting -- without requiring extra data, across various datasets and architecture choices. For example, our reformulation yields a $20\%$ relative robustness improvement on CIFAR-100 and a $10\%$ relative robustness improvement on CIFAR-10 compared to the best-tuned hyper-parameters of traditional weight decay, resulting in models with performance comparable to SOTA robustness methods. In addition, the method has other desirable properties, such as lower sensitivity to the learning rate and smaller weight norms; the latter contributes to robustness against overfitting to label noise and to pruning.
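To make the mechanism described in the abstract concrete, here is a minimal PyTorch sketch of one training step with an adaptively chosen weight-decay coefficient. The specific update rule, the helper name `adaptive_weight_decay_step`, and the epsilon constant are illustrative assumptions rather than the paper's exact algorithm: the coefficient is simply set each iteration from the ratio of the two quantities the abstract names, the cross-entropy gradient norm and the $\ell_2$-norm of the weights.

```python
# Minimal sketch (assumed formulation, not the paper's exact algorithm):
# the weight-decay coefficient is recomputed every iteration from the ratio of
# the cross-entropy gradient norm to the l2-norm of the weights.
import torch


def adaptive_weight_decay_step(model, loss_fn, x, y, optimizer, eps=1e-12):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()  # gradients of the classification (cross-entropy) loss

    params = [p for p in model.parameters() if p.grad is not None]

    # Strength of the classification update: norm of the cross-entropy gradients.
    grad_norm = torch.sqrt(sum((p.grad.detach() ** 2).sum() for p in params))
    # Strength of the regularization term: l2-norm of the weights.
    weight_norm = torch.sqrt(sum((p.detach() ** 2).sum() for p in params))

    # Assumed adaptive rule: decay coefficient proportional to the ratio of the
    # two strengths (eps guards against division by zero).
    lambda_wd = grad_norm / (weight_norm + eps)

    # Apply the decay as an extra gradient term before the optimizer step.
    with torch.no_grad():
        for p in params:
            p.grad.add_(lambda_wd * p)
    optimizer.step()
    return loss.item(), lambda_wd.item()
```

In this sketch the decay term automatically grows when the classification gradients are large relative to the weight norm and shrinks otherwise, which is one plausible reading of "changing the value of the weight decay hyper-parameter on the fly"; the paper itself should be consulted for the precise rule.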