Paper Title
AI-KD: Adversarial learning and Implicit regularization for self-Knowledge Distillation
Paper Authors
Paper Abstract
We present a novel adversarially penalized self-knowledge distillation method, named adversarial learning and implicit regularization for self-knowledge distillation (AI-KD), which regularizes the training procedure through adversarial learning and implicit distillation. Our model not only distills deterministic and progressive knowledge from the predictive probabilities of the pre-trained model and of the previous epoch, but also transfers knowledge of the deterministic predictive distributions using adversarial learning. The motivation is that self-knowledge distillation methods regularize the predictive probabilities with soft targets, but the exact distributions may be hard to predict. Our method deploys a discriminator to distinguish between the distributions of the pre-trained and student models, while the student model is trained to fool the discriminator during training. Thus, the student model can not only learn the pre-trained model's predictive probabilities but also align its distribution with that of the pre-trained model. We demonstrate the effectiveness of the proposed method across network architectures on multiple datasets and show that it achieves better performance than state-of-the-art methods.
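Below is a minimal, illustrative PyTorch sketch of the adversarial self-knowledge distillation idea described in the abstract: a frozen pre-trained model and the current student produce predictive distributions, a discriminator is trained to tell them apart, and the student is trained to fool the discriminator while also being distilled toward the pre-trained model's and the previous epoch's predictions. The tiny models, the discriminator architecture, the loss weights (lam_kd, lam_adv), and the handling of previous-epoch probabilities are assumptions made for illustration, not the authors' exact formulation.

```python
# Illustrative sketch only: models, loss weights, and the previous-epoch term
# are placeholder assumptions, not the exact AI-KD objective from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 10

# Tiny stand-in networks; in practice both would share a real architecture (e.g. ResNet).
student = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, num_classes))
pretrained = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, num_classes))
pretrained.eval()  # the pre-trained model is kept frozen

# Discriminator tries to tell pre-trained predictive distributions from student ones.
discriminator = nn.Sequential(nn.Linear(num_classes, 64), nn.ReLU(), nn.Linear(64, 1))

opt_s = torch.optim.SGD(student.parameters(), lr=0.1)
opt_d = torch.optim.SGD(discriminator.parameters(), lr=0.1)
bce = nn.BCEWithLogitsLoss()

def train_step(x, y, prev_epoch_probs, lam_kd=1.0, lam_adv=0.1):
    s_logits = student(x)
    s_probs = F.softmax(s_logits, dim=1)
    with torch.no_grad():
        t_probs = F.softmax(pretrained(x), dim=1)

    # Discriminator update: pre-trained distributions as "real", student's as "fake".
    d_real = discriminator(t_probs)
    d_fake = discriminator(s_probs.detach())
    loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Student update: supervised loss + distillation terms + adversarial term.
    loss_ce = F.cross_entropy(s_logits, y)
    # Deterministic knowledge from the pre-trained model's predictions.
    loss_kd = F.kl_div(F.log_softmax(s_logits, dim=1), t_probs, reduction="batchmean")
    # Progressive knowledge from the previous epoch's predictions (assumed cached).
    loss_prev = F.kl_div(F.log_softmax(s_logits, dim=1), prev_epoch_probs, reduction="batchmean")
    # Adversarial term: the student tries to make its distribution look "real".
    adv_logits = discriminator(s_probs)
    loss_adv = bce(adv_logits, torch.ones_like(adv_logits))
    loss_s = loss_ce + lam_kd * (loss_kd + loss_prev) + lam_adv * loss_adv
    opt_s.zero_grad()
    loss_s.backward()
    opt_s.step()
    return loss_s.item(), loss_d.item()

# Example usage with random data standing in for a real batch.
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, num_classes, (8,))
prev_probs = torch.full((8, num_classes), 1.0 / num_classes)  # e.g. uniform before epoch 1
print(train_step(x, y, prev_probs))
```

In this sketch the discriminator operates directly on softmax outputs, so fooling it pushes the student's predictive distribution toward the pre-trained model's, complementing the KL-based distillation terms; the actual paper should be consulted for the precise losses and training schedule.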