Paper Title

Prime-Aware Adaptive Distillation

Paper Authors

Youcai Zhang, Zhonghao Lan, Yuchen Dai, Fangao Zeng, Yan Bai, Jie Chang, Yichen Wei

Paper Abstract

Knowledge distillation (KD) aims to improve the performance of a student network by mimicking the knowledge of a powerful teacher network. Existing methods focus on studying what knowledge should be transferred and treat all samples equally during training. This paper introduces adaptive sample weighting to KD. We discover that previously effective hard mining methods are not appropriate for distillation. Furthermore, we propose Prime-Aware Adaptive Distillation (PAD) by incorporating uncertainty learning. PAD perceives the prime samples in distillation and then adaptively emphasizes their effect. With its innovative view of unequal training, PAD is fundamentally different from, and can refine, existing methods. For this reason, PAD is versatile and has been applied to various tasks including classification, metric learning, and object detection. With ten teacher-student combinations on six datasets, PAD improves the performance of existing distillation methods and outperforms recent state-of-the-art methods.
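
The adaptive sample weighting described in the abstract can be illustrated as an uncertainty-weighted feature-mimicking loss. The snippet below is a minimal sketch of that general idea, not the paper's implementation: the class name UncertaintyWeightedDistillLoss and the per-sample log-variance log_var (assumed to come from a hypothetical auxiliary head on the student) are introduced here for illustration. Samples with low predicted variance, i.e. prime samples, receive a larger weight, while the log-variance term discourages the trivial solution of inflating the variance everywhere.

    import torch
    import torch.nn as nn

    class UncertaintyWeightedDistillLoss(nn.Module):
        # Hedged sketch of an uncertainty-weighted distillation loss.
        # student_feat, teacher_feat: (N, D) feature embeddings.
        # log_var: (N,) per-sample log-variance from a hypothetical student head.
        def forward(self, student_feat, teacher_feat, log_var):
            # Per-sample squared feature-mimicking error, summed over feature dims.
            sq_err = (student_feat - teacher_feat).pow(2).sum(dim=1)
            # Low-variance (confident) samples are up-weighted; the log-variance
            # regularizer keeps the variance from growing without bound.
            loss = 0.5 * torch.exp(-log_var) * sq_err + 0.5 * log_var
            return loss.mean()

    if __name__ == "__main__":
        # Toy usage with random tensors; shapes are illustrative only.
        n, d = 8, 128
        student = torch.randn(n, d, requires_grad=True)
        teacher = torch.randn(n, d)
        log_var = torch.zeros(n, requires_grad=True)
        loss = UncertaintyWeightedDistillLoss()(student, teacher, log_var)
        loss.backward()
        print(float(loss))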
