Paper Title

Role-Wise Data Augmentation for Knowledge Distillation

Authors

Jie Fu, Xue Geng, Zhijian Duan, Bohan Zhuang, Xingdi Yuan, Adam Trischler, Jie Lin, Chris Pal, Hao Dong

Abstract

Knowledge Distillation (KD) is a common method for transferring the ``knowledge'' learned by one machine learning model (the \textit{teacher}) into another model (the \textit{student}), where typically, the teacher has a greater capacity (e.g., more parameters or higher bit-widths). To our knowledge, existing methods overlook the fact that although the student absorbs extra knowledge from the teacher, both models share the same input data -- and this data is the only medium by which the teacher's knowledge can be demonstrated. Due to the difference in model capacities, the student may not benefit fully from the same data points on which the teacher is trained. On the other hand, a human teacher may demonstrate a piece of knowledge with individualized examples adapted to a particular student, for instance, in terms of her cultural background and interests. Inspired by this behavior, we design data augmentation agents with distinct roles to facilitate knowledge distillation. Our data augmentation agents generate distinct training data for the teacher and student, respectively. We find empirically that specially tailored data points enable the teacher's knowledge to be demonstrated more effectively to the student. We compare our approach with existing KD methods on training popular neural architectures and demonstrate that role-wise data augmentation improves the effectiveness of KD over strong prior approaches. The code for reproducing our results can be found at https://github.com/bigaidream-projects/role-kd
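The core idea can be sketched in a few lines: the teacher and student each receive their own augmented view of the same inputs, and the student is trained to match the teacher's softened output distribution. The sketch below is illustrative only, with toy linear "models", hypothetical noise-based augmenters, and the standard temperature-scaled KL distillation loss; it is not the authors' implementation (see their repository for that).

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, T=1.0):
    """Temperature-scaled softmax; T > 1 softens the distribution."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Soft-label distillation loss: mean KL(teacher || student) at temperature T."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return float(np.sum(p_t * (np.log(p_t) - np.log(p_s))) / len(p_t))

# Hypothetical role-wise augmenters: each role gets a differently tailored
# view of the same underlying data point (here, just different noise scales).
def augment_for_teacher(x):
    return x + 0.01 * rng.standard_normal(x.shape)

def augment_for_student(x):
    return x + 0.05 * rng.standard_normal(x.shape)

# Toy stand-ins for the teacher and student networks (fixed linear maps).
W_teacher = rng.standard_normal((8, 3))
W_student = rng.standard_normal((8, 3))

x = rng.standard_normal((4, 8))                 # a mini-batch of inputs
t_logits = augment_for_teacher(x) @ W_teacher   # teacher sees its own view
s_logits = augment_for_student(x) @ W_student   # student sees a tailored view
loss = kd_loss(s_logits, t_logits)
```

In the paper's setting the two augmenters are learned agents rather than fixed noise, and `loss` would be combined with the student's usual supervised objective before backpropagation.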
