Paper Title

A Selective Survey on Versatile Knowledge Distillation Paradigm for Neural Network Models

Paper Authors

Jeong-Hoe Ku, JiHun Oh, YoungYoon Lee, Gaurav Pooniwala, SangJeong Lee

Paper Abstract

This paper aims to provide a selective survey of the knowledge distillation (KD) framework for researchers and practitioners to take advantage of it in developing new optimized models in the deep neural network field. To this end, we give a brief overview of knowledge distillation and some related works, including learning using privileged information (LUPI) and generalized distillation (GD). Even though knowledge distillation based on the teacher-student architecture was initially devised as a model compression technique, it has found versatile applications across various frameworks. In this paper, we review the characteristics of knowledge distillation from the hypothesis that its three important ingredients are the distilled knowledge and loss, the teacher-student paradigm, and the distillation process. In addition, we survey the versatility of knowledge distillation by studying its direct applications and its usage in combination with other deep learning paradigms. Finally, we present some future works in knowledge distillation, including explainable knowledge distillation, where the performance gain is analyzed analytically, and self-supervised learning, which is a hot research topic in the deep learning community.
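
For readers unfamiliar with the first ingredient mentioned in the abstract (the distilled knowledge and loss), the sketch below shows the classic soft-target distillation loss of the Hinton-style teacher-student setup. It is a minimal illustration, not code from the surveyed paper; the function name distillation_loss and the hyperparameters T (temperature) and alpha (soft/hard weighting) are illustrative assumptions.

```python
# Minimal sketch of a soft-target knowledge distillation loss (PyTorch).
# T and alpha are illustrative hyperparameters, not values from the paper.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets: KL divergence between the temperature-softened
    # teacher and student distributions; the T^2 factor keeps the
    # soft-target gradients on a comparable scale to the hard loss.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy with the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    # Blend the two terms; alpha trades off teacher imitation vs. labels.
    return alpha * soft + (1.0 - alpha) * hard
```

In this formulation, the distilled knowledge is the teacher's softened output distribution, and the distillation process consists of training the student to match it alongside the ground-truth labels.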
