Paper Title

Knowledge Amalgamation for Object Detection with Transformers

Authors

Haofei Zhang, Feng Mao, Mengqi Xue, Gongfan Fang, Zunlei Feng, Jie Song, Mingli Song

Abstract

Knowledge amalgamation (KA) is a novel deep model reuse task that aims to transfer knowledge from several well-trained teachers to a compact, multi-talented student. Currently, most KA approaches are tailored to convolutional neural networks (CNNs). However, transformers, which have a completely different architecture, are starting to challenge the dominance of CNNs in many computer vision tasks. Directly applying previous KA methods to transformers leads to severe performance degradation. In this work, we explore a more effective KA scheme for transformer-based object detection models. Specifically, considering the architectural characteristics of transformers, we propose to decompose KA into two aspects: sequence-level amalgamation (SA) and task-level amalgamation (TA). In sequence-level amalgamation, a hint is generated by concatenating the teacher sequences rather than redundantly aggregating them into a fixed-size sequence as previous KA works do. In task-level amalgamation, the student efficiently learns heterogeneous detection tasks through soft targets. Extensive experiments on PASCAL VOC and COCO show that sequence-level amalgamation significantly boosts the performance of students, whereas previous methods impair them. Moreover, the transformer-based students excel at learning amalgamated knowledge: they master heterogeneous detection tasks rapidly and achieve performance superior or at least comparable to that of the teachers in their specializations.
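
The two ideas named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract gives no loss formulas, so the function names, the temperature value, and the cross-entropy form of the soft-target loss below are all assumptions chosen for illustration. Token sequences are plain Python lists so the example stays self-contained.

```python
import math


def concat_teacher_sequences(teacher_seqs):
    """Sequence-level hint (illustrative): join teacher token sequences end to end.

    Each sequence is a list of token vectors. Every teacher token is kept,
    so the hint length is the sum of the teacher sequence lengths rather
    than a fixed size.
    """
    hint = []
    for seq in teacher_seqs:
        hint.extend(seq)
    return hint


def soft_targets(logits, temperature=2.0):
    """Temperature-softened softmax; the temperature value is an assumption."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(student_logits, teacher_logits, temperature=2.0):
    """Assumed task-level loss: cross-entropy against the teacher's soft targets."""
    teacher_probs = soft_targets(teacher_logits, temperature)
    student_probs = soft_targets(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))


# Hypothetical toy data: two teachers with 2 and 3 tokens of width 2.
teacher_a = [[1.0, 0.0], [0.0, 1.0]]
teacher_b = [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]]
hint = concat_teacher_sequences([teacher_a, teacher_b])
print(len(hint))
```

Concatenation keeps each teacher's tokens distinct, which is the property the abstract contrasts with aggregating the teacher sequences into one fixed-size sequence.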
