Paper Title

Prediction-Guided Distillation for Dense Object Detection

Paper Authors

Chenhongyi Yang, Mateusz Ochal, Amos Storkey, Elliot J. Crowley

Paper Abstract

Real-world object detection models should be cheap and accurate. Knowledge distillation (KD) can boost the accuracy of a small, cheap detection model by leveraging useful information from a larger teacher model. However, a key challenge is identifying the most informative features produced by the teacher for distillation. In this work, we show that only a very small fraction of features within a ground-truth bounding box are responsible for a teacher's high detection performance. Based on this, we propose Prediction-Guided Distillation (PGD), which focuses distillation on these key predictive regions of the teacher and yields considerable gains in performance over many existing KD baselines. In addition, we propose an adaptive weighting scheme over the key regions to smooth out their influence and achieve even better performance. Our proposed approach outperforms current state-of-the-art KD baselines on a variety of advanced one-stage detection architectures. Specifically, on the COCO dataset, our method achieves between +3.1% and +4.6% AP improvement using ResNet-101 and ResNet-50 as the teacher and student backbones, respectively. On the CrowdHuman dataset, we achieve +3.2% and +2.0% improvements in MR and AP, also using these backbones. Our code is available at https://github.com/ChenhongyiYang/PGD.
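
To make the core idea concrete, below is a minimal PyTorch-style sketch of distillation focused on a few high-quality teacher positions inside ground-truth boxes, with soft adaptive weights over those positions. The function name, the per-location quality score, and the softmax weighting here are illustrative assumptions, not the authors' implementation; refer to the linked repository for the actual PGD code.

```python
import torch
import torch.nn.functional as F

def prediction_guided_distill_loss(teacher_feat, student_feat, quality_map, gt_mask, k=10):
    """Sketch of prediction-guided feature distillation (illustrative, not the official PGD code).

    teacher_feat, student_feat: (C, H, W) feature maps from one FPN level.
    quality_map: (H, W) teacher prediction quality per location (e.g. cls score x box IoU).
    gt_mask: (H, W) boolean mask of locations falling inside ground-truth boxes.
    k: number of top-quality locations to distil.
    """
    # Restrict candidate locations to those inside ground-truth boxes.
    quality = quality_map.clone()
    quality[~gt_mask] = float('-inf')

    # Select the k highest-quality teacher positions.
    k = min(k, int(gt_mask.sum().item()))
    if k == 0:
        return teacher_feat.new_zeros(())
    topk_vals, topk_idx = quality.flatten().topk(k)

    # Adaptive soft weights over the selected key positions.
    weights = F.softmax(topk_vals, dim=0)

    # Weighted MSE between teacher and student features at those positions.
    t = teacher_feat.flatten(1)[:, topk_idx]   # (C, k)
    s = student_feat.flatten(1)[:, topk_idx]   # (C, k)
    per_pos = ((t - s) ** 2).mean(dim=0)       # (k,)
    return (weights * per_pos).sum()
```

A typical quality score for a one-stage detector could combine the teacher's classification confidence with its predicted box quality at each location; how that score and the weighting are actually defined in PGD is specified in the paper and repository.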
