Paper Title

Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection

Authors

Xiang Li, Wenhai Wang, Lijun Wu, Shuo Chen, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang

Abstract

One-stage detectors basically formulate object detection as dense classification and localization. The classification is usually optimized by Focal Loss, and the box location is commonly learned under a Dirac delta distribution. A recent trend for one-stage detectors is to introduce an individual prediction branch to estimate the quality of localization, where the predicted quality facilitates the classification to improve detection performance. This paper delves into the representations of the above three fundamental elements: quality estimation, classification, and localization. Two problems are discovered in existing practices: (1) the inconsistent usage of the quality estimation and classification between training and inference, and (2) the inflexible Dirac delta distribution for localization when there is ambiguity and uncertainty in complex scenes. To address these problems, we design new representations for these elements. Specifically, we merge the quality estimation into the class prediction vector to form a joint representation of localization quality and classification, and use a vector to represent an arbitrary distribution of box locations. The improved representations eliminate the inconsistency risk and accurately depict the flexible distribution in real data, but contain continuous labels, which is beyond the scope of Focal Loss. We then propose Generalized Focal Loss (GFL), which generalizes Focal Loss from its discrete form to a continuous version for successful optimization. On COCO test-dev, GFL achieves 45.0% AP using a ResNet-101 backbone, surpassing the state-of-the-art SAPD (43.5%) and ATSS (43.6%) with higher or comparable inference speed, under the same backbone and training settings. Notably, our best model achieves a single-model single-scale AP of 48.2% at 10 FPS on a single 2080Ti GPU. Code and models are available at https://github.com/implus/GFocal.
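The abstract describes two concrete instances of GFL: a quality-aware classification loss over continuous (IoU-based) labels, and a loss on a discrete distribution over box locations. The following is a minimal pure-Python sketch of both, per-score rather than batched; the function names, scalar interface, and the choice of `beta = 2.0` are illustrative assumptions, not the authors' reference implementation.

```python
import math

def quality_focal_loss(sigma, y, beta=2.0):
    """Sketch of Quality Focal Loss for one joint class-quality score.

    sigma: predicted joint classification/localization-quality score in (0, 1)
    y:     continuous quality target (e.g. IoU with the ground-truth box), in [0, 1]
    beta:  focusing parameter (assumed value; the discrete Focal Loss uses gamma)
    """
    # Binary cross-entropy with a continuous (soft) label y, replacing
    # the one-hot label of standard Focal Loss.
    bce = -((1 - y) * math.log(1 - sigma) + y * math.log(sigma))
    # The modulating factor |y - sigma|^beta down-weights well-estimated
    # examples, generalizing Focal Loss's (1 - p_t)^gamma to continuous labels.
    return abs(y - sigma) ** beta * bce

def distribution_focal_loss(probs, y):
    """Sketch of Distribution Focal Loss for one box-edge offset.

    probs: predicted discrete distribution over integer offset bins 0..n
           (the "vector representing an arbitrary distribution of box locations")
    y:     continuous regression target, 0 <= y <= n
    """
    i = min(int(math.floor(y)), len(probs) - 2)  # left bin bracketing y
    # Push probability mass onto the two bins around y, so the expectation
    # sum(j * probs[j]) can recover the continuous box location.
    return -((i + 1 - y) * math.log(probs[i]) + (y - i) * math.log(probs[i + 1]))
```

When the prediction matches the target exactly (`sigma == y`), the modulating factor drives the quality term to zero, mirroring how Focal Loss vanishes on confidently correct discrete predictions.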
