DAMO-YOLO: A Report on Real-Time Object Detection Design

Paper title

DAMO-YOLO: A Report on Real-Time Object Detection Design

Authors

Xianzhe Xu, Yiqi Jiang, Weihua Chen, Yilun Huang, Yuan Zhang, Xiuyu Sun

Abstract

In this report, we present a fast and accurate object detection method dubbed DAMO-YOLO, which achieves higher performance than the state-of-the-art YOLO series. DAMO-YOLO extends YOLO with several new techniques: Neural Architecture Search (NAS), an efficient Reparameterized Generalized-FPN (RepGFPN), a lightweight head with AlignedOTA label assignment, and distillation enhancement. In particular, we use MAE-NAS, a method guided by the principle of maximum entropy, to search our detection backbone under the constraints of low latency and high performance, producing ResNet/CSP-like structures with spatial pyramid pooling and focus modules. In the design of necks and heads, we follow the rule of "large neck, small head". We adopt Generalized-FPN with accelerated queen-fusion to build the detector neck and upgrade its CSPNet with efficient layer aggregation networks (ELAN) and reparameterization. We then investigate how detector head size affects detection performance and find that a heavy neck with only one task projection layer yields better results. In addition, AlignedOTA is proposed to solve the misalignment problem in label assignment, and a distillation schema is introduced to improve performance further. Based on these new techniques, we build a suite of models at various scales to meet the needs of different scenarios. For general industry requirements, we propose DAMO-YOLO-T/S/M/L, which achieve 43.6/47.7/50.2/51.9 mAP on COCO with latencies of 2.78/3.83/5.62/7.95 ms on T4 GPUs, respectively. Additionally, for edge devices with limited computing power, we propose the lightweight models DAMO-YOLO-Ns/Nm/Nl, which achieve 32.3/38.2/40.5 mAP on COCO with latencies of 4.08/5.05/6.69 ms on X86-CPU. Our proposed general and lightweight models outperform other YOLO series models in their respective application scenarios.
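To make the accuracy and latency figures in the abstract easier to compare, the short sketch below tabulates them and converts each latency into an implied throughput. The numbers are copied from the abstract; the names `REPORTED`, `throughput_fps`, and `summarize` are illustrative only, and the conversion assumes one image is processed at a time, which the abstract does not state.

```python
# Figures copied from the abstract: (COCO mAP, latency in ms, hardware).
# The GPU and CPU groups were measured on different hardware, so their
# latencies should not be compared with each other.
REPORTED = {
    "DAMO-YOLO-T": (43.6, 2.78, "T4 GPU"),
    "DAMO-YOLO-S": (47.7, 3.83, "T4 GPU"),
    "DAMO-YOLO-M": (50.2, 5.62, "T4 GPU"),
    "DAMO-YOLO-L": (51.9, 7.95, "T4 GPU"),
    "DAMO-YOLO-Ns": (32.3, 4.08, "X86-CPU"),
    "DAMO-YOLO-Nm": (38.2, 5.05, "X86-CPU"),
    "DAMO-YOLO-Nl": (40.5, 6.69, "X86-CPU"),
}


def throughput_fps(latency_ms: float) -> float:
    """Frames per second implied by a per-image latency.

    Assumption (not from the abstract): images are processed one at a time.
    """
    return 1000.0 / latency_ms


def summarize(reported):
    """Return {model: (mAP, latency_ms, hardware, implied FPS rounded to 0.1)}."""
    return {
        name: (m_ap, lat, hw, round(throughput_fps(lat), 1))
        for name, (m_ap, lat, hw) in reported.items()
    }


if __name__ == "__main__":
    for name, (m_ap, lat, hw, fps) in summarize(REPORTED).items():
        print(f"{name:<13} {hw:<8} mAP={m_ap:<5} latency={lat} ms  ~{fps} FPS")
```

Within each hardware group, the table shows the usual trade-off: each step up in model size gains a few mAP points at the cost of higher latency.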

Paper title