Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding

Paper Title

Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding

Authors

Lingchen Meng, Xiyang Dai, Yinpeng Chen, Pengchuan Zhang, Dongdong Chen, Mengchen Liu, Jianfeng Wang, Zuxuan Wu, Lu Yuan, Yu-Gang Jiang

Abstract

Combining multiple datasets boosts performance on many computer vision tasks. A similar trend has not been observed in object detection, however, because of two inconsistencies among detection datasets: taxonomy differences and domain gaps. In this paper, we address these challenges with a new design, named Detection Hub, that is dataset-aware and category-aligned. It not only mitigates dataset inconsistency but also provides coherent guidance for the detector to learn across multiple datasets. In particular, the dataset-aware design is achieved by learning a dataset embedding that is used to adapt object queries as well as the convolutional kernels in detection heads. Categories across datasets are semantically aligned into a unified space by replacing one-hot category representations with word embeddings and leveraging the semantic coherence of language embeddings. Detection Hub thereby realizes the benefits of large data for object detection. Experiments demonstrate that joint training on multiple datasets achieves significant performance gains over training on each dataset alone. Detection Hub further achieves state-of-the-art (SoTA) performance on the UODB benchmark, which covers a wide variety of datasets.
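The two ideas in the abstract, category alignment through word embeddings and dataset-aware query adaptation, can be illustrated with a minimal sketch. This is not the authors' implementation. The 3-dimensional embeddings, the dataset names, the label sets, and the additive form of `adapt_query` are all hypothetical, chosen only to show how a shared language-embedding space lets datasets with different taxonomies reuse one classifier.

```python
import math

# Hypothetical 3-d "word embeddings"; a real system would take these
# from a pretrained language model rather than hand-written vectors.
WORD_EMBEDDINGS = {
    "person": [1.0, 0.0, 0.0],
    "pedestrian": [0.9, 0.1, 0.0],
    "car": [0.0, 1.0, 0.0],
    "dog": [0.0, 0.0, 1.0],
}

# Two hypothetical datasets whose taxonomies differ.
DATASET_CATEGORIES = {
    "dataset_a": ["person", "car", "dog"],
    "dataset_b": ["pedestrian", "car"],
}

# Hypothetical dataset embeddings; in practice these would be learned.
DATASET_EMBEDDINGS = {
    "dataset_a": [0.05, 0.0, 0.0],
    "dataset_b": [0.0, 0.05, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def adapt_query(query, dataset):
    """Shift an object query by the dataset embedding.

    Additive adaptation is an assumption made for illustration; the
    abstract only states that a dataset embedding adapts the queries.
    """
    return [q + d for q, d in zip(query, DATASET_EMBEDDINGS[dataset])]


def classify(feature, dataset):
    """Pick the dataset's category whose word embedding best matches.

    Scoring against word embeddings replaces a one-hot classifier, so
    related names from different datasets land close together.
    """
    names = DATASET_CATEGORIES[dataset]
    scores = {name: cosine(feature, WORD_EMBEDDINGS[name]) for name in names}
    return max(scores, key=scores.get)


# The same region feature is labeled under either taxonomy.
feature = [0.95, 0.05, 0.0]
label_a = classify(feature, "dataset_a")
label_b = classify(feature, "dataset_b")
```

Because "person" and "pedestrian" sit close together in the embedding space, supervision from either dataset pulls the same region features in a similar direction, which is the intuition behind the category-aligned design.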
