Paper Title
Polysemy Deciphering Network for Robust Human-Object Interaction Detection
Authors
Abstract
Human-Object Interaction (HOI) detection is important to human-centric scene understanding tasks. Existing works tend to assume that the same verb has similar visual characteristics in different HOI categories, an approach that ignores the diverse semantic meanings of the verb. To address this issue, in this paper, we propose a novel Polysemy Deciphering Network (PD-Net) that decodes the visual polysemy of verbs for HOI detection in three distinct ways. First, we make the features used for HOI detection polysemy-aware through two novel modules: Language Prior-guided Channel Attention (LPCA) and Language Prior-based Feature Augmentation (LPFA). LPCA highlights important elements in the human and object appearance features for each HOI category to be identified; LPFA augments the human pose and spatial features with language priors, giving the verb classifiers language hints that reduce intra-class variation for the same verb. Second, we introduce a novel Polysemy-Aware Modal Fusion module (PAMF), which guides PD-Net to base its decisions on the feature types that the language priors deem more important. Third, we propose to relieve the verb polysemy problem by sharing verb classifiers among semantically similar HOI categories. Furthermore, to expedite research on the verb polysemy problem, we build a new benchmark dataset named HOI-VerbPolysemy (HOI-VP), which includes common verbs (predicates) that have diverse semantic meanings in the real world. Finally, by deciphering the visual polysemy of verbs, our approach outperforms state-of-the-art methods by significant margins on the HICO-DET, V-COCO, and HOI-VP databases. The code and data for this paper are available at https://github.com/MuchHair/PD-Net.
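The abstract names LPCA and PAMF without giving their formulas. The following is a minimal, dependency-free sketch of the general idea of language-prior-guided gating: a language prior (for example, a verb-object word embedding) is mapped to per-channel weights that rescale an appearance feature, and to per-stream weights that fuse scores from several feature types. This is not the PD-Net implementation. The function names, the single linear layer plus sigmoid used for channel gates, and the softmax over feature streams are all assumptions made for illustration; the authors' code is in the repository linked above.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs: Sequence[float]) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w: Matrix, v: Sequence[float]) -> Vector:
    # Plain matrix-vector product: one output per row of w.
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def language_prior_channel_attention(
    feature: Sequence[float], prior: Sequence[float], w: Matrix, b: Sequence[float]
) -> Vector:
    """Hypothetical LPCA-style gating (assumption, not the paper's formula):
    the language prior yields one sigmoid gate per feature channel, and the
    appearance feature is rescaled element-wise by those gates."""
    if len(w) != len(feature) or len(b) != len(feature):
        raise ValueError("w and b need one entry per feature channel")
    gates = [sigmoid(z + bi) for z, bi in zip(matvec(w, prior), b)]
    return [f * g for f, g in zip(feature, gates)]


def polysemy_aware_fusion(
    stream_scores: Sequence[float], prior: Sequence[float], w: Matrix, b: Sequence[float]
) -> float:
    """Hypothetical PAMF-style fusion (assumption, not the paper's formula):
    the language prior yields one softmax weight per feature stream
    (e.g. appearance, pose, spatial), and the fused verb score is the
    weighted sum of the per-stream scores."""
    if len(w) != len(stream_scores) or len(b) != len(stream_scores):
        raise ValueError("w and b need one entry per feature stream")
    weights = softmax([z + bi for z, bi in zip(matvec(w, prior), b)])
    return sum(wt * s for wt, s in zip(weights, stream_scores))


if __name__ == "__main__":
    # With all-zero weights every gate is sigmoid(0) = 0.5.
    print(language_prior_channel_attention([1.0, 2.0, 3.0], [0.5, -0.5], [[0.0, 0.0]] * 3, [0.0] * 3))
    # With all-zero weights the three streams are averaged.
    print(polysemy_aware_fusion([0.3, 0.6, 0.9], [1.0, 2.0], [[0.0, 0.0]] * 3, [0.0] * 3))
```

The point of conditioning on the language prior is that the same verb (for instance "ride" in ride-bicycle versus ride-elephant) can receive different channel gates and different stream weights depending on the object, which is the polysemy-awareness the abstract describes.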