ABN: Agent-Aware Boundary Networks for Temporal Action Proposal Generation

Paper Title

ABN: Agent-Aware Boundary Networks for Temporal Action Proposal Generation

Authors

Khoa Vo, Kashu Yamazaki, Sang Truong, Minh-Triet Tran, Akihiro Sugimoto, Ngan Le

Abstract

Temporal action proposal generation (TAPG) aims to estimate the temporal intervals of actions in untrimmed videos, a challenging task that plays an important role in many video analysis and understanding tasks. Despite great progress in TAPG, most existing works apply a deep learning model as a black box to untrimmed videos to extract a visual representation, and thereby ignore the human perception of interactions between agents and the surrounding environment. Capturing these interactions between agents and the environment is therefore beneficial and can potentially improve TAPG performance. In this paper, we propose a novel framework named Agent-Aware Boundary Network (ABN), which consists of two sub-networks: (i) an Agent-Aware Representation Network, which captures both agent-agent and agent-environment relationships in the video representation, and (ii) a Boundary Generation Network, which estimates the confidence scores of temporal intervals. In the Agent-Aware Representation Network, the interactions between agents are expressed through a local pathway, which operates at a local level to focus on the motions of agents, whereas the overall perception of the surroundings is expressed through a global pathway, which operates at a global level to perceive the effects of agent-environment interactions. Comprehensive evaluations on the 20-action THUMOS-14 and 200-action ActivityNet-1.3 datasets with different backbone networks (i.e., C3D, SlowFast, and Two-Stream) show that our proposed ABN robustly outperforms state-of-the-art TAPG methods regardless of the backbone network employed. We further examine proposal quality by feeding the proposals generated by our method into temporal action detection (TAD) frameworks and evaluating their detection performance. The source code is available at https://github.com/vhvkhoa/TAPG-AgentEnvNetwork.git.
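
The data flow described in the abstract can be illustrated with a small, purely hypothetical sketch. The abstract does not specify how the local and global pathways are implemented or fused, so everything below is an assumption made for illustration: dot-product self-attention over per-agent features for the local pathway, a single environment feature vector standing in for the global pathway, fusion by simple averaging, and a dense enumeration of candidate (start, end) intervals that a boundary network might score. The function names (`local_pathway`, `fuse`, `candidate_intervals`) are invented here and are not taken from the released source code.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def local_pathway(agent_feats):
    """Assumed local pathway: self-attention across agents, then mean-pooling.

    agent_feats is a list of equal-length feature vectors, one per agent
    detected in a snippet. Returns None when no agent is present.
    """
    if not agent_feats:
        return None
    dim = len(agent_feats[0])
    attended = []
    for query in agent_feats:
        # Scaled dot-product scores between this agent and every agent.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in agent_feats
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * key[d] for w, key in zip(weights, agent_feats))
             for d in range(dim)]
        )
    # Mean-pool the attended agent features into one snippet-level vector.
    return [sum(v[d] for v in attended) / len(attended) for d in range(dim)]


def fuse(env_feat, agent_feats):
    """Assumed fusion: average the global (environment) and local features."""
    local = local_pathway(agent_feats)
    if local is None:
        # No agents detected: fall back to the environment feature alone.
        return list(env_feat)
    return [(e + l) / 2 for e, l in zip(env_feat, local)]


def candidate_intervals(num_snippets):
    """Enumerate every (start, end) snippet interval with start < end."""
    return [
        (start, end)
        for start in range(num_snippets)
        for end in range(start + 1, num_snippets + 1)
    ]
```

In this sketch, each snippet of an untrimmed video would yield one fused vector from `fuse`, and a boundary network (not shown) would assign a confidence score to each interval returned by `candidate_intervals`. The actual architecture may differ in every one of these choices.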
