Paper Title
Regularized Two-Branch Proposal Networks for Weakly-Supervised Moment Retrieval in Videos
Paper Authors
Paper Abstract
Video moment retrieval aims to localize the target moment in a video according to a given sentence. The weakly-supervised setting provides only video-level sentence annotations during training. Most existing weakly-supervised methods apply a multiple-instance learning (MIL) framework to develop inter-sample confrontment, but ignore the intra-sample confrontment between moments with semantically similar content. As a result, these methods fail to distinguish the target moment from plausible negative moments. In this paper, we propose a novel Regularized Two-Branch Proposal Network that considers inter-sample and intra-sample confrontment simultaneously. Concretely, we first devise a language-aware filter to generate an enhanced video stream and a suppressed video stream. We then design a sharable two-branch proposal module that generates positive proposals from the enhanced stream and plausible negative proposals from the suppressed one, providing sufficient confrontment. Further, we apply proposal regularization to stabilize the training process and improve model performance. Extensive experiments show the effectiveness of our method. Our code is released here.
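The intra-sample idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the sigmoid gate, the sliding-window scorer, the hinge margin, and every function name below are assumptions made for illustration, since the abstract does not specify the filter, the proposal module, or the loss.

```python
import math


def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def language_aware_filter(frames, sentence):
    """Split frame features into an enhanced and a suppressed stream.

    Assumption: each frame is gated by sigmoid(frame . sentence). The
    enhanced stream keeps the gated part, the suppressed stream the rest.
    """
    gates = [1.0 / (1.0 + math.exp(-dot(f, sentence))) for f in frames]
    enhanced = [[g * x for x in f] for f, g in zip(frames, gates)]
    suppressed = [[(1.0 - g) * x for x in f] for f, g in zip(frames, gates)]
    return enhanced, suppressed, gates


def best_proposal_score(stream, sentence, width=2):
    """Shared scorer for both branches (hypothetical).

    Returns the best mean frame-sentence similarity over sliding windows.
    Requires len(stream) >= width.
    """
    scores = [dot(f, sentence) for f in stream]
    windows = [scores[i:i + width] for i in range(len(scores) - width + 1)]
    return max(sum(w) / len(w) for w in windows)


def intra_sample_loss(frames, sentence, margin=0.5):
    """Hinge loss that is zero once the enhanced-branch score exceeds
    the suppressed-branch score by at least `margin`."""
    enhanced, suppressed, _ = language_aware_filter(frames, sentence)
    pos = best_proposal_score(enhanced, sentence)
    neg = best_proposal_score(suppressed, sentence)
    return max(0.0, margin - (pos - neg))


# Toy input: the first two frames match the sentence, the last two do not.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
sentence = [2.0, 0.0]
loss = intra_sample_loss(frames, sentence)
```

Both branches call the same scorer, which mirrors the "sharable" module described in the abstract; only the input stream differs. The inter-sample term and the proposal regularization are left out of this sketch.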