Paper Title

Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization

Authors

Kyle Min, Jason J. Corso

Abstract

Temporally localizing activities within untrimmed videos has been extensively studied in recent years. Despite recent advances, existing methods for weakly-supervised temporal activity localization struggle to recognize when an activity is not occurring. To address this issue, we propose a novel method named A2CL-PT. Two triplets of the feature space are considered in our approach: one triplet is used to learn discriminative features for each activity class, and the other one is used to distinguish the features where no activity occurs (i.e. background features) from activity-related features for each video. To further improve the performance, we build our network using two parallel branches which operate in an adversarial way: the first branch localizes the most salient activities of a video and the second one finds other supplementary activities from non-localized parts of the video. Extensive experiments performed on THUMOS14 and ActivityNet datasets demonstrate that our proposed method is effective. Specifically, the average mAP of IoU thresholds from 0.1 to 0.9 on the THUMOS14 dataset is significantly improved from 27.9% to 30.0%.
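The two triplets described above can be illustrated with a standard Euclidean triplet margin loss. This is a minimal sketch for intuition only: the function name, the Euclidean distance, and the example feature roles are assumptions for illustration, not the paper's actual A2CL-PT formulation (which is built on center-based losses over the feature space).

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet loss: pull the anchor toward the positive,
    push it away from the negative by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

# Triplet 1 (discriminative class features, illustrative roles):
#   anchor   = a class's feature center
#   positive = an activity feature of that class
#   negative = a feature of a different class
# Triplet 2 (background separation, illustrative roles):
#   anchor   = a video's activity-related feature
#   positive = the matching class center
#   negative = a background (no-activity) feature from the same video
```

Optimizing both triplets jointly is what lets the model separate background segments from activity segments while still keeping activity classes distinguishable from one another.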
