Paper Title
A Prospective Study on Sequence-Driven Temporal Sampling and Ego-Motion Compensation for Action Recognition in the EPIC-Kitchens Dataset
Paper Authors
Paper Abstract
Action recognition is currently one of the most challenging research fields in computer vision. Convolutional Neural Networks (CNNs) have significantly boosted its performance, but they rely on fixed-size spatio-temporal windows of analysis, which limits their temporal receptive field. Among action recognition datasets, egocentrically recorded sequences have gained significant relevance while entailing an additional challenge: ego-motion is unavoidably transferred to these sequences. The proposed method aims to cope with this by estimating the ego-motion, i.e., camera motion. The estimate is used to temporally partition video sequences into motion-compensated temporal \textit{chunks} that show the action against a stable background and allow for content-driven temporal sampling. A CNN trained in an end-to-end fashion extracts temporal features from each \textit{chunk}, and these features are then late-fused. This process extracts features from the whole temporal range of an action, increasing the temporal receptive field of the network.
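The pipeline in the abstract (estimate camera motion, cut the sequence into motion-compensated chunks, sample frames per chunk, late-fuse per-chunk predictions) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a per-frame camera-motion magnitude has already been estimated upstream (e.g., from optical flow), and `partition_into_chunks`, `sample_frames`, and `late_fuse` are hypothetical helper names.

```python
import numpy as np

def partition_into_chunks(motion, num_chunks=3):
    """Split frame indices into temporal chunks whose boundaries fall at the
    largest camera-motion peaks, so each chunk has a comparatively stable
    background (a simplified stand-in for motion-compensated partitioning)."""
    T = len(motion)
    # choose num_chunks-1 interior boundaries at the highest motion values
    interior = np.argsort(motion[1:-1])[::-1][:num_chunks - 1] + 1
    bounds = [0] + sorted(interior.tolist()) + [T]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(num_chunks)]

def sample_frames(chunk, n=4):
    """Content-driven temporal sampling: n evenly spaced frames per chunk."""
    idx = np.linspace(0, len(chunk) - 1, num=min(n, len(chunk))).astype(int)
    return [chunk[i] for i in idx]

def late_fuse(chunk_scores):
    """Late fusion: average the per-chunk class-score vectors produced by
    the (not shown) per-chunk CNN."""
    return np.mean(np.stack(chunk_scores), axis=0)

# toy motion profile with two camera-motion spikes at frames 2 and 5
motion = np.array([0.0, 0.1, 0.9, 0.1, 0.1, 0.8, 0.1, 0.0])
chunks = partition_into_chunks(motion, num_chunks=3)
```

With this profile, the boundaries land on the two spikes, so the eight frames split into three stable-background chunks whose sampled frames would each feed the per-chunk CNN before late fusion.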