Paper Title
A Prospective Study on Sequence-Driven Temporal Sampling and Ego-Motion Compensation for Action Recognition in the EPIC-Kitchens Dataset
Paper Authors
Paper Abstract
Action recognition is currently one of the most challenging research fields in computer vision. Convolutional Neural Networks (CNNs) have significantly boosted its performance, but they rely on fixed-size spatio-temporal windows of analysis, which limits their temporal receptive field. Among action recognition datasets, egocentrically recorded sequences have gained significant relevance while entailing an additional challenge: ego-motion is unavoidably transferred to these sequences. The proposed method aims to cope with this by estimating the ego-motion, i.e., camera motion. The estimate is used to temporally partition video sequences into motion-compensated temporal \textit{chunks} that show the action against a stable background and allow for content-driven temporal sampling. A CNN trained in an end-to-end fashion extracts temporal features from each \textit{chunk}, and these features are then late-fused. This process extracts features from the whole temporal range of an action, increasing the temporal receptive field of the network.
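The pipeline in the abstract (estimate camera motion, cut the sequence into motion-compensated chunks, sample frames per chunk, late-fuse per-chunk predictions) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a per-frame camera-motion magnitude has already been estimated upstream (e.g., from optical flow), and `partition_into_chunks`, `sample_frames`, and `late_fuse` are hypothetical helper names.

```python
import numpy as np

def partition_into_chunks(motion, num_chunks=3):
    """Split frame indices into temporal chunks whose boundaries fall at the
    largest camera-motion peaks, so each chunk has a comparatively stable
    background (a simplified stand-in for motion-compensated partitioning)."""
    T = len(motion)
    # choose num_chunks-1 interior boundaries at the highest motion values
    interior = np.argsort(motion[1:-1])[::-1][:num_chunks - 1] + 1
    bounds = [0] + sorted(interior.tolist()) + [T]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(num_chunks)]

def sample_frames(chunk, n=4):
    """Content-driven temporal sampling: n evenly spaced frames per chunk."""
    idx = np.linspace(0, len(chunk) - 1, num=min(n, len(chunk))).astype(int)
    return [chunk[i] for i in idx]

def late_fuse(chunk_scores):
    """Late fusion: average the per-chunk class-score vectors produced by
    the (not shown) per-chunk CNN."""
    return np.mean(np.stack(chunk_scores), axis=0)

# toy motion profile with two camera-motion spikes at frames 2 and 5
motion = np.array([0.0, 0.1, 0.9, 0.1, 0.1, 0.8, 0.1, 0.0])
chunks = partition_into_chunks(motion, num_chunks=3)
```

With this profile, the boundaries land on the two spikes, so the eight frames split into three stable-background chunks whose sampled frames would each feed the per-chunk CNN before late fusion.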