Rolling-Unrolling LSTMs for Action Anticipation from First-Person Video

Paper Title

Rolling-Unrolling LSTMs for Action Anticipation from First-Person Video

Authors

Antonino Furnari, Giovanni Maria Farinella

Abstract

In this paper, we tackle the problem of egocentric action anticipation, i.e., predicting what actions the camera wearer will perform in the near future and which objects they will interact with. Specifically, we contribute Rolling-Unrolling LSTM, a learning architecture to anticipate actions from egocentric videos. The method is based on three components: 1) an architecture comprised of two LSTMs to model the sub-tasks of summarizing the past and inferring the future, 2) a Sequence Completion Pre-Training technique which encourages the LSTMs to focus on the different sub-tasks, and 3) a Modality ATTention (MATT) mechanism to efficiently fuse multi-modal predictions performed by processing RGB frames, optical flow fields and object-based features. The proposed approach is validated on EPIC-Kitchens, EGTEA Gaze+ and ActivityNet. The experiments show that the proposed architecture is state-of-the-art in the domain of egocentric videos, achieving top performances in the 2019 EPIC-Kitchens egocentric action anticipation challenge. The approach also achieves competitive performance on ActivityNet with respect to methods not based on unsupervised pre-training and generalizes to the tasks of early action recognition and action recognition. To encourage research on this challenging topic, we made our code, trained models, and pre-extracted features available at our web page: http://iplab.dmi.unict.it/rulstm.
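The third component, modality attention, can be illustrated with a minimal late-fusion sketch. This is not the authors' implementation: in the abstract's description MATT fuses predictions from the RGB, optical flow and object branches, and the sketch below simply assumes that each branch has already produced per-class scores and one attention logit. The names `fuse_modalities`, `scores` and `attention_logits`, and all numeric values, are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(scores: Dict[str, List[float]],
                    attention_logits: Dict[str, float]) -> List[float]:
    """Weighted late fusion: sum over modalities of w_m * scores_m,
    where the weights w are a softmax over the attention logits.

    Assumption: the logits are given as inputs here; how they are
    computed is outside the scope of this sketch.
    """
    names = sorted(scores)
    weights = softmax([attention_logits[n] for n in names])
    num_classes = len(scores[names[0]])
    fused = [0.0] * num_classes
    for name, w in zip(names, weights):
        if len(scores[name]) != num_classes:
            raise ValueError("all modalities must score the same classes")
        for i, s in enumerate(scores[name]):
            fused[i] += w * s
    return fused


# Hypothetical per-class scores for three actions from each branch.
scores = {
    "rgb": [2.0, 0.5, 0.1],
    "flow": [0.3, 1.5, 0.2],
    "obj": [1.0, 1.0, 1.0],
}
# Equal logits give every modality the same weight (a plain average).
attention_logits = {"rgb": 0.0, "flow": 0.0, "obj": 0.0}
fused = fuse_modalities(scores, attention_logits)
```

With equal logits the fusion reduces to averaging the three branches; raising one modality's logit shifts the fused scores toward that branch, which is the behavior an attention-based fusion is meant to provide.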

Paper Title