Paper Title
EGO-TOPO: Environment Affordances from Egocentric Video
Paper Authors
Paper Abstract
First-person video naturally brings the use of a physical environment to the forefront, since it shows the camera wearer interacting fluidly in a space based on their intentions. However, current methods largely separate the observed actions from the persistent space itself. We introduce a model for environment affordances that is learned directly from egocentric video. The main idea is to gain a human-centric model of a physical space (such as a kitchen) that captures (1) the primary spatial zones of interaction and (2) the likely activities they support. Our approach decomposes a space into a topological map derived from first-person activity, organizing an ego-video into a series of visits to the different zones. Further, we show how to link zones across multiple related environments (e.g., from videos of multiple kitchens) to obtain a consolidated representation of environment functionality. On EPIC-Kitchens and EGTEA+, we demonstrate our approach for learning scene affordances and anticipating future actions in long-form video.
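To make the core idea concrete, here is a minimal sketch (not the authors' implementation) of how an egocentric video might be decomposed into a topological map of zones: each frame is matched to the most similar existing zone or opens a new one, and consecutive visits to different zones become graph edges. The feature array `frame_features` and the similarity threshold `TAU` are hypothetical stand-ins; a real system would use features from a trained visual encoder.

```python
# Sketch: build a topological graph of interaction "zones" from ego-video.
# Assumptions: frame_features is a (T, D) array of per-frame visual features;
# TAU is a hypothetical cosine-similarity threshold for "same zone".
import numpy as np
import networkx as nx

TAU = 0.8

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def build_topological_map(frame_features):
    """Assign each frame to a zone; link zones visited consecutively."""
    graph = nx.Graph()
    zone_centers = []          # running mean feature per zone
    prev_zone = None
    for feat in frame_features:
        # Match the frame to the most similar existing zone, if any.
        sims = [cosine(feat, c) for c in zone_centers]
        if sims and max(sims) >= TAU:
            zone = int(np.argmax(sims))
            zone_centers[zone] = 0.9 * zone_centers[zone] + 0.1 * feat
        else:                  # otherwise open a new zone
            zone = len(zone_centers)
            zone_centers.append(feat.copy())
            graph.add_node(zone)
        if prev_zone is not None and prev_zone != zone:
            graph.add_edge(prev_zone, zone)  # consecutive visits -> edge
        prev_zone = zone
    return graph

# Usage: random stand-in features; real features would come from an encoder.
G = build_topological_map(np.random.randn(500, 128).astype(np.float32))
print(G.number_of_nodes(), "zones,", G.number_of_edges(), "transitions")
```

Grounding activities in graph nodes rather than raw frames is what lets the same zone accumulate affordance evidence across many visits, and lets zones from different kitchens be linked by the activities they support.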