Paper Title

Selective Eye-gaze Augmentation To Enhance Imitation Learning In Atari Games

Authors

Chaitanya Thammineni, Hemanth Manjunatha, Ehsan T. Esfahani

Abstract

This paper presents the selective use of eye-gaze information in learning human actions in Atari games. Vast evidence suggests that our eye movements convey a wealth of information about the direction of our attention and our mental states, and encode the information necessary to complete a task. Based on this evidence, we hypothesize that the selective use of eye-gaze, as a clue for attention direction, will enhance learning from demonstration. For this purpose, we propose a Selective Eye-gaze Augmentation (SEA) network that learns when to use the eye-gaze information. The proposed architecture consists of three sub-networks: a gaze prediction network, a gating network, and an action prediction network. Using the prior four game frames, the gaze prediction network predicts a gaze map, which is used to augment the input frame. The gating network determines whether the predicted gaze map should be used, and its output is fed to the final network to predict the action for the current frame. To validate this approach, we use the publicly available Atari Human Eye-Tracking And Demonstration (Atari-HEAD) dataset, which consists of 20 Atari games with 28 million human demonstrations and 328 million eye-gaze samples (over game frames) collected from four subjects. We demonstrate the efficacy of selective eye-gaze augmentation in comparison with the state-of-the-art Attention Guided Imitation Learning (AGIL) and Behavior Cloning (BC). The results indicate that the selective augmentation approach (the SEA network) performs significantly better than AGIL and BC. Moreover, to demonstrate the significance of selecting gaze through the gating network, we compare our approach with a random selection of the gaze. Even in this case, the SEA network performs significantly better, validating the advantage of selectively using gaze in demonstration learning.
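
The three-sub-network layout described in the abstract can be summarized in a short sketch. Below is a minimal, hypothetical PyTorch rendering of the gaze prediction, gating, and action prediction networks; all layer sizes, the multiplicative form of the gaze augmentation, and the soft sigmoid gate are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SEANetwork(nn.Module):
    """Sketch of the SEA architecture: gaze prediction, gating, action prediction.
    Layer shapes and the augmentation rule are assumptions for illustration only."""

    def __init__(self, num_actions=18):
        super().__init__()
        # Gaze prediction network: stack of 4 prior frames -> single-channel gaze map.
        self.gaze_net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=3, padding=1), nn.Sigmoid(),
        )
        # Gating network: scalar in [0, 1] deciding whether the gaze map is used.
        self.gate_net = nn.Sequential(
            nn.AdaptiveAvgPool2d(8), nn.Flatten(),
            nn.Linear(4 * 8 * 8, 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Sigmoid(),
        )
        # Action prediction network: (possibly gaze-augmented) current frame -> action logits.
        self.action_net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(256), nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, prior_frames, current_frame):
        # prior_frames: (B, 4, H, W) stack of the four preceding frames
        # current_frame: (B, 1, H, W) frame for which an action is predicted
        gaze_map = self.gaze_net(prior_frames)        # predicted gaze map
        gate = self.gate_net(prior_frames)            # soft "use gaze?" decision
        gate = gate.view(-1, 1, 1, 1)
        # Augment the current frame with the gaze map only to the extent the gate allows.
        augmented = current_frame * (1.0 + gate * gaze_map)
        return self.action_net(augmented)


# Usage with 84x84 grayscale Atari frames (batch of 2).
model = SEANetwork(num_actions=18)
logits = model(torch.rand(2, 4, 84, 84), torch.rand(2, 1, 84, 84))
print(logits.shape)  # torch.Size([2, 18])
```

In this sketch the gate is applied as a soft multiplicative weight so the whole model stays differentiable; the paper's actual gating mechanism may differ.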
