Paper title
Mask Atari for Deep Reinforcement Learning as POMDP Benchmarks
Paper authors
Paper abstract
We present Mask Atari, a new benchmark to help solve partially observable Markov decision process (POMDP) problems with deep reinforcement learning (DRL)-based approaches. To provide a simulation environment for POMDP problems, Mask Atari is built on Atari 2600 games and uses controllable, movable, and learnable masks as the observation area for the target agent, targeting in particular the active information gathering (AIG) setting in POMDPs. Since no such benchmark yet exists, Mask Atari provides a challenging, efficient benchmark for evaluating methods that focus on this problem. Moreover, the mask operation is an attempt to introduce the receptive field of the human visual system into an agent's simulation environment, so that, when compared with the human baseline, evaluations are not biased by sensing ability and focus purely on the cognitive performance of the methods. We describe the challenges and features of our benchmark and evaluate several baselines with Mask Atari.
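To make the idea of a movable mask as the observation area concrete, here is a minimal sketch in Python with NumPy. It is not the benchmark's implementation: the function names `apply_mask` and `move_mask`, the square window shape, the zero fill outside the window, and the random stand-in frame are all assumptions made for illustration. The only external fact used is the standard 210 x 160 Atari 2600 frame size.

```python
import numpy as np

FRAME_H, FRAME_W = 210, 160  # standard Atari 2600 frame size


def apply_mask(frame, center, size):
    """Return a copy of `frame` that is zero outside a square window.

    Hypothetical helper: the window shape and zero fill are assumptions,
    not taken from the Mask Atari paper.
    """
    cy, cx = center
    half = size // 2
    top, bottom = max(cy - half, 0), min(cy + half, frame.shape[0])
    left, right = max(cx - half, 0), min(cx + half, frame.shape[1])
    masked = np.zeros_like(frame)
    masked[top:bottom, left:right] = frame[top:bottom, left:right]
    return masked


def move_mask(center, delta, frame_shape):
    """Shift the window center by `delta`, clamped to the frame bounds."""
    cy = int(np.clip(center[0] + delta[0], 0, frame_shape[0] - 1))
    cx = int(np.clip(center[1] + delta[1], 0, frame_shape[1] - 1))
    return cy, cx


# Stand-in for a game frame: random non-zero RGB pixels (not real Atari output).
rng = np.random.default_rng(0)
frame = rng.integers(1, 256, size=(FRAME_H, FRAME_W, 3), dtype=np.uint8)

center = (105, 80)                         # start with the window mid-screen
obs = apply_mask(frame, center, size=40)   # the agent sees only this window
center = move_mask(center, (-200, 10), frame.shape)  # an oversized move is clamped
```

In an active information gathering setting, the agent would choose `delta` itself at each step, so deciding where to look becomes part of the policy rather than a fixed property of the environment.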