NovGrid: A Flexible Grid World for Evaluating Agent Response to Novelty

Paper Title

NovGrid: A Flexible Grid World for Evaluating Agent Response to Novelty

Authors

Balloch, Jonathan; Lin, Zhiyu; Hussain, Mustafa; Srinivas, Aarun; Wright, Robert; Peng, Xiangyu; Kim, Julia; Riedl, Mark

Abstract

A robust body of reinforcement learning techniques has been developed to solve complex sequential decision-making problems. However, these methods assume that training and evaluation tasks come from similarly or identically distributed environments. This assumption does not hold in real life, where small novel changes to the environment can make a previously learned policy fail or introduce simpler solutions that might never be found. To that end, we explore the concept of novelty, defined in this work as a sudden change to the mechanics or properties of the environment. We provide an ontology of the novelties most relevant to sequential decision making, which distinguishes between novelties that affect objects versus actions, unary properties versus non-unary relations, and the distribution of solutions to a task. We introduce NovGrid, a novelty generation framework built on MiniGrid, which acts as a toolkit for rapidly developing and evaluating novelty-adaptation-enabled reinforcement learning techniques. Along with the core NovGrid, we provide exemplar novelties aligned with our ontology and instantiate them as novelty templates that can be applied to many MiniGrid-compliant environments. Finally, we present a set of metrics built into our framework for evaluating novelty-adaptation-enabled machine learning techniques, and we show characteristics of a baseline RL model using these metrics.
