Dynamic Shielding for Reinforcement Learning in Black-Box Environments

Paper Title

Dynamic Shielding for Reinforcement Learning in Black-Box Environments

Authors

Masaki Waga, Ezequiel Castellano, Sasinee Pruekprasert, Stefan Klikovits, Toru Takisaka, Ichiro Hasuo

Abstract

It is challenging to use reinforcement learning (RL) in cyber-physical systems due to the lack of safety guarantees during learning. Although there have been various proposals to reduce undesired behaviors during learning, most of these techniques require prior system knowledge, and their applicability is limited. This paper aims to reduce undesired behaviors during learning without requiring any prior system knowledge. We propose dynamic shielding: an extension of a model-based safe RL technique called shielding using automata learning. The dynamic shielding technique constructs an approximate system model in parallel with RL using a variant of the RPNI algorithm and suppresses undesired explorations due to the shield constructed from the learned model. Through this combination, potentially unsafe actions can be foreseen before the agent experiences them. Experiments show that our dynamic shield significantly decreases the number of undesired events during training.
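The shielding idea in the abstract can be illustrated with a rough sketch. This is not the paper's algorithm: the RPNI-based automata learning is replaced here by a plain lookup table keyed on the current observation, so nothing is generalized across states. The class name `ToyShield`, its methods, and the observation and action labels are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class ToyShield:
    """Toy stand-in for a shield built from a learned model (assumption: no RPNI, no state merging)."""

    def __init__(self):
        # Approximate model: observation -> action -> True if ever seen to be unsafe.
        self.model = defaultdict(dict)

    def record(self, observation, action, unsafe):
        """Update the approximate model with one transition observed during training."""
        previously_unsafe = self.model[observation].get(action, False)
        self.model[observation][action] = previously_unsafe or unsafe

    def safe_actions(self, observation, actions):
        """Return the actions not known to be unsafe; if all are blocked, allow all."""
        allowed = [a for a in actions if not self.model[observation].get(a, False)]
        return allowed or list(actions)


shield = ToyShield()
shield.record("near_cliff", "forward", unsafe=True)
shield.record("near_cliff", "back", unsafe=False)
print(shield.safe_actions("near_cliff", ["forward", "back", "left"]))  # ['back', 'left']
```

In this sketch an RL agent would pick its next action only from `safe_actions(...)` and call `record(...)` after every step, so the model grows in parallel with training. Unexplored actions stay allowed because nothing is known about the environment in advance. A table like this can only block an action after it has been seen to fail in the same observation, whereas the abstract states that the learned automaton lets unsafe actions be foreseen before the agent experiences them.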

Paper Title