Bayesian Robust Optimization for Imitation Learning

Paper title

Bayesian Robust Optimization for Imitation Learning

Authors

Brown, Daniel S., Niekum, Scott, Petrik, Marek

Abstract

One of the main challenges in imitation learning is determining what action an agent should take when it is outside the state distribution of the demonstrations. Inverse reinforcement learning (IRL) can enable generalization to new states by learning a parameterized reward function, but these approaches still face uncertainty over the true reward function and the corresponding optimal policy. Existing safe imitation learning approaches based on IRL deal with this uncertainty using a maxmin framework that optimizes a policy under the assumption of an adversarial reward function, whereas risk-neutral IRL approaches optimize a policy for either the mean or the MAP reward function. While completely ignoring risk can lead to overly aggressive and unsafe policies, optimizing in a fully adversarial sense is also problematic, as it can lead to overly conservative policies that perform poorly in practice. To provide a bridge between these two extremes, we propose Bayesian Robust Optimization for Imitation Learning (BROIL). BROIL leverages Bayesian reward function inference and a user-specific risk tolerance to efficiently optimize a robust policy that balances expected return and conditional value at risk. Our empirical results show that BROIL provides a natural way to interpolate between return-maximizing and risk-minimizing behaviors and outperforms existing risk-sensitive and risk-neutral inverse reinforcement learning algorithms. Code is available at https://github.com/dsbrown1331/broil.
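
To make the trade-off between expected return and conditional value at risk concrete, here is a minimal Python sketch. It is not the code from the repository above. It assumes the two terms are combined as a convex combination with a weight `lam`, which the abstract does not spell out, and it only ranks a fixed set of candidate policies instead of optimizing over all policies. The names `cvar`, `blended_score`, `pick_policy`, `CANDIDATES`, and `PROBS`, as well as the return values, are hypothetical and chosen for illustration only.

```python
from typing import Dict, Sequence


def expected_return(returns: Sequence[float], probs: Sequence[float]) -> float:
    """Posterior-weighted mean of a policy's return over reward samples."""
    return sum(p * r for r, p in zip(returns, probs))


def cvar(returns: Sequence[float], probs: Sequence[float], alpha: float) -> float:
    """Lower-tail CVaR of a discrete return distribution.

    Uses the identity CVaR_alpha = max_sigma sigma - E[(sigma - R)_+] / (1 - alpha).
    The objective is concave and piecewise linear in sigma, so the maximum
    is attained at one of the sampled return values.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    scale = 1.0 / (1.0 - alpha)
    return max(
        sigma - scale * sum(p * max(0.0, sigma - r) for r, p in zip(returns, probs))
        for sigma in returns
    )


def blended_score(returns, probs, alpha: float, lam: float) -> float:
    """Assumed objective: lam * expected return + (1 - lam) * CVaR."""
    return lam * expected_return(returns, probs) + (1.0 - lam) * cvar(returns, probs, alpha)


def pick_policy(candidates: Dict[str, Sequence[float]], probs, alpha: float, lam: float) -> str:
    """Return the name of the candidate policy with the highest blended score."""
    return max(candidates, key=lambda name: blended_score(candidates[name], probs, alpha, lam))


# Hypothetical data: each policy's return under four equally likely
# reward functions sampled from a posterior.
PROBS = [0.25, 0.25, 0.25, 0.25]
CANDIDATES = {
    "aggressive": [10.0, 10.0, 10.0, -20.0],   # high mean, bad worst case
    "conservative": [1.0, 1.0, 1.0, 1.0],      # low mean, no downside
}

if __name__ == "__main__":
    for lam in (0.0, 0.5, 1.0):
        print(lam, pick_policy(CANDIDATES, PROBS, alpha=0.75, lam=lam))
```

In this sketch, `lam = 1.0` recovers a risk-neutral choice based on the mean return alone, while `lam = 0.0` considers only the tail risk, so varying `lam` moves between the two behaviors the abstract describes.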
