Paper Title

A Distributional View on Multi-Objective Policy Optimization

Authors

Abbas Abdolmaleki, Sandy H. Huang, Leonard Hasenclever, Michael Neunert, H. Francis Song, Martina Zambelli, Murilo F. Martins, Nicolas Heess, Raia Hadsell, Martin Riedmiller

Abstract

Many real-world problems require trading off multiple competing objectives. However, these objectives are often in different units and/or scales, which can make it challenging for practitioners to express numerical preferences over objectives in their native units. In this paper we propose a novel algorithm for multi-objective reinforcement learning that enables setting desired preferences for objectives in a scale-invariant way. We propose to learn an action distribution for each objective, and we use supervised learning to fit a parametric policy to a combination of these distributions. We demonstrate the effectiveness of our approach on challenging high-dimensional real and simulated robotics tasks, and show that setting different preferences in our framework allows us to trace out the space of nondominated solutions.
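
Since only the abstract is reproduced here, the snippet below is a rough, tabular sketch of the idea it describes: for each objective, reweight the current policy by that objective's action values to obtain a non-parametric "improved" action distribution, then fit a single policy to the combination of these distributions by supervised learning. Everything in the snippet (the discrete action space, the hand-set per-objective temperatures, the toy Q-values) is an illustrative assumption; in the paper, the preference for each objective is expressed through a scale-invariant KL bound rather than a fixed temperature.

```python
import numpy as np

def improved_distribution(policy, q_values, temperature):
    """Non-parametric 'improved' action distribution for one objective:
    reweight the current policy by exponentiated action values.
    (Illustrative only: the paper derives each temperature from a
    per-objective KL bound; here it is simply fixed by hand.)"""
    logits = np.log(policy) + q_values / temperature
    logits -= logits.max()            # for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def fit_policy(per_objective_dists):
    """Supervised-learning step, tabular case: the policy maximizing the
    summed expected log-likelihood under all per-objective distributions
    is simply their average."""
    return np.mean(per_objective_dists, axis=0)

# Toy example: 4 discrete actions, 2 objectives on very different scales
# (hypothetical numbers, not taken from the paper).
policy = np.full(4, 0.25)
q_task = np.array([1.0, 3.0, 2.0, 0.5])             # e.g. task reward
q_penalty = np.array([-10.0, -300.0, -50.0, -5.0])  # e.g. action cost

q_dists = [
    improved_distribution(policy, q_task, temperature=1.0),
    improved_distribution(policy, q_penalty, temperature=100.0),
]
new_policy = fit_policy(q_dists)
print(new_policy)  # a valid distribution trading off both objectives
```

In the paper's continuous-control experiments, the fitting step is a gradient-based maximum-likelihood update of a parametric (e.g. Gaussian) policy rather than this closed-form tabular average.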
