IMO$^3$: Interactive Multi-Objective Off-Policy Optimization

Paper Title

IMO$^3$: Interactive Multi-Objective Off-Policy Optimization

Authors

Nan Wang, Hongning Wang, Maryam Karimzadehgan, Branislav Kveton, Craig Boutilier

Abstract

Most real-world optimization problems have multiple objectives. A system designer needs to find a policy that trades off these objectives to reach a desired operating point. This problem has been studied extensively in the setting of known objective functions. We consider a more practical but challenging setting of unknown objective functions. In industry, this problem is mostly approached with online A/B testing, which is often costly and inefficient. As an alternative, we propose interactive multi-objective off-policy optimization (IMO$^3$). The key idea in our approach is to interact with a system designer using policies evaluated in an off-policy fashion to uncover which policy maximizes her unknown utility function. We theoretically show that IMO$^3$ identifies a near-optimal policy with high probability, depending on the amount of feedback from the designer and training data for off-policy estimation. We demonstrate its effectiveness empirically on multiple multi-objective optimization problems.
