Paper title
IMO$^3$: Interactive Multi-Objective Off-Policy Optimization
Paper authors
Paper abstract
Most real-world optimization problems have multiple objectives. A system designer needs to find a policy that trades off these objectives to reach a desired operating point. This problem has been studied extensively in the setting of known objective functions. We consider a more practical but challenging setting of unknown objective functions. In industry, this problem is mostly approached with online A/B testing, which is often costly and inefficient. As an alternative, we propose interactive multi-objective off-policy optimization (IMO$^3$). The key idea in our approach is to interact with a system designer using policies evaluated in an off-policy fashion to uncover which policy maximizes her unknown utility function. We theoretically show that IMO$^3$ identifies a near-optimal policy with high probability, depending on the amount of feedback from the designer and training data for off-policy estimation. We demonstrate its effectiveness empirically on multiple multi-objective optimization problems.
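To make the loop described in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It assumes an inverse propensity scoring (IPS) estimator for the off-policy step, a simulated designer with a hidden linear utility, and a simple sequential pairwise comparison over a small candidate set. All names (`ips_objective_values`, `make_policy`, `designer_prefers`, `HIDDEN_WEIGHTS`) and the toy logged data are hypothetical.

```python
# Toy logged data: (context, action, logging_probability, reward_vector).
# Assumption: a uniform logging policy over two actions and two objectives.
logs = [
    (None, 0, 0.5, (1.0, 0.0)),
    (None, 1, 0.5, (0.0, 1.0)),
    (None, 0, 0.5, (1.0, 0.0)),
    (None, 1, 0.5, (0.0, 1.0)),
]


def make_policy(p):
    """Return a policy that picks action 0 with probability p."""
    def policy(context, action):
        return p if action == 0 else 1.0 - p
    return policy


def ips_objective_values(logs, target_policy, num_objectives):
    """Estimate each objective's expected value under target_policy via IPS."""
    totals = [0.0] * num_objectives
    for context, action, logging_prob, rewards in logs:
        # Importance weight: target probability over logging probability.
        weight = target_policy(context, action) / logging_prob
        for i in range(num_objectives):
            totals[i] += weight * rewards[i]
    return [t / len(logs) for t in totals]


# Assumption: the designer's utility is linear with weights hidden from
# the optimizer; only pairwise preferences are observable.
HIDDEN_WEIGHTS = (0.7, 0.3)


def designer_prefers(values_a, values_b):
    """Simulated designer feedback: True if values_a is preferred to values_b."""
    def utility(values):
        return sum(w * v for w, v in zip(HIDDEN_WEIGHTS, values))
    return utility(values_a) > utility(values_b)


# Candidate policies, evaluated offline from the logs (no online A/B test).
candidates = [0.0, 0.25, 0.5, 0.75, 1.0]
estimates = {p: ips_objective_values(logs, make_policy(p), 2) for p in candidates}

# Sequential pairwise queries: keep whichever policy the designer prefers.
best_p = candidates[0]
for p in candidates[1:]:
    if designer_prefers(estimates[p], estimates[best_p]):
        best_p = p

print(best_p, estimates[best_p])
```

In this toy setup each candidate's objective vector is estimated purely from logged data, and the designer is only asked to compare pairs of estimated vectors. A noisy designer or a richer policy class would need a statistical model of the feedback, which this sketch does not attempt.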