Paper Title

Necessary and Sufficient Conditions for Inverse Reinforcement Learning of Bayesian Stopping Time Problems

Authors

Kunal Pattanayak, Vikram Krishnamurthy

Abstract

This paper presents an inverse reinforcement learning (IRL) framework for Bayesian stopping time problems. By observing the actions of a Bayesian decision maker, we provide a necessary and sufficient condition to identify whether these actions are consistent with optimizing a cost function. In a Bayesian (partially observed) setting, the inverse learner can at best identify optimality with respect to the observed strategies. Our IRL algorithm identifies optimality and then constructs set-valued estimates of the cost function. To achieve this IRL objective, we use novel ideas from Bayesian revealed preferences stemming from microeconomics. We illustrate the proposed IRL scheme using two important examples of stopping time problems, namely, sequential hypothesis testing and Bayesian search. As a real-world example, we use a YouTube dataset comprising metadata from 190,000 videos to illustrate how the proposed IRL method predicts user engagement in online multimedia platforms with high accuracy. Finally, for finite datasets, we propose an IRL detection algorithm and give finite sample bounds on its error probabilities.
