Paper Title

Greedy Algorithm almost Dominates in Smoothed Contextual Bandits

Paper Authors

Manish Raghavan, Aleksandrs Slivkins, Jennifer Wortman Vaughan, Zhiwei Steven Wu

Paper Abstract

Online learning algorithms, widely used to power search and content optimization on the web, must balance exploration and exploitation, potentially sacrificing the experience of current users in order to gain information that will lead to better decisions in the future. While necessary in the worst case, explicit exploration has a number of disadvantages compared to the greedy algorithm that always "exploits" by choosing an action that currently looks optimal. We ask under what conditions inherent diversity in the data makes explicit exploration unnecessary. We build on a recent line of work on the smoothed analysis of the greedy algorithm in the linear contextual bandits model. We improve on prior results to show that a greedy approach almost matches the best possible Bayesian regret rate of any other algorithm on the same problem instance whenever the diversity conditions hold, and that this regret is at most $\tilde O(T^{1/3})$.
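
To make the setting concrete, here is a minimal sketch of the greedy algorithm in a smoothed linear contextual bandit. It is an illustration under assumed parameters (the perturbation scale, noise level, and all variable names are ours), not the paper's code or analysis: each arm's reward vector is fit by per-arm ridge regression, contexts are an adversarial base vector plus a Gaussian perturbation, and the algorithm always pulls the arm with the highest predicted reward.

```python
import numpy as np

# Illustrative sketch (not the paper's code) of greedy linear contextual
# bandits under smoothed contexts. Each arm k has an unknown reward vector
# theta[k]; greedy fits per-arm ridge-regression estimates and always
# exploits -- no exploration bonus of any kind.

rng = np.random.default_rng(seed=0)
d, K, T = 5, 3, 2000                  # context dimension, arms, horizon
sigma = 0.5                           # perturbation scale (assumed)
theta = rng.normal(size=(K, d))       # unknown true reward vectors

# Per-arm ridge statistics: A[k] = lam*I + sum x x^T, b[k] = sum r * x.
lam = 1.0
A = np.stack([lam * np.eye(d) for _ in range(K)])
b = np.zeros((K, d))

x0 = np.ones(d) / np.sqrt(d)          # fixed "adversarial" base context
cumulative_regret = 0.0
for t in range(T):
    x = x0 + sigma * rng.normal(size=d)           # smoothed context
    theta_hat = np.stack([np.linalg.solve(A[k], b[k]) for k in range(K)])
    arm = int(np.argmax(theta_hat @ x))           # pure exploitation
    reward = theta[arm] @ x + 0.1 * rng.normal()  # noisy linear reward
    A[arm] += np.outer(x, x)                      # update chosen arm's stats
    b[arm] += reward * x
    cumulative_regret += np.max(theta @ x) - theta[arm] @ x

print(f"greedy cumulative regret over T={T} rounds: {cumulative_regret:.2f}")
```

The only change needed to turn this into an explicitly exploring baseline such as LinUCB would be adding an exploration bonus inside the argmax; the abstract's claim is that under the diversity supplied by the Gaussian perturbation, omitting that bonus still achieves near-optimal Bayesian regret.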
