Bandits with Knapsacks beyond the Worst-Case

Paper title

Bandits with Knapsacks beyond the Worst-Case

Authors

Karthik Abinav Sankararaman, Aleksandrs Slivkins

Abstract

Bandits with Knapsacks (BwK) is a general model for multi-armed bandits under supply/budget constraints. While worst-case regret bounds for BwK are well understood, we present three results that go beyond the worst-case perspective. First, we provide upper and lower bounds which amount to a full characterization for logarithmic, instance-dependent regret rates. Second, we consider "simple regret" in BwK, which tracks the algorithm's performance in a given round, and prove that it is small in all but a few rounds. Third, we provide a general "reduction" from BwK to bandits which takes advantage of some known helpful structure, and apply this reduction to combinatorial semi-bandits, linear contextual bandits, and multinomial-logit bandits. Our results build on the BwK algorithm from Agrawal and Devanur (2014), providing new analyses thereof.
