Paper Title

Double Explore-then-Commit: Asymptotic Optimality and Beyond

Paper Authors

Tianyuan Jin, Pan Xu, Xiaokui Xiao, Quanquan Gu

Paper Abstract

We study the multi-armed bandit problem with subgaussian rewards. The explore-then-commit (ETC) strategy, which consists of an exploration phase followed by an exploitation phase, is one of the most widely used algorithms in a variety of online decision applications. Nevertheless, it has been shown in Garivier et al. (2016) that ETC is suboptimal in the asymptotic sense as the horizon grows, and thus is worse than fully sequential strategies such as Upper Confidence Bound (UCB). In this paper, we show that a variant of the ETC algorithm can in fact achieve asymptotic optimality for multi-armed bandit problems, just as UCB-type algorithms do, and we extend it to the batched bandit setting. Specifically, we propose a double explore-then-commit (DETC) algorithm with two exploration phases and two exploitation phases, and prove that DETC achieves the asymptotically optimal regret bound. To our knowledge, DETC is the first non-fully-sequential algorithm to achieve such asymptotic optimality. In addition, we extend DETC to batched bandit problems, where (i) the exploration process is split into a small number of batches and (ii) the round complexity is of central interest. We prove that a batched version of DETC can achieve asymptotic optimality with only constant round complexity. This is the first batched bandit algorithm that attains the optimal asymptotic regret bound and optimal round complexity simultaneously.
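For context, the asymptotic optimality claimed in the abstract is usually taken to mean matching the Lai-Robbins lower bound (Lai and Robbins, 1985). In its standard form, for any consistent policy on a bandit with arm means $\mu_i$, best mean $\mu^\star$, and gaps $\Delta_i = \mu^\star - \mu_i$, the expected regret $R_T$ satisfies

```latex
\liminf_{T \to \infty} \frac{\mathbb{E}[R_T]}{\log T}
  \;\ge\; \sum_{i \,:\, \Delta_i > 0} \frac{\Delta_i}{\operatorname{kl}(\mu_i, \mu^\star)},
```

where $\operatorname{kl}$ denotes the KL divergence between the corresponding reward distributions; for unit-variance Gaussian rewards, $\operatorname{kl}(\mu_i, \mu^\star) = \Delta_i^2/2$, so the bound specializes to $\sum_{i:\Delta_i>0} (2/\Delta_i)\log T$. An algorithm is asymptotically optimal when its regret matches this bound as $T \to \infty$.

To make the "two exploration phases and two exploitation phases" structure concrete, below is a minimal Python sketch of the double explore-then-commit template. It is purely illustrative: the function name detc_sketch, the placeholder phase budgets (on the order of log T and sqrt T), and the empirical-mean commit rule are assumptions for exposition, not the stopping rules analyzed in the paper.

```python
import numpy as np

def detc_sketch(arms, horizon):
    """Four-phase explore-then-commit skeleton (illustrative, NOT the
    paper's exact DETC): explore all arms, exploit a tentative leader,
    explore again to refine estimates, then commit for good.

    `arms` is a list of zero-argument callables, each returning one
    stochastic reward when pulled.
    """
    n_arms = len(arms)
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)
    t = 0

    def pull(i):
        nonlocal t
        sums[i] += arms[i]()
        counts[i] += 1
        t += 1

    # Phase 1: first exploration -- round-robin over all arms.
    # The log(horizon) budget is a placeholder, not the paper's rule.
    n1 = max(1, int(np.ceil(np.log(horizon))))
    for _ in range(n1):
        for i in range(n_arms):
            pull(i)

    leader = int(np.argmax(sums / counts))  # tentative commit

    # Phase 2: first exploitation -- play the tentative leader.
    # The sqrt(horizon) budget is again a placeholder.
    n2 = int(np.sqrt(horizon))
    while t < min(horizon, n1 * n_arms + n2):
        pull(leader)

    # Phase 3: second exploration -- revisit the other arms briefly
    # to sharpen the gap estimates before the final commitment.
    for _ in range(n1):
        for i in range(n_arms):
            if i != leader and t < horizon:
                pull(i)

    # Phase 4: final commit -- exploit the empirically best arm
    # for the remainder of the horizon.
    best = int(np.argmax(sums / counts))
    while t < horizon:
        pull(best)

    return sums.sum()  # total reward collected over the horizon

# Example run: two unit-variance Gaussian arms with means 0.0 and 0.5.
rng = np.random.default_rng(0)
arms = [lambda: rng.normal(0.0, 1.0), lambda: rng.normal(0.5, 1.0)]
print(detc_sketch(arms, horizon=10_000))
```

Loosely speaking, each phase boundary in the sketch is one policy switch, so a constant number of phases corresponds to the constant round complexity that the batched variant of DETC targets.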
