Paper title
Optimal Best-arm Identification in Linear Bandits
Paper authors
Paper abstract
We study the problem of best-arm identification with fixed confidence in stochastic linear bandits. The objective is to identify the best arm with a given level of certainty while minimizing the sampling budget. We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds, asymptotically almost surely and in expectation. The algorithm relies on an arm sampling rule that tracks an optimal proportion of arm draws and that, remarkably, can be updated as rarely as we wish without compromising its theoretical guarantees. Moreover, unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms. Experimental results suggest that our algorithm significantly outperforms existing algorithms. The paper further provides a first analysis of the best-arm identification problem in linear bandits with a continuous set of arms.
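To make the "optimal proportion of arm draws" concrete, the sketch below evaluates the quantity that instance-specific lower bounds for linear bandits are usually written in terms of. It is an illustration under stated assumptions, not the paper's algorithm: it assumes unit-variance Gaussian noise and a finite arm set, and the names `allocation_value`, `design_matrix`, and `solve` are hypothetical helpers introduced here. For an allocation w over the arms, it computes the minimum over suboptimal arms a of (theta·(a* - a))^2 / (2 (a* - a)^T A(w)^{-1} (a* - a)), where A(w) is the sum of w_a a a^T. Under these assumptions, an optimal proportion is an allocation that maximizes this value, and the inverse of the maximum is the characteristic time that scales the lower bound.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Euclidean inner product of two vectors of equal length."""
    return sum(ui * vi for ui, vi in zip(u, v))


def design_matrix(arms: Sequence[Vector], w: Sequence[float]) -> List[List[float]]:
    """A(w) = sum_a w_a * a a^T for a finite arm set (assumption of this sketch)."""
    d = len(arms[0])
    A = [[0.0] * d for _ in range(d)]
    for a, wa in zip(arms, w):
        for i in range(d):
            for j in range(d):
                A[i][j] += wa * a[i] * a[j]
    return A


def solve(A: List[List[float]], b: Vector) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular for this allocation")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - rest) / M[i][i]
    return x


def allocation_value(arms: Sequence[Vector], theta: Vector, w: Sequence[float]) -> float:
    """Minimum over suboptimal arms of gap^2 / (2 * ||a* - a||^2 in A(w)^{-1} norm).

    Assumes unit-variance Gaussian noise and a unique best arm.
    """
    means = [dot(a, theta) for a in arms]
    best = max(range(len(arms)), key=means.__getitem__)
    A = design_matrix(arms, w)
    values = []
    for k, a in enumerate(arms):
        if k == best:
            continue
        diff = [x - y for x, y in zip(arms[best], a)]
        gap = means[best] - means[k]
        norm_sq = dot(diff, solve(A, diff))  # (a* - a)^T A(w)^{-1} (a* - a)
        values.append(gap ** 2 / (2.0 * norm_sq))
    return min(values)


# Toy instance (hypothetical): two orthogonal arms, the first one is best.
arms = [(1.0, 0.0), (0.0, 1.0)]
theta = (1.0, 0.0)
uniform_value = allocation_value(arms, theta, (0.5, 0.5))
skewed_value = allocation_value(arms, theta, (0.25, 0.75))
```

On this toy instance the uniform allocation scores higher than the skewed one, so a rule that tracks good proportions would favor it. Maximizing `allocation_value` over the simplex is left out of the sketch; any convex solver could be substituted at that step.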