论文标题
匪徒问题的相对精确度的PAC算法和昂贵的采样
A PAC algorithm in relative precision for bandit problem with costly sampling
论文作者
论文摘要
本文考虑了在有限集或有限臂匪徒问题上最大化期望功能的问题。我们首先提出了一种幼稚的随机匪徒算法,以相对精确地获得该离散优化问题的近似正确(PAC)解决方案,这是一种解决优化问题的解决方案,最高至小于规定的公差的相对误差,具有很高的可能性。我们还提出了一种自适应随机匪徒算法,该算法提供具有相同保证的PAC溶解液。自适应算法在生成的样品数量方面优于天真算法的平均复杂性,并且特别适合于采样成本高的应用。
This paper considers the problem of maximizing an expectation function over a finite set, or finite-arm bandit problem. We first propose a naive stochastic bandit algorithm for obtaining a probably approximately correct (PAC) solution to this discrete optimization problem in relative precision, that is a solution which solves the optimization problem up to a relative error smaller than a prescribed tolerance, with high probability. We also propose an adaptive stochastic bandit algorithm which provides a PAC-solution with the same guarantees. The adaptive algorithm outperforms the mean complexity of the naive algorithm in terms of number of generated samples and is particularly well suited for applications with high sampling cost.