论文标题
分位数多臂匪徒:最佳最佳武器识别和差异私有方案
Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a Differentially Private Scheme
论文作者
论文摘要
当目标是在固定的,规定的水平上识别具有最高分位数的手臂时,我们研究具有随机,潜在私人奖励的多臂匪徒的最佳武器识别问题。首先,我们提出了一种(非私人)连续的消除算法,以实现严格最佳最佳臂识别,我们表明我们的算法为$Δ$ -PAC,我们表征了其样品复杂性。此外,我们在预期的拉值数量上提供了一个下限,这表明所提出的算法实质上是对数因素的最佳选择。上和下部复杂性界限都取决于相关的次级隔离间隙的特殊定义,该差距特别是针对分位数匪徒问题设计的,因为我们显示的差距何时接近零,最佳臂识别是不可能的。其次,是由奖励是私人的应用程序的动机,我们提供了一种差异私人连续的消除算法,其样本复杂性即使对于具有无限支持大小的分布,我们的样本复杂性也是有限的,并且我们表征了其样品复杂性。我们的算法不需要对次要差距或其他与匪徒问题有关的统计信息的事先了解。
We study the best-arm identification problem in multi-armed bandits with stochastic, potentially private rewards, when the goal is to identify the arm with the highest quantile at a fixed, prescribed level. First, we propose a (non-private) successive elimination algorithm for strictly optimal best-arm identification, we show that our algorithm is $δ$-PAC and we characterize its sample complexity. Further, we provide a lower bound on the expected number of pulls, showing that the proposed algorithm is essentially optimal up to logarithmic factors. Both upper and lower complexity bounds depend on a special definition of the associated suboptimality gap, designed in particular for the quantile bandit problem, as we show when the gap approaches zero, best-arm identification is impossible. Second, motivated by applications where the rewards are private, we provide a differentially private successive elimination algorithm whose sample complexity is finite even for distributions with infinite support-size, and we characterize its sample complexity. Our algorithms do not require prior knowledge of either the suboptimality gap or other statistical information related to the bandit problem at hand.