论文标题
汤普森采样的分析和设计,用于随机部分监测
Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring
论文作者
论文摘要
我们研究有限的随机部分监测,这是一个有限反馈的顺序学习的通用模型。虽然汤普森采样是各种在线决策问题上最有希望的算法之一,但理论上尚未对其随机部分监测的特性进行研究,现有算法依赖于后验分布的启发式近似。为了减轻这些问题,我们提出了一种新颖的基于汤普森采样的算法,这使我们能够从后验分布中精确地对目标参数进行采样。此外,我们证明,新算法可以实现对数问题的预期pseudo-regret $ \ mathrm {o}(\ log t)$,用于与本地可观察性的线性化变体。该结果是汤普森采样的第一个遗憾,以进行部分监测,这也成为汤普森采样的第一个对数遗憾。
We investigate finite stochastic partial monitoring, which is a general model for sequential learning with limited feedback. While Thompson sampling is one of the most promising algorithms on a variety of online decision-making problems, its properties for stochastic partial monitoring have not been theoretically investigated, and the existing algorithm relies on a heuristic approximation of the posterior distribution. To mitigate these problems, we present a novel Thompson-sampling-based algorithm, which enables us to exactly sample the target parameter from the posterior distribution. Besides, we prove that the new algorithm achieves the logarithmic problem-dependent expected pseudo-regret $\mathrm{O}(\log T)$ for a linearized variant of the problem with local observability. This result is the first regret bound of Thompson sampling for partial monitoring, which also becomes the first logarithmic regret bound of Thompson sampling for linear bandits.