Improved Confidence Bounds for the Linear Logistic Model and Applications to Linear Bandits

Paper title

Improved Confidence Bounds for the Linear Logistic Model and Applications to Linear Bandits

Authors

Kwang-Sung Jun, Lalit Jain, Blake Mason, Houssam Nassif

Abstract

We propose improved fixed-design confidence bounds for the linear logistic model. Our bounds significantly improve upon the state-of-the-art bound by Li et al. (2017) via recent developments of the self-concordant analysis of the logistic loss (Faury et al., 2020). Specifically, our confidence bound avoids a direct dependence on $1/κ$, where $κ$ is the minimal variance over all arms' reward distributions. In general, $1/κ$ scales exponentially with the norm of the unknown linear parameter $θ^*$. Instead of relying on this worst-case quantity, our confidence bound for the reward of any given arm depends directly on the variance of that arm's reward distribution. We present two applications of our novel bounds to pure exploration and regret minimization logistic bandits improving upon state-of-the-art performance guarantees. For pure exploration, we also provide a lower bound highlighting a dependence on $1/κ$ for a family of instances.
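
The claim that $1/κ$ scales exponentially with the norm of $θ^*$ can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes Bernoulli rewards with mean $\sigma(x^\top θ^*)$ for the logistic function $\sigma$, so that an arm's reward variance is $\sigma(x^\top θ^*)(1 - \sigma(x^\top θ^*))$. The arm set, the direction of $θ^*$, and the function names `arm_variance` and `kappa` are hypothetical choices made for illustration.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def arm_variance(x, theta):
    """Variance of a Bernoulli reward with mean sigmoid(<x, theta>)."""
    mu = sigmoid(sum(xi * ti for xi, ti in zip(x, theta)))
    return mu * (1.0 - mu)


def kappa(arms, theta):
    """Minimal reward variance over all arms (the worst-case quantity)."""
    return min(arm_variance(x, theta) for x in arms)


# Hypothetical unit-norm arms; theta* points along the first arm.
arms = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.8)]
direction = (1.0, 0.0)

for scale in (1, 2, 4, 8):
    theta = tuple(scale * d for d in direction)
    # 1/kappa grows roughly like exp(scale), while the arm orthogonal
    # to theta* keeps variance 0.25 regardless of the scale.
    print(scale, 1.0 / kappa(arms, theta), arm_variance(arms[1], theta))
```

In this toy instance the arm aligned with $θ^*$ drives $1/κ$ up as the norm grows, while the orthogonal arm's variance stays at 0.25. This is the gap that a per-arm, variance-dependent confidence bound is meant to exploit.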
