Paper Title
Online Learning with Bounded Recall
Paper Authors
Paper Abstract
We study the problem of full-information online learning in the "bounded recall" setting popular in the study of repeated games. An online learning algorithm $\mathcal{A}$ is $M$-$\textit{bounded-recall}$ if its output at time $t$ can be written as a function of the $M$ previous rewards (and not, e.g., of any other internal state of $\mathcal{A}$). We first demonstrate that a natural approach to constructing bounded-recall algorithms from mean-based no-regret learning algorithms (e.g., running Hedge over the last $M$ rounds) fails, and that any such algorithm incurs constant regret per round. We then construct a stationary bounded-recall algorithm that achieves a per-round regret of $\Theta(1/\sqrt{M})$, which we complement with a tight lower bound. Finally, we show that unlike in the perfect recall setting, any low-regret bounded-recall algorithm must be aware of the ordering of the past $M$ losses -- any bounded-recall algorithm which plays a symmetric function of the past $M$ losses must incur constant regret per round.
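To make the $M$-bounded-recall definition concrete, below is a minimal sketch (in Python, with illustrative names like `bounded_recall_play` that do not come from the paper) of the naive construction the abstract refers to: at each round, run Hedge using only the $M$ most recent loss vectors and ignore everything older. This is exactly the kind of algorithm the first result shows must incur constant regret per round; the sketch is meant only to illustrate the recall restriction, not the paper's $\Theta(1/\sqrt{M})$ construction.

```python
import numpy as np

def hedge_distribution(cum_losses, eta):
    """Standard exponential-weights (Hedge) distribution for given cumulative losses."""
    w = np.exp(-eta * (cum_losses - cum_losses.min()))  # shift losses for numerical stability
    return w / w.sum()

def bounded_recall_play(loss_history, M, eta, n_actions):
    """Output of the naive M-bounded-recall construction at the current round.

    The action distribution depends only on the M most recent loss vectors,
    so this satisfies the M-bounded-recall definition from the abstract.
    """
    recent = loss_history[-M:]  # the algorithm may look at only these M rounds
    cum = np.sum(recent, axis=0) if recent else np.zeros(n_actions)
    return hedge_distribution(cum, eta)

# Example: two actions with alternating adversarial losses, recall window M = 4.
history = [np.array([1.0, 0.0]), np.array([0.0, 1.0])] * 6
p = bounded_recall_play(history, M=4, eta=0.5, n_actions=2)
print(p)  # a distribution over the two actions, a function of the last 4 losses only
```

Note that because `bounded_recall_play` is a fixed function of the trailing window, it is also stationary in the abstract's sense; the paper's point is that this particular mean-based choice of function fails, while some other stationary function of the window achieves $\Theta(1/\sqrt{M})$ per-round regret.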