Title
Hyper-parameter Tuning for the Contextual Bandit
Authors
Abstract
We study the problem of learning the exploration-exploitation trade-off in the contextual bandit problem with a linear reward function. In traditional algorithms for the contextual bandit problem, the amount of exploration is a parameter tuned by the user. Our proposed algorithms instead learn to choose the exploration parameter online, based on the observed context and the immediate reward received for the chosen action. We present two algorithms that use a bandit to find the optimal exploration parameter of a contextual bandit algorithm, which we hope is a first step toward automating multi-armed bandit algorithms.
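The abstract does not name the base learner or the outer bandit. As a minimal sketch of the general idea of a bandit choosing the exploration parameter, the code below assumes disjoint LinUCB as the contextual bandit and UCB1 over a fixed grid of exploration values α. The class and function names, the α grid, and the synthetic Bernoulli rewards with linear expected value are all hypothetical choices for illustration, not the authors' two algorithms.

```python
import numpy as np


class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm (assumed base learner)."""

    def __init__(self, n_arms, dim, reg=1.0):
        self.A = [reg * np.eye(dim) for _ in range(n_arms)]  # regularized Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]      # reward-weighted contexts

    def select(self, x, alpha):
        # Score = estimated reward + alpha * confidence width; alpha sets exploration.
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            scores.append(theta @ x + alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


class UCB1:
    """Outer bandit whose arms are candidate alpha values (hypothetical choice)."""

    def __init__(self, n):
        self.counts = np.zeros(n)
        self.sums = np.zeros(n)
        self.t = 0

    def select(self):
        self.t += 1
        untried = np.flatnonzero(self.counts == 0)
        if untried.size:  # play every alpha once before using the UCB index
            return int(untried[0])
        means = self.sums / self.counts
        bonus = np.sqrt(2 * np.log(self.t) / self.counts)
        return int(np.argmax(means + bonus))

    def update(self, i, reward):
        self.counts[i] += 1
        self.sums[i] += reward


def run(T=2000, n_arms=5, dim=4, alphas=(0.0, 0.1, 0.5, 1.0, 2.0), seed=0):
    """Simulate T rounds; return average reward and how often each alpha was used."""
    rng = np.random.default_rng(seed)
    # Synthetic environment (assumption): theta in [0,1]^d, contexts on the simplex,
    # so theta @ x lies in [0, 1] and can serve as a Bernoulli success probability.
    true_theta = rng.uniform(size=(n_arms, dim))
    base = LinUCB(n_arms, dim)
    meta = UCB1(len(alphas))
    total = 0.0
    for _ in range(T):
        x = rng.uniform(size=dim)
        x /= x.sum()
        i = meta.select()                    # outer bandit picks an alpha
        arm = base.select(x, alphas[i])      # LinUCB acts with that alpha
        reward = float(rng.random() < true_theta[arm] @ x)
        base.update(arm, x, reward)          # shared LinUCB statistics
        meta.update(i, reward)               # same immediate reward credits alpha
        total += reward
    return total / T, meta.counts
```

For example, `avg, counts = run()` returns the average reward and the number of rounds each α was selected. Both levels share one set of LinUCB statistics, since α only changes which arm is chosen, not how the ridge-regression estimates are updated. UCB1 assumes each α has a stationary reward distribution, which holds only approximately here because the shared learner keeps improving; a bandit designed for non-stationary rewards would be a natural alternative.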