Flexible and Efficient Contextual Bandits with Heterogeneous Treatment Effect Oracles

Paper Title

Flexible and Efficient Contextual Bandits with Heterogeneous Treatment Effect Oracles

Authors

Carranza, Aldo Gael; Krishnamurthy, Sanath Kumar; Athey, Susan

Abstract

Contextual bandit algorithms often estimate reward models to inform decision-making. However, true rewards can contain action-independent redundancies that are not relevant for decision-making. We show it is more data-efficient to estimate any function that explains the reward differences between actions, that is, the treatment effects. Motivated by this observation, building on recent work on oracle-based bandit algorithms, we provide the first reduction of contextual bandits to general-purpose heterogeneous treatment effect estimation, and we design a simple and computationally efficient algorithm based on this reduction. Our theoretical and experimental results demonstrate that heterogeneous treatment effect estimation in contextual bandits offers practical advantages over reward estimation, including more efficient model estimation and greater flexibility to model misspecification.
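The abstract's central observation, that action-independent parts of the reward do not affect which action is best, can be illustrated with a small sketch. This is not the paper's algorithm: the functions `baseline`, `treatment_effect`, and `reward`, and their specific forms, are hypothetical and chosen only for illustration. The sketch assumes a reward that splits into a context-only term plus a per-action effect, and checks that a greedy policy picks the same action whether it scores actions by full reward or by treatment effect alone.

```python
import math


def baseline(x):
    # Hypothetical action-independent component of the reward.
    # It is deliberately nonlinear, so a reward model would have to fit it.
    return math.sin(5.0 * x) + x ** 3


def treatment_effect(x, action):
    # Hypothetical effect of each action relative to action 0.
    effects = (0.0, 0.5 - x, x - 0.5)
    return effects[action]


def reward(x, action):
    # Assumed decomposition: baseline plus action-dependent effect.
    return baseline(x) + treatment_effect(x, action)


def greedy(score, x, n_actions=3):
    # Pick the action with the highest score in context x.
    return max(range(n_actions), key=lambda a: score(x, a))


# Contexts on a fixed grid in (0, 1), avoiding the tie at x = 0.5.
contexts = [(2 * i + 1) / 20 for i in range(10)]

# The greedy choice is the same under both scoring functions.
agree = all(
    greedy(reward, x) == greedy(treatment_effect, x) for x in contexts
)
print(agree)
```

In this toy setting the treatment effect is linear in the context while the full reward is not, which hints at why a simpler function class may suffice when only reward differences are modeled.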
