Causal Bandits: Online Decision-Making in Endogenous Settings

Paper Title

Causal Bandits: Online Decision-Making in Endogenous Settings

Authors

Jingwen Zhang, Yifang Chen, Amandeep Singh

Abstract

The deployment of Multi-Armed Bandits (MAB) has become commonplace in many economic applications. However, the regret guarantees of even state-of-the-art linear bandit algorithms, such as Optimism in the Face of Uncertainty Linear bandit (OFUL), rely on strong exogeneity assumptions with respect to arm covariates. This assumption is often violated in economic contexts, and using such algorithms can lead to sub-optimal decisions. Further, in social science analysis, it is also important to understand the asymptotic distribution of estimated parameters. To this end, in this paper, we consider the problem of online learning in linear stochastic contextual bandit problems with endogenous covariates. We propose an algorithm we term $\epsilon$-BanditIV, which uses instrumental variables to correct for the bias that endogeneity introduces, and prove an $\tilde{\mathcal{O}}(k\sqrt{T})$ upper bound for the expected regret of the algorithm. Further, we demonstrate the asymptotic consistency and normality of the $\epsilon$-BanditIV estimator. We carry out extensive Monte Carlo simulations to demonstrate the performance of our algorithm compared to other methods. We show that $\epsilon$-BanditIV significantly outperforms other existing methods in endogenous settings. Finally, we use data from a real-time bidding (RTB) system to demonstrate how $\epsilon$-BanditIV can be used to estimate the causal impact of advertising in such settings and compare its performance with other existing methods.
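To make the endogeneity problem concrete, the sketch below compares ordinary least squares with a textbook two-stage least squares (2SLS) estimator on synthetic data. It is not the $\epsilon$-BanditIV algorithm from the paper. It only illustrates, offline, the kind of bias that instrumental variables are meant to correct. The data-generating process, the function names, and all numeric settings are assumptions chosen for illustration.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares coefficients for y ~ X (no intercept)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def two_stage_least_squares(Z, X, y):
    """Textbook 2SLS: regress X on the instruments Z, then y on the fitted X."""
    # Stage 1: project the endogenous covariates onto the instruments.
    first_stage = np.linalg.lstsq(Z, X, rcond=None)[0]
    X_hat = Z @ first_stage
    # Stage 2: regress the outcome on the projected covariates.
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]


def simulate(n=20000, theta=2.0, seed=0):
    """Hypothetical data: u is unobserved and drives both x and y."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                      # instrument, independent of u
    u = rng.normal(size=n)                      # unobserved confounder
    x = z + u + 0.5 * rng.normal(size=n)        # endogenous covariate
    y = theta * x + u + 0.5 * rng.normal(size=n)
    return z.reshape(-1, 1), x.reshape(-1, 1), y


TRUE_THETA = 2.0
Z, X, y = simulate(theta=TRUE_THETA)
theta_ols = ols(X, y)[0]
theta_iv = two_stage_least_squares(Z, X, y)[0]

# OLS absorbs the confounder's effect and overestimates theta here;
# 2SLS uses only the variation in x that comes from the instrument.
print(f"OLS estimate:  {theta_ols:.3f}")
print(f"2SLS estimate: {theta_iv:.3f}")
```

In this setup the covariate is correlated with the noise through `u`, so OLS is inconsistent regardless of sample size, while the instrument `z` affects the outcome only through `x`. A bandit algorithm that estimates its parameters by least squares on such data would inherit the same bias, which is the motivation the abstract gives for using instruments.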
