Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation

Paper title

Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation

Authors

Yuta Saito, Shunsuke Aihara, Megumi Matsutani, Yusuke Narita

Abstract

Off-policy evaluation (OPE) aims to estimate the performance of hypothetical policies using data generated by a different policy. Because of its huge potential impact in practice, there has been growing research interest in this field. There is, however, no real-world public dataset that enables the evaluation of OPE, making its experimental studies unrealistic and irreproducible. With the goal of enabling realistic and reproducible OPE research, we present Open Bandit Dataset, a public logged bandit dataset collected on a large-scale fashion e-commerce platform, ZOZOTOWN. Our dataset is unique in that it contains a set of multiple logged bandit datasets collected by running different policies on the same platform. This enables experimental comparisons of different OPE estimators for the first time. We also develop Python software called Open Bandit Pipeline to streamline and standardize the implementation of batch bandit algorithms and OPE. Our open data and software will contribute to fair and transparent OPE research and help the community identify fruitful research directions. We provide extensive benchmark experiments of existing OPE estimators using our dataset and software. The results open up essential challenges and new avenues for future OPE research.
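To make the core idea of OPE concrete, the following is a minimal sketch of one standard estimator, inverse probability weighting (IPW). The abstract does not name specific estimators, so the choice of IPW is an assumption made here for illustration. The sketch uses only the Python standard library and does not use the Open Bandit Pipeline API; the function names `ipw_estimate` and `simulate`, the two-action setup, and all probabilities and click rates are hypothetical.

```python
import random


def ipw_estimate(actions, rewards, logging_probs, target_probs):
    """Inverse probability weighting (IPW) estimate of a target policy's value.

    actions[i]       : action chosen by the logging policy in round i
    rewards[i]       : observed reward in round i (e.g. click = 1, no click = 0)
    logging_probs[i] : probability the logging policy assigned to actions[i]
    target_probs[i]  : dict mapping each action to the target policy's probability
    """
    total = 0.0
    for a, r, p_log, p_tgt in zip(actions, rewards, logging_probs, target_probs):
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to choose the logged action.
        total += (p_tgt[a] / p_log) * r
    return total / len(actions)


def simulate(n_rounds=20000, seed=0):
    """Generate synthetic logged data and compare IPW with the true value.

    All numbers below are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    true_click_rate = {0: 0.1, 1: 0.3}  # assumed expected reward per action
    logging_policy = {0: 0.5, 1: 0.5}   # policy that generated the log
    target_policy = {0: 0.2, 1: 0.8}    # hypothetical policy to evaluate

    actions, rewards, logging_probs, target_probs = [], [], [], []
    for _ in range(n_rounds):
        a = 0 if rng.random() < logging_policy[0] else 1
        r = 1 if rng.random() < true_click_rate[a] else 0
        actions.append(a)
        rewards.append(r)
        logging_probs.append(logging_policy[a])
        target_probs.append(target_policy)

    estimate = ipw_estimate(actions, rewards, logging_probs, target_probs)
    # Ground truth is available only because the reward model is simulated.
    true_value = sum(target_policy[a] * true_click_rate[a] for a in target_policy)
    return estimate, true_value


if __name__ == "__main__":
    est, truth = simulate()
    print(f"IPW estimate: {est:.4f}, true value: {truth:.4f}")
```

In this synthetic setting the true value of the target policy is known, so the estimate can be checked directly. With real logged data no such ground truth is available from a single log, which is why a dataset collected under several different policies on the same platform is useful for comparing estimators.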
