Speed Up the Cold-Start Learning in Two-Sided Bandits with Many Arms

Title

Speed Up the Cold-Start Learning in Two-Sided Bandits with Many Arms

Authors

Mohsen Bayati, Junyu Cao, Wanning Chen

Abstract

Multi-armed bandit (MAB) algorithms are efficient approaches to reduce the opportunity cost of online experimentation and are used by companies to find the best product from periodically refreshed product catalogs. However, these algorithms face the so-called cold-start problem at the onset of the experiment due to a lack of knowledge of customer preferences for new products, requiring an initial data collection phase known as the burn-in period. During this period, standard MAB algorithms operate like randomized experiments, incurring large burn-in costs that scale with the number of products. We attempt to reduce the burn-in by observing that many products can be cast as two-sided products, and then naturally model the rewards of the products with a matrix whose rows and columns represent the two sides, respectively. Next, we design two-phase bandit algorithms that first use subsampling and low-rank matrix estimation to obtain a substantially smaller targeted set of products and then apply a UCB procedure on the targeted products to find the best one. We theoretically show that the proposed algorithms lower costs and expedite the experiment when experimentation time is limited and the product set is large. Our analysis also reveals three regimes of long, short, and ultra-short horizon experiments, depending on the dimensions of the matrix. Empirical evidence from both synthetic data and a real-world dataset on music streaming services validates this superior performance.
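To make the two-phase idea concrete, here is a minimal sketch under stated assumptions; it is not the authors' algorithm. It assumes each product is a (row, column) cell of a reward matrix that is approximately low rank, and that `pull(i, j)` returns a noisy reward for that cell. Phase 1 subsamples rows and columns, explores random cells of the submatrix, fits a low-rank estimate with alternating least squares (a stand-in for the paper's estimator), and keeps the `top_k` highest-scoring cells as the targeted set. Phase 2 runs UCB1 on that set. The names `two_phase_bandit`, `als_low_rank`, `pull`, `explore_budget`, and `top_k` are hypothetical.

```python
import math
import numpy as np


def als_low_rank(obs, n_rows, n_cols, rank, n_iters=50, reg=1e-2, seed=0):
    """Fit U @ V.T (rank `rank`) to observed cells by alternating ridge
    regression. `obs` maps (row, col) -> list of observed rewards."""
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(n_rows, rank))
    V = rng.normal(scale=0.1, size=(n_cols, rank))
    by_row = {i: [] for i in range(n_rows)}
    by_col = {j: [] for j in range(n_cols)}
    for (i, j), ys in obs.items():
        y = float(np.mean(ys))  # average repeated pulls of the same cell
        by_row[i].append((j, y))
        by_col[j].append((i, y))
    ridge = reg * np.eye(rank)
    for _ in range(n_iters):
        for i, entries in by_row.items():  # update row factors, V held fixed
            if entries:
                Vi = V[[j for j, _ in entries]]
                rewards = np.array([r for _, r in entries])
                U[i] = np.linalg.solve(Vi.T @ Vi + ridge, Vi.T @ rewards)
        for j, entries in by_col.items():  # update column factors, U held fixed
            if entries:
                Uj = U[[i for i, _ in entries]]
                rewards = np.array([r for _, r in entries])
                V[j] = np.linalg.solve(Uj.T @ Uj + ridge, Uj.T @ rewards)
    return U @ V.T


def two_phase_bandit(pull, n_rows, n_cols, horizon, rank=1, sub_rows=None,
                     sub_cols=None, explore_budget=None, top_k=5, seed=0):
    """Phase 1: subsample both sides, explore random cells, estimate a
    low-rank reward matrix, keep the top_k cells. Phase 2: UCB1 on them."""
    rng = np.random.default_rng(seed)
    rows = rng.choice(n_rows, size=sub_rows or n_rows, replace=False)
    cols = rng.choice(n_cols, size=sub_cols or n_cols, replace=False)
    budget = explore_budget if explore_budget is not None else horizon // 4
    if horizon - budget < top_k:
        raise ValueError("need at least top_k rounds left for phase 2")

    # Phase 1: uniform random exploration over the subsampled submatrix.
    obs = {}
    for _ in range(budget):
        a = int(rng.integers(len(rows)))
        b = int(rng.integers(len(cols)))
        obs.setdefault((a, b), []).append(pull(int(rows[a]), int(cols[b])))
    est = als_low_rank(obs, len(rows), len(cols), rank, seed=seed)
    top = np.argsort(est, axis=None)[::-1][:top_k]  # highest estimates first
    targets = [(int(rows[a]), int(cols[b]))
               for a, b in zip(*np.unravel_index(top, est.shape))]

    # Phase 2: UCB1 restricted to the targeted set.
    counts = [0] * len(targets)
    sums = [0.0] * len(targets)
    for t in range(horizon - budget):
        if t < len(targets):
            k = t  # pull each target once before using confidence bounds
        else:
            k = max(range(len(targets)),
                    key=lambda m: sums[m] / counts[m]
                    + math.sqrt(2 * math.log(t + 1) / counts[m]))
        sums[k] += pull(*targets[k])
        counts[k] += 1
    best = max(range(len(targets)), key=lambda m: sums[m] / counts[m])
    return targets[best], targets, counts
```

Alternating least squares is used here only because it is short to write; the paper's actual subsampling scheme, low-rank estimator, and phase lengths may differ. The point of the sketch is the cost structure: UCB in phase 2 only has to separate `top_k` arms instead of all `n_rows * n_cols` products.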
