Paper title

Exploration in two-stage recommender systems

Authors

Jiri Hron, Karl Krauth, Michael I. Jordan, Niki Kilbertus

Abstract

Two-stage recommender systems are widely adopted in industry due to their scalability and maintainability. These systems produce recommendations in two steps: (i) multiple nominators preselect a small number of items from a large pool using cheap-to-compute item embeddings; (ii) with a richer set of features, a ranker rearranges the nominated items and serves them to the user. A key challenge of this setup is that optimal performance of each stage in isolation does not imply optimal global performance. In response to this issue, Ma et al. (2020) proposed a nominator training objective that is importance-weighted by the ranker's probability of recommending each item. In this work, we focus on the complementary issue of exploration. Modeling the problem as a contextual bandit, we find that LinUCB (a near-optimal exploration strategy for single-stage systems) may lead to linear regret when deployed in two-stage recommenders. We therefore propose a method of synchronising the exploration strategies between the ranker and the nominators. Our algorithm relies only on quantities already computed by standard LinUCB at each stage and can be implemented in three lines of additional code. We end by demonstrating the effectiveness of our algorithm experimentally.
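To make the setting concrete, the following is a minimal sketch of the unsynchronised baseline the abstract describes: standard LinUCB run independently by each nominator and by the ranker. It is not the paper's synchronisation method, which the abstract does not spell out. The names `LinUCB` and `two_stage_recommend`, the parameters `alpha` and `reg`, and the choice of one nominated item per nominator are assumptions made for illustration only.

```python
import math


class LinUCB:
    """Standard LinUCB with one shared parameter vector (illustrative sketch)."""

    def __init__(self, dim, alpha=1.0, reg=1.0):
        self.dim = dim
        self.alpha = alpha  # exploration strength (assumed hyperparameter)
        # A_inv is the inverse of (reg * I + sum of x x^T over observed features).
        self.A_inv = [[(1.0 / reg if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # sum of reward * x over observations

    def _matvec(self, x):
        return [sum(row[j] * x[j] for j in range(self.dim)) for row in self.A_inv]

    def ucb(self, x):
        """Upper confidence bound: estimated reward plus exploration bonus."""
        theta = self._matvec(self.b)  # ridge-regression estimate
        mean = sum(t * xi for t, xi in zip(theta, x))
        quad = sum(xi * vi for xi, vi in zip(x, self._matvec(x)))
        return mean + self.alpha * math.sqrt(max(quad, 0.0))

    def update(self, x, reward):
        """Rank-one Sherman-Morrison update of A_inv, then accumulate b."""
        v = self._matvec(x)
        denom = 1.0 + sum(xi * vi for xi, vi in zip(x, v))
        for i in range(self.dim):
            for j in range(self.dim):
                self.A_inv[i][j] -= v[i] * v[j] / denom
        for i in range(self.dim):
            self.b[i] += reward * x[i]


def two_stage_recommend(nominators, ranker, nominator_feats, ranker_feats):
    """Each nominator proposes its highest-UCB item; the ranker picks among them.

    nominator_feats: one dict per nominator, mapping item id -> cheap features.
    ranker_feats: dict mapping item id -> richer features used by the ranker.
    """
    nominated = set()
    for nom, feats in zip(nominators, nominator_feats):
        nominated.add(max(sorted(feats), key=lambda item: nom.ucb(feats[item])))
    nominated = sorted(nominated)
    choice = max(nominated, key=lambda item: ranker.ucb(ranker_feats[item]))
    return choice, nominated
```

In this sketch the ranker can only ever score items that some nominator proposes, and each stage explores according to its own confidence bounds. That decoupling is the kind of interaction the abstract identifies as a potential source of linear regret.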
