Paper Title
Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback
Paper Authors
Paper Abstract
Recent works on Multi-Armed Bandits (MAB) and Combinatorial Multi-Armed Bandits (COM-MAB) show good results on a global accuracy metric. In the case of recommender systems, this can be achieved through personalization. However, with a combinatorial online learning approach, personalization implies a large amount of user feedback, which can be hard to acquire when users need to be solicited directly and frequently. For a number of fields of activity undergoing the digitization of their business, online learning is unavoidable. Thus, a number of approaches allowing implicit user feedback retrieval have been implemented. Nevertheless, such implicit feedback can be misleading or inefficient for the agent's learning. Herein, we propose a novel approach that reduces the number of explicit feedbacks required by Combinatorial Multi-Armed Bandit (COM-MAB) algorithms while providing levels of global accuracy and learning efficiency similar to those of classical competitive methods. In this paper we present a novel approach for considering user feedback and evaluate it using three distinct strategies. Despite a limited number of feedbacks returned by users (as low as 20% of the total), our approach obtains results similar to those of state-of-the-art approaches.
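To make the scarce-feedback setting concrete, the sketch below shows a plain epsilon-greedy bandit agent whose estimates are updated only on the minority of rounds where the user actually returns explicit feedback (here, roughly 20% of rounds, matching the figure quoted in the abstract). This is an illustrative toy, not the paper's algorithm; the `PartialFeedbackBandit` class, the simulated arm means, and the feedback rate are all assumptions made for the example.

```python
import random

class PartialFeedbackBandit:
    """Toy epsilon-greedy bandit that learns only from rounds where
    explicit user feedback is available (illustrative sketch only)."""

    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # number of feedbacks seen per arm
        self.values = [0.0] * n_arms  # running mean reward per arm

    def select_arm(self):
        # Explore uniformly with probability epsilon, otherwise exploit.
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Called only on rounds where the user returns explicit feedback.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Simulation: arm 1 has the highest true click rate, but explicit
# feedback arrives on only ~20% of rounds (assumed values).
random.seed(0)
true_means = [0.2, 0.8, 0.5]
agent = PartialFeedbackBandit(n_arms=3, epsilon=0.1)
for _ in range(5000):
    arm = agent.select_arm()
    if random.random() < 0.2:  # scarce explicit feedback
        reward = 1.0 if random.random() < true_means[arm] else 0.0
        agent.update(arm, reward)
```

Even with feedback on only a fifth of the rounds, the agent's value estimates separate the arms; the paper's contribution is a COM-MAB strategy that retains this kind of accuracy while explicitly minimizing how often users are solicited.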