Paper Title
Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback
Paper Authors
Paper Abstract
Recent works on Multi-Armed Bandits (MAB) and Combinatorial Multi-Armed Bandits (COM-MAB) show good results on a global accuracy metric. In the case of recommender systems, this can be achieved through personalization. However, with a combinatorial online learning approach, personalization implies a large amount of user feedback, which can be hard to acquire when users need to be solicited directly and frequently. For a number of fields of activity undergoing the digitization of their business, online learning is unavoidable. Thus, a number of approaches allowing implicit user feedback retrieval have been implemented. Nevertheless, such implicit feedback can be misleading or inefficient for the agent's learning. Herein, we propose a novel approach that reduces the number of explicit feedbacks required by Combinatorial Multi-Armed Bandit (COM-MAB) algorithms while providing levels of global accuracy and learning efficiency similar to those of classical competitive methods. In this paper we present a novel approach for considering user feedback and evaluate it using three distinct strategies. Despite a limited number of feedbacks returned by users (as low as 20% of the total), our approach obtains results similar to those of state-of-the-art approaches.
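To make the scarce-feedback setting concrete, the sketch below shows a plain epsilon-greedy bandit agent whose estimates are updated only on the minority of rounds where the user actually returns explicit feedback (here, roughly 20% of rounds, matching the figure quoted in the abstract). This is an illustrative toy, not the paper's algorithm; the `PartialFeedbackBandit` class, the simulated arm means, and the feedback rate are all assumptions made for the example.

```python
import random

class PartialFeedbackBandit:
    """Toy epsilon-greedy bandit that learns only from rounds where
    explicit user feedback is available (illustrative sketch only)."""

    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # number of feedbacks seen per arm
        self.values = [0.0] * n_arms  # running mean reward per arm

    def select_arm(self):
        # Explore uniformly with probability epsilon, otherwise exploit.
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Called only on rounds where the user returns explicit feedback.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Simulation: arm 1 has the highest true click rate, but explicit
# feedback arrives on only ~20% of rounds (assumed values).
random.seed(0)
true_means = [0.2, 0.8, 0.5]
agent = PartialFeedbackBandit(n_arms=3, epsilon=0.1)
for _ in range(5000):
    arm = agent.select_arm()
    if random.random() < 0.2:  # scarce explicit feedback
        reward = 1.0 if random.random() < true_means[arm] else 0.0
        agent.update(arm, reward)
```

Even with feedback on only a fifth of the rounds, the agent's value estimates separate the arms; the paper's contribution is a COM-MAB strategy that retains this kind of accuracy while explicitly minimizing how often users are solicited.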