互动学习，定价在市场上进行最佳和稳定分配

论文标题

互动学习，定价在市场上进行最佳和稳定分配

Interactive Learning with Pricing for Optimal and Stable Allocations in Markets

论文作者

Erginbas, Yigit Efe, Phade, Soham, Ramchandran, Kannan

论文摘要

大规模的在线推荐系统必须促进竞争用户中有限数量的项目分配，同时从用户反馈中学习偏好。作为将市场限制和用户激励措施纳入设计的原则方法，我们认为我们的目标是两倍：最大的社会福利，而不稳定最少。为了最大程度地提高社会福利，我们提出的框架通过探索乐观地最大化奖励的分配来增强建议的质量。为了最大程度地减少不稳定性，用户激励措施偏离了建议的分配，算法是根据源自沃尔拉斯（Walrasian）均衡的计划的项目的价格。尽管众所周知，这些平衡为具有已知用户偏好的市场产生稳定的价格，但我们的方法是偏好的固有不确定性，并进一步确保用户接受其所提供的价格。据我们所知，我们的方法是第一个从组合匪徒，最佳资源分配和协作过滤中整合技术的方法，以获取一种算法，从而实现了次线性社会福利的遗憾以及亚线性不稳定。关于合成和现实世界数据的实证研究也证明了我们策略的功效与未完全融合所有这些方面的方法相比。

Large-scale online recommendation systems must facilitate the allocation of a limited number of items among competing users while learning their preferences from user feedback. As a principled way of incorporating market constraints and user incentives in the design, we consider our objectives to be two-fold: maximal social welfare with minimal instability. To maximize social welfare, our proposed framework enhances the quality of recommendations by exploring allocations that optimistically maximize the rewards. To minimize instability, a measure of users' incentives to deviate from recommended allocations, the algorithm prices the items based on a scheme derived from the Walrasian equilibria. Though it is known that these equilibria yield stable prices for markets with known user preferences, our approach accounts for the inherent uncertainty in the preferences and further ensures that the users accept their recommendations under offered prices. To the best of our knowledge, our approach is the first to integrate techniques from combinatorial bandits, optimal resource allocation, and collaborative filtering to obtain an algorithm that achieves sub-linear social welfare regret as well as sub-linear instability. Empirical studies on synthetic and real-world data also demonstrate the efficacy of our strategy compared to approaches that do not fully incorporate all these aspects.

下载PDF全文

下载文献需遵守相关版权规定

论文标题