Paper Title
Ranking with Confidence for Large Scale Comparison Data
Paper Authors
Paper Abstract
In this work, we leverage a generative data model that accounts for comparison noise to develop a fast, precise, and informative ranking algorithm from pairwise comparisons, one that also produces a measure of confidence for each comparison. The problem of ranking a large number of items from noisy and sparse pairwise comparison data arises in diverse applications, such as ranking players in online games, document retrieval, or ranking human perceptions. Although different algorithms are available, we need fast, large-scale algorithms whose accuracy degrades gracefully when the number of comparisons is too small. Fitting our proposed model entails solving a non-convex optimization problem, which we tightly approximate by a sum of quasi-convex functions and a regularization term. Resorting to iterative reweighted minimization and the Primal-Dual Hybrid Gradient method, we obtain PD-Rank, which achieves a Kendall tau 0.1 higher than all competing methods, even with 10\% wrong comparisons, on simulated data matching our data model, and leads in accuracy when the data is generated according to the Bradley-Terry model, in both cases being faster by an order of magnitude (in seconds). On real data, PD-Rank requires less computational time than active learning methods to achieve the same Kendall tau.
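The abstract names the Primal-Dual Hybrid Gradient (PDHG) solver and the Kendall tau evaluation metric. As a rough illustration only, the sketch below fits latent item scores from simulated noisy pairwise comparisons with a generic PDHG (Chambolle-Pock) iteration applied to a simple regularized least-squares objective. This is not the paper's PD-Rank model or objective; the incidence-matrix setup, ridge regularizer, margin targets, and step sizes are illustrative assumptions.

```python
import numpy as np
from scipy.stats import kendalltau

def pdhg_scores(C, y, lam=1e-2, n_iter=1000):
    """Fit item scores s by minimizing 0.5*||C s - y||^2 + (lam/2)*||s||^2
    with a generic Primal-Dual Hybrid Gradient (Chambolle-Pock) iteration.
    C: (num_comparisons x num_items) incidence matrix, +1 for the winner
    and -1 for the loser of each comparison; y: target margins (here ones)."""
    m, n = C.shape
    L = np.linalg.norm(C, 2)           # operator norm of C
    tau = sigma = 0.99 / L             # step sizes with tau*sigma*L^2 < 1
    s = np.zeros(n)                    # primal variable (scores)
    p = np.zeros(m)                    # dual variable
    s_bar = s.copy()
    for _ in range(n_iter):
        # Dual step: prox of sigma*f*, with f(u) = 0.5*||u - y||^2
        p = (p + sigma * (C @ s_bar) - sigma * y) / (1.0 + sigma)
        # Primal step: prox of tau*g, with g(s) = (lam/2)*||s||^2
        s_new = (s - tau * (C.T @ p)) / (1.0 + tau * lam)
        # Over-relaxation with theta = 1
        s_bar = 2.0 * s_new - s
        s = s_new
    return s

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_items, n_comp = 50, 600
    true_scores = rng.standard_normal(n_items)
    # Simulate noisy comparisons: the higher-score item wins, except for
    # a 10% fraction of flipped outcomes (mirroring the noise level cited above).
    C = np.zeros((n_comp, n_items))
    for k in range(n_comp):
        i, j = rng.choice(n_items, size=2, replace=False)
        winner, loser = (i, j) if true_scores[i] > true_scores[j] else (j, i)
        if rng.random() < 0.10:
            winner, loser = loser, winner
        C[k, winner], C[k, loser] = 1.0, -1.0
    s_hat = pdhg_scores(C, np.ones(n_comp))
    tau_corr, _ = kendalltau(true_scores, s_hat)
    print(f"Kendall tau between true and recovered ranking: {tau_corr:.3f}")
```

In this toy setup the step sizes satisfy the standard PDHG condition tau*sigma*||C||^2 < 1, and the small ridge term only pins down the additive offset of the scores; the Kendall tau call shows how a recovered ranking is compared against the ground truth.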