论文标题

大图上的精确单源Simrank计算

Exact Single-Source SimRank Computation on Large Graphs

论文作者

Wang, Hanzhi, Wei, Zhewei, Yuan, Ye, Du, Xiaoyong, Wen, Ji-Rong

论文摘要

Simrank是一个流行的测量方法,用于根据图形拓扑评估节点到节点的相似性。近年来,由于其在网络挖掘,社交网络分析和垃圾邮件检测中的应用,单源和顶级$ $ simrank查询受到了越来越多的关注。但是,研究Simrank的根本障碍是缺乏基础真理。在$ 10^6 $节点的图表上,唯一的精确算法是功率方法,在计算上是不可行的。因此,没有现有的工作评估了大型现实图表上查询时间和准确性之间的实际权衡。在本文中,我们介绍了Exactsim,这是第一个计算大图上计算确切的单源和顶部$ k $ simrank结果的算法。具有很高的可能性,该算法会产生基本真理,并具有严格的理论保证。我们在现实世界数据集上进行了广泛的实验,以证明精确的SIM效率。结果表明,ExactSIM为任何单源Simrank查询提供了基础真实,在合理的查询时间内,最多可精确使用7个小数位。

SimRank is a popular measurement for evaluating the node-to-node similarities based on the graph topology. In recent years, single-source and top-$k$ SimRank queries have received increasing attention due to their applications in web mining, social network analysis, and spam detection. However, a fundamental obstacle in studying SimRank has been the lack of ground truths. The only exact algorithm, Power Method, is computationally infeasible on graphs with more than $10^6$ nodes. Consequently, no existing work has evaluated the actual trade-offs between query time and accuracy on large real-world graphs. In this paper, we present ExactSim, the first algorithm that computes the exact single-source and top-$k$ SimRank results on large graphs. With high probability, this algorithm produces ground truths with a rigorous theoretical guarantee. We conduct extensive experiments on real-world datasets to demonstrate the efficiency of ExactSim. The results show that ExactSim provides the ground truth for any single-source SimRank query with a precision up to 7 decimal places within a reasonable query time.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源