论文标题
图形:图形分析使用差分计算在查看集合上
Graphsurge: Graph Analytics on View Collections Using Differential Computation
论文作者
论文摘要
本文介绍了一个名为GraphSurge的新开源图分析系统的设计和实现。 GraphSurge旨在支持分析大型图形的多个快照或视图的应用程序。用户通过声明图视图定义语言(GVDL)进行程序图表,以创建输入图和基于数据流程的编程API的视图以编写分析计算。 GVDL的一个关键功能是将视图组织到视图集合中的能力,该视图可以通过差异性执行计算来自动在视图上自动共享计算,而无需用户编写任何增量代码。然后,我们引入了两个在我们的环境中自然出现的优化问题。首先是收集排序问题,以确定导致连续视图之间最小差异的视图顺序。我们证明了这个问题是NP硬化,并显示了从文献中绘制的恒定因子近似算法。其次是集合分裂问题,以决定从头开始运行哪些视图与从头开始差异化的计算,我们为此提供了一种自适应解决方案,该解决方案在运行时做出决策。我们提出了广泛的实验,以证明对视图收集以及我们的收集订购和分裂优化的运行计算的好处。
This paper presents the design and implementation of a new open-source view-based graph analytics system called Graphsurge. Graphsurge is designed to support applications that analyze multiple snapshots or views of a large-scale graph. Users program Graphsurge through a declarative graph view definition language (GVDL) to create views over input graphs and a Differential Dataflow-based programming API to write analytics computations. A key feature of GVDL is the ability to organize views into view collections, which allows Graphsurge to automatically share computation across views, without users writing any incrementalization code, by performing computations differentially. We then introduce two optimization problems that naturally arise in our setting. First is the collection ordering problem to determine the order of views that leads to minimum differences across consecutive views. We prove this problem is NP-hard and show a constant-factor approximation algorithm drawn from literature. Second is the collection splitting problem to decide on which views to run computations differentially vs from scratch, for which we present an adaptive solution that makes decisions at runtime. We present extensive experiments to demonstrate the benefits of running computations differentially for view collections and our collection ordering and splitting optimizations.