Estimating Aggregate Properties In Relational Networks With Unobserved Data

Paper Title

Estimating Aggregate Properties In Relational Networks With Unobserved Data

Authors

Embar, Varun; Srinivasan, Sriram; Getoor, Lise

Abstract

Aggregate network properties such as cluster cohesion and the number of bridge nodes can be used to glean insights about a network's community structure, spread of influence and the resilience of the network to faults. Efficiently computing network properties when the network is fully observed has received significant attention (Wasserman and Faust 1994; Cook and Holder 2006); however, the problem of computing aggregate network properties when data attributes are missing has received little attention. Computing these properties for networks with missing attributes involves performing inference over the network. Statistical relational learning (SRL) and graph neural networks (GNNs) are two classes of machine learning approaches well suited for inferring missing attributes in a graph. In this paper, we study the effectiveness of these approaches in estimating aggregate properties on networks with missing attributes. We compare two SRL approaches and three GNNs. For these approaches, we estimate these properties using point estimates such as MAP and mean. For SRL-based approaches that can infer a joint distribution over the missing attributes, we also estimate these properties as an expectation over the distribution. To compute the expectation tractably for probabilistic soft logic, one of the SRL approaches that we study, we introduce a novel sampling framework. In the experimental evaluation, using three benchmark datasets, we show that SRL-based approaches tend to outperform GNN-based approaches both in computing aggregate properties and predictive accuracy. Specifically, we show that estimating the aggregate properties as an expectation over the joint distribution outperforms point estimates.
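The gap between a point estimate and an expectation over the joint distribution can be illustrated on a toy example. The sketch below is not the paper's method: the path graph, the hand-written joint distribution, and the `boundary_nodes` aggregate (a simple hypothetical stand-in for a bridge-node count) are all assumptions made for illustration, and the Monte Carlo step uses plain categorical sampling rather than the paper's sampling framework for probabilistic soft logic. It evaluates the aggregate once on the MAP assignment, then as an expectation computed both exactly and from samples.

```python
import random

# Hypothetical toy path graph 0 - 1 - 2 (not taken from the paper).
EDGES = [(0, 1), (1, 2)]

# Hypothetical joint distribution over binary node labels, of the kind an
# SRL model might infer; assignments not listed have probability zero.
JOINT = {
    (0, 0, 0): 0.40,
    (1, 1, 1): 0.30,
    (0, 1, 1): 0.15,
    (0, 0, 1): 0.15,
}


def boundary_nodes(labels, edges=EDGES):
    """Hypothetical aggregate: number of nodes that have at least one
    neighbour with a different label (a stand-in for bridge nodes)."""
    marked = set()
    for u, v in edges:
        if labels[u] != labels[v]:
            marked.update((u, v))
    return len(marked)


def map_plugin(joint, aggregate):
    """Point estimate: evaluate the aggregate on the most probable assignment."""
    map_labels = max(joint, key=joint.get)
    return aggregate(map_labels)


def exact_expectation(joint, aggregate):
    """E[aggregate(Y)] by enumerating the (tiny) joint distribution."""
    return sum(p * aggregate(y) for y, p in joint.items())


def sampled_expectation(joint, aggregate, n_samples=20_000, seed=0):
    """Monte Carlo estimate of E[aggregate(Y)] from independent joint samples."""
    rng = random.Random(seed)
    assignments = list(joint)
    weights = [joint[y] for y in assignments]
    samples = rng.choices(assignments, weights=weights, k=n_samples)
    return sum(aggregate(y) for y in samples) / n_samples


if __name__ == "__main__":
    print("MAP plug-in:", map_plugin(JOINT, boundary_nodes))
    print("Exact expectation:", exact_expectation(JOINT, boundary_nodes))
    print("Sampled expectation:", sampled_expectation(JOINT, boundary_nodes))
```

In this toy case the MAP assignment (all labels 0) has no boundary nodes, so the plug-in estimate is 0, while the expectation is 0.6: the aggregate is nonlinear in the labels, so a single most likely labelling discards the probability mass on assignments that do contain boundaries.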
