Paper title
Handling Missing Data with Graph Representation Learning
Paper authors
Paper abstract
Machine learning with missing data has been approached in two different ways, including feature imputation where missing feature values are estimated based on observed values, and label prediction where downstream labels are learned directly from incomplete data. However, existing imputation models tend to have strong prior assumptions and cannot learn from downstream tasks, while models targeting label prediction often involve heuristics and can encounter scalability issues. Here we propose GRAPE, a graph-based framework for feature imputation as well as label prediction. GRAPE tackles the missing data problem using a graph representation, where the observations and features are viewed as two types of nodes in a bipartite graph, and the observed feature values as edges. Under the GRAPE framework, the feature imputation is formulated as an edge-level prediction task and the label prediction as a node-level prediction task. These tasks are then solved with Graph Neural Networks. Experimental results on nine benchmark datasets show that GRAPE yields 20% lower mean absolute error for imputation tasks and 10% lower for label prediction tasks, compared with existing state-of-the-art methods.
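The bipartite graph representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_bipartite_graph`, the node numbering (observations first, then features), and the toy matrix are all assumptions made for illustration. The sketch only builds the graph; the Graph Neural Network that would predict the missing edge values is omitted.

```python
import numpy as np


def build_bipartite_graph(X):
    """Turn a data matrix with NaNs into a bipartite edge list.

    Hypothetical helper, for illustration only.
    Rows (observations) become nodes 0..n-1 and columns (features)
    become nodes n..n+m-1. Each observed entry X[i, j] becomes an edge
    (i, n + j) whose attribute is the observed value.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    observed = ~np.isnan(X)

    # Observed entries: edges that exist in the graph.
    rows, cols = np.nonzero(observed)
    edge_index = np.stack([rows, cols + n])  # shape (2, num_observed)
    edge_attr = X[rows, cols]                # one value per edge

    # Missing entries: edges whose values an imputation model would
    # predict (the edge-level prediction task).
    miss_rows, miss_cols = np.nonzero(~observed)
    target_edges = np.stack([miss_rows, miss_cols + n])

    return edge_index, edge_attr, target_edges


# Toy data: 2 observations, 3 features, 2 missing entries.
X = np.array([[0.3, np.nan, 0.5],
              [np.nan, 0.1, 0.9]])
edge_index, edge_attr, target_edges = build_bipartite_graph(X)
```

In this layout, feature imputation amounts to predicting an attribute for each column of `target_edges`, while label prediction would attach one output to each observation node (nodes 0 to n-1).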