Differentiable and Scalable Generative Adversarial Models for Data Imputation

Paper Title

Differentiable and Scalable Generative Adversarial Models for Data Imputation

Authors

Yangyang Wu, Jun Wang, Xiaoye Miao, Wenjia Wang, Jianwei Yin

Abstract

Data imputation has been extensively explored to solve the missing data problem. The dramatically increasing volume of incomplete data makes imputation models computationally infeasible in many real-life applications. In this paper, we propose an effective scalable imputation system named SCIS to significantly speed up the training of differentiable generative adversarial imputation models under accuracy guarantees for large-scale incomplete data. SCIS consists of two modules: differentiable imputation modeling (DIM) and sample size estimation (SSE). DIM leverages a new masking Sinkhorn divergence function to make an arbitrary generative adversarial imputation model differentiable, while for such a differentiable imputation model, SSE estimates an appropriate sample size to ensure the user-specified imputation accuracy of the final model. Extensive experiments on several real-life large-scale datasets demonstrate that our proposed system can accelerate generative adversarial model training by 7.1x. Using around 7.6% of the samples, SCIS yields accuracy competitive with state-of-the-art imputation methods in a much shorter computation time.
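To make the idea of a masking Sinkhorn divergence more concrete, here is a minimal sketch in plain Python. It is not the definition used in the paper, which the abstract does not give. It assumes that both batches are multiplied elementwise by the observation mask (1 = observed, 0 = missing) and then compared with the standard debiased Sinkhorn divergence under a squared Euclidean cost and uniform weights. The function names and the `eps` and `iters` parameters are hypothetical.

```python
import math


def masked(rows, masks):
    """Zero out unobserved entries: keep a value only where its mask is 1."""
    return [[x * m for x, m in zip(row, mask)] for row, mask in zip(rows, masks)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def logsumexp(vals):
    """Numerically stable log(sum(exp(v)))."""
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))


def entropic_ot(xs, ys, eps=0.5, iters=200):
    """Entropy-regularized OT value between two uniformly weighted point
    clouds, computed with log-domain Sinkhorn iterations."""
    n, m = len(xs), len(ys)
    cost = [[sq_dist(x, y) for y in ys] for x in xs]
    log_a, log_b = -math.log(n), -math.log(m)
    f, g = [0.0] * n, [0.0] * m  # dual potentials
    for _ in range(iters):
        f = [-eps * logsumexp([log_b + (g[j] - cost[i][j]) / eps
                               for j in range(m)]) for i in range(n)]
        g = [-eps * logsumexp([log_a + (f[i] - cost[i][j]) / eps
                               for i in range(n)]) for j in range(m)]
    # After the g-update the transport plan has total mass 1, so the dual
    # objective reduces to <a, f> + <b, g>.
    return sum(f) / n + sum(g) / m


def masked_sinkhorn_divergence(imputed, observed, masks, eps=0.5, iters=200):
    """Assumed form: debiased Sinkhorn divergence between the two batches
    after masking both with the same observation masks."""
    xs, ys = masked(imputed, masks), masked(observed, masks)
    return (entropic_ot(xs, ys, eps, iters)
            - 0.5 * entropic_ot(xs, xs, eps, iters)
            - 0.5 * entropic_ot(ys, ys, eps, iters))
```

Under these assumptions, an imputed batch that agrees with the data on every observed entry gets a divergence of zero regardless of what it places in the missing entries, while disagreement on observed entries increases the value. In an actual training loop this quantity would be computed with an automatic differentiation framework so that gradients can flow back to the generator.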
