Paper Title
Generating Private Data with User Customization
Paper Authors
Paper Abstract
Personal devices such as mobile phones can produce and store large amounts of data that can enhance machine learning models; however, this data may contain private information specific to the data owner that prevents its release. We want to reduce the correlation between user-specific private information and the data while retaining the useful information. Rather than training a large model to achieve privatization end to end, we decouple the creation of a latent representation from the privatization step, which allows user-specific privatization to occur in a setting with limited computation and minimal disturbance to the utility of the data. We leverage a Variational Autoencoder (VAE) to create a compact latent representation of the data that remains fixed for all devices and all possible private labels. We then train a small generative filter to perturb the latent representation based on user-specified preferences regarding the private and utility information. The small filter is trained via a GAN-type robust optimization that can take place on a distributed device such as a phone or tablet. Under special conditions on our linear filter, we establish the connection between our generative approach and Rényi differential privacy. We conduct experiments on multiple datasets, including MNIST, UCI-Adult, and CelebA, and give a thorough evaluation, including visualizing the geometry of the latent embeddings and estimating the empirical mutual information, to show the effectiveness of our approach.
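The abstract's pipeline (a fixed shared VAE latent space, plus a small user-side filter that perturbs the latent code before release) can be sketched as below. This is a minimal illustrative sketch only: the encoder/decoder here are stand-in random linear maps, and the filter weights, bias, and noise scale are hypothetical placeholders rather than the paper's trained networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for a pretrained VAE encoder/decoder (the paper uses
# neural networks; fixed random linear maps are used here only to
# make the data flow concrete).
latent_dim, data_dim = 8, 32
W_enc = rng.normal(size=(latent_dim, data_dim)) / np.sqrt(data_dim)
W_dec = rng.normal(size=(data_dim, latent_dim)) / np.sqrt(latent_dim)

def encode(x):
    # Deterministic stand-in for the VAE encoder mean; shared and
    # fixed across all devices and private labels.
    return W_enc @ x

def decode(z):
    return W_dec @ z

# Small user-side generative filter: a linear perturbation of the
# latent code plus Gaussian noise. It is under a linear filter of
# this kind that the abstract relates the generative approach to
# Rényi differential privacy. All values below are placeholders.
A = np.eye(latent_dim)       # filter weights (learned in the paper)
b = np.zeros(latent_dim)     # filter bias (learned in the paper)
sigma = 0.5                  # noise scale: privacy/utility trade-off

def privatize(z):
    return A @ z + b + sigma * rng.normal(size=z.shape)

x = rng.normal(size=data_dim)   # raw on-device data sample
z = encode(x)                   # shared, fixed latent representation
z_priv = privatize(z)           # user-specific perturbed latent code
x_priv = decode(z_priv)         # data released in place of x
```

In the paper the filter is trained on-device via a GAN-type robust optimization against an adversary predicting the private label; the sketch omits that training loop and shows only the inference-time data flow.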