Paper Title

DP-MERF: Differentially Private Mean Embeddings with Random Features for Practical Privacy-Preserving Data Generation

Authors

Frederik Harder, Kamil Adamczewski, Mijung Park

Abstract

We propose a differentially private data generation paradigm that uses random feature representations of kernel mean embeddings to compare the distribution of true data with that of synthetic data. The random feature representation brings two important benefits. First, training a deep generative model incurs only a minimal privacy cost. Unlike kernel-based distance metrics, which require computing the kernel matrix on all pairs of true and synthetic data points, our formulation separates the data-dependent term from the term that depends solely on synthetic data. Hence, we need to perturb the data-dependent term only once and can then reuse it throughout generator training. Second, we obtain an analytic sensitivity for the kernel mean embedding, because the random features are norm-bounded by construction. This removes the need for a hyper-parameter search over a clipping norm to handle the unknown sensitivity of a generator network. We provide several variants of our algorithm, differentially private mean embeddings with random features (DP-MERF), to jointly generate labels and input features for datasets such as heterogeneous tabular data and image data. Our algorithm achieves drastically better privacy-utility trade-offs than existing methods when tested on several datasets.
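
The two benefits described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian kernel approximated by random Fourier features (cosine/sine pairs), neighboring datasets that differ by replacing one record, and the classical Gaussian-mechanism noise calibration, which is valid only for epsilon in (0, 1). The function names `random_fourier_features`, `private_mean_embedding`, and `mmd_loss` are hypothetical.

```python
import numpy as np

def random_fourier_features(x, w):
    """Map each row of x to a feature vector with L2 norm exactly 1.

    x: (n, d) data, w: (d, D/2) random frequencies (assumed drawn from a
    Gaussian, which corresponds to a Gaussian kernel).
    """
    proj = x @ w                                   # (n, D/2)
    feats = np.hstack([np.cos(proj), np.sin(proj)])  # (n, D)
    # cos^2 + sin^2 = 1 per frequency, so dividing by sqrt(D/2) gives unit norm.
    return feats / np.sqrt(w.shape[1])

def private_mean_embedding(x, w, epsilon, delta, rng):
    """Release the data-dependent mean embedding once, with Gaussian noise."""
    m = x.shape[0]
    mean_emb = random_fourier_features(x, w).mean(axis=0)
    # Analytic sensitivity: every feature vector has norm 1, so replacing
    # one of m records moves the mean by at most 2 / m. No clipping needed.
    sensitivity = 2.0 / m
    # Assumption: classical Gaussian mechanism bound, valid for 0 < epsilon < 1.
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return mean_emb + rng.normal(0.0, sigma, size=mean_emb.shape)

def mmd_loss(noisy_data_emb, synthetic, w):
    """Squared distance between the fixed noisy embedding and the synthetic one.

    Only the synthetic term is recomputed during generator training, so
    repeated evaluation consumes no additional privacy budget.
    """
    syn_emb = random_fourier_features(synthetic, w).mean(axis=0)
    return float(np.sum((noisy_data_emb - syn_emb) ** 2))
```

In a training loop under these assumptions, `private_mean_embedding` would be called a single time before training, and `mmd_loss` would be evaluated on each batch of generator samples against that stored vector.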
