论文标题
在用户级隐私下进行直方图估算的限制贡献的算法
Algorithms for bounding contribution for histogram estimation under user-level privacy
论文作者
论文摘要
我们研究用户级差异隐私下的直方图估计问题,目标是保留任何单个用户的所有条目的隐私。我们考虑了每个用户的数据数量可能不同的异质方案。在这种情况下,注入直方图以获得差异隐私的噪声量与最大用户贡献成正比,这可能会被很少的异常值放大。绕过这一点的一种方法是绑定(或限制)每个用户对直方图的贡献。但是,如果用户仅限于少量贡献,则将丢弃大量数据。在这项工作中,我们建议算法选择在有限和无界域设置下绑定的最佳用户贡献,以进行直方图估算。当域的大小界限时,我们提出了一个用户贡献界限策略,该策略几乎相对于事后的最佳贡献实现了两种评价。对于无界的域直方图估计,我们提出了一种算法,该算法相对于在事后观看的最佳贡献而言是对数符号的算法。该结果在数据上没有任何分布假设。对真实和合成数据集的实验验证了我们的理论发现,并证明了我们算法的有效性。我们还表明,在轻度分配假设下,通过界限用户贡献引入的剪接偏差可能会降低,这可能具有独立的兴趣。
We study the problem of histogram estimation under user-level differential privacy, where the goal is to preserve the privacy of all entries of any single user. We consider the heterogeneous scenario where the quantity of data can be different for each user. In this scenario, the amount of noise injected into the histogram to obtain differential privacy is proportional to the maximum user contribution, which can be amplified by few outliers. One approach to circumvent this would be to bound (or limit) the contribution of each user to the histogram. However, if users are limited to small contributions, a significant amount of data will be discarded. In this work, we propose algorithms to choose the best user contribution bound for histogram estimation under both bounded and unbounded domain settings. When the size of the domain is bounded, we propose a user contribution bounding strategy that almost achieves a two-approximation with respect to the best contribution bound in hindsight. For unbounded domain histogram estimation, we propose an algorithm that is logarithmic-approximation with respect to the best contribution bound in hindsight. This result holds without any distribution assumptions on the data. Experiments on both real and synthetic datasets verify our theoretical findings and demonstrate the effectiveness of our algorithms. We also show that clipping bias introduced by bounding user contribution may be reduced under mild distribution assumptions, which can be of independent interest.