Paper Title

Privacy Preserving Distributed Machine Learning with Federated Learning

Authors

Chamikara, M. A. P., Bertok, P., Khalil, I., Liu, D., Camtepe, S.

Abstract

Edge computing and distributed machine learning have advanced to a level that can revolutionize a particular organization. Distributed devices such as the Internet of Things (IoT) often produce a large amount of data, eventually resulting in big data that can be vital in uncovering hidden patterns and other insights in numerous fields such as healthcare, banking, and policing. Data related to areas such as healthcare and banking can contain potentially sensitive data that can become public if they are not appropriately sanitized. Federated learning (FedML) is a recently developed distributed machine learning (DML) approach that tries to preserve privacy by bringing the learning of an ML model to the data owners. However, the literature shows different attack methods, such as membership inference, that exploit the vulnerabilities of ML models as well as the coordinating servers to retrieve private data. Hence, FedML needs additional measures to guarantee data privacy. Furthermore, big data often requires more resources than are available in a standard computer. This paper addresses these issues by proposing a distributed perturbation algorithm named DISTPAB for privacy preservation of horizontally partitioned data. DISTPAB alleviates computational bottlenecks by distributing the task of privacy preservation, utilizing the asymmetry of resources in a distributed environment, which can include resource-constrained devices as well as high-performance computers. Experiments show that DISTPAB provides high accuracy, high efficiency, high scalability, and high attack resistance. Further experiments on privacy-preserving FedML show that DISTPAB is an excellent solution to stop privacy leaks in DML while preserving high data utility.
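The abstract describes each data owner locally perturbing its horizontal partition before the data contributes to federated training. The abstract does not specify DISTPAB's actual perturbation mechanism, so the sketch below is only a generic illustration of the setting: each owner holds rows of a shared feature space and applies a random geometric transformation plus additive noise locally, so that only sanitized shares leave the device. All function names and parameters here are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def local_perturb(partition, noise_scale=0.1, seed=None):
    """Locally sanitize one horizontal data partition.

    Illustrative only: applies a random orthogonal rotation followed by
    additive Gaussian noise. DISTPAB's real mechanism is not described
    in the abstract.
    """
    rng = np.random.default_rng(seed)
    n_features = partition.shape[1]
    # Random orthogonal matrix via QR decomposition (geometric perturbation).
    q, _ = np.linalg.qr(rng.standard_normal((n_features, n_features)))
    noise = rng.normal(0.0, noise_scale, size=partition.shape)
    return partition @ q + noise

# Three owners hold rows of the same 4-feature space (horizontal partitioning).
owners = [np.random.default_rng(i).normal(size=(100, 4)) for i in range(3)]
perturbed = [local_perturb(p, seed=42) for p in owners]

# Sanitized shares can then be pooled (or used in FedML rounds) for training.
sanitized = np.vstack(perturbed)
print(sanitized.shape)  # (300, 4)
```

Keeping the perturbation step on each owner's device mirrors the resource asymmetry the abstract mentions: lightweight devices only transform their own rows, while heavier aggregation and model training can run on better-provisioned machines.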
