Paper Title

Large-Scale Secure XGB for Vertical Federated Learning

Paper Authors

Wenjing Fang, Derun Zhao, Jin Tan, Chaochao Chen, Chaofan Yu, Li Wang, Lei Wang, Jun Zhou, Benyu Zhang

Paper Abstract

Privacy-preserving machine learning has drawn increasing attention recently, especially as various privacy regulations come into force. In this context, Federated Learning (FL) has emerged to facilitate privacy-preserving joint modeling among multiple parties. Although many federated algorithms have been extensively studied, there is still a lack of secure and practical gradient tree boosting models (e.g., XGB) in the literature. In this paper, we aim to build large-scale secure XGB under the vertical federated learning setting. We guarantee data privacy from three aspects. Specifically, (i) we employ secure multi-party computation techniques to avoid leaking intermediate information during training, (ii) we store the output model in a distributed manner in order to minimize information release, and (iii) we provide a novel algorithm for secure XGB prediction with the distributed model. Furthermore, by proposing secure permutation protocols, we improve training efficiency and make the framework scale to large datasets. We conduct extensive experiments on both public and real-world datasets, and the results demonstrate that our proposed XGB models provide not only competitive accuracy but also practical performance.
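The first privacy guarantee above rests on secure multi-party computation. As a rough illustration of the kind of primitive involved, below is a minimal sketch of additive secret sharing, a standard MPC building block; the paper's actual protocols are not reproduced here, so the function names, the modulus, and the two-party setup are all illustrative assumptions rather than the authors' implementation.

```python
# Illustrative sketch only: additive secret sharing over a prime field,
# a standard MPC primitive. Not the paper's actual protocol; all names
# and parameters here are hypothetical.
import random

PRIME = 2**61 - 1  # share modulus (assumed; any sufficiently large prime works)

def share(secret: int, n_parties: int = 2) -> list[int]:
    """Split `secret` into n additive shares; any n-1 shares reveal nothing."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares: list[int]) -> int:
    """Recombine all shares to recover the secret."""
    return sum(shares) % PRIME

# Adding shares locally yields shares of the sum, so parties could
# aggregate gradient statistics for XGB split finding without ever
# revealing the individual values.
g1, g2 = 17, 25                      # toy per-party gradient statistics
s1, s2 = share(g1), share(g2)
sum_shares = [(a + b) % PRIME for a, b in zip(s1, s2)]
assert reconstruct(sum_shares) == g1 + g2
```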
