Paper Title

Distributed Dynamic Safe Screening Algorithms for Sparse Regularization

Authors

Runxue Bao, Xidong Wu, Wenhan Xian, Heng Huang

Abstract

Distributed optimization has been widely used as one of the most efficient approaches for model training with massive samples. However, large-scale learning problems with both massive samples and high-dimensional features are ubiquitous in the era of big data. Safe screening is a popular technique to speed up high-dimensional models by discarding inactive features that have zero coefficients at the optimum. Nevertheless, existing safe screening methods are limited to the sequential setting. In this paper, we propose a new distributed dynamic safe screening (DDSS) method for sparsity-regularized models and apply it to shared-memory and distributed-memory architectures, respectively. It achieves significant speedup without any loss of accuracy by simultaneously exploiting the sparsity of the model and of the dataset. To the best of our knowledge, this is the first work on distributed dynamic safe screening. Theoretically, we prove that the proposed method achieves a linear convergence rate with lower overall complexity and, almost surely, eliminates almost all inactive features in a finite number of iterations. Finally, extensive experimental results on benchmark datasets confirm the superiority of our proposed method.
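To make the screening idea concrete, the sketch below shows the classical *sequential* gap-safe screening rule for the Lasso, applied dynamically inside ISTA: whenever the duality gap shrinks, features whose dual correlation is provably below the threshold are discarded with their coefficients fixed at zero. This is only an illustrative sequential baseline under standard gap-safe assumptions, not the paper's distributed DDSS algorithm; all names are our own.

```python
import numpy as np

def ista_with_gap_screening(X, y, lam, n_iter=1000, screen_every=10):
    """ISTA for the Lasso (1/2||y - X b||^2 + lam*||b||_1) with
    dynamic gap-safe screening. Illustrative sequential sketch only."""
    n, d = X.shape
    beta = np.zeros(d)
    active = np.ones(d, dtype=bool)           # features not yet screened out
    L = np.linalg.norm(X, 2) ** 2             # Lipschitz constant of the smooth part
    col_norms = np.linalg.norm(X, axis=0)

    for t in range(n_iter):
        # Proximal gradient (ISTA) step restricted to the active set
        r = y - X[:, active] @ beta[active]   # residual
        z = beta[active] + X[:, active].T @ r / L
        beta[active] = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)

        if t % screen_every == 0:
            r = y - X[:, active] @ beta[active]
            # Dual-feasible point obtained by rescaling the residual
            theta = r / max(lam, np.max(np.abs(X.T @ r)))
            primal = 0.5 * r @ r + lam * np.abs(beta).sum()
            dual = 0.5 * y @ y - 0.5 * lam**2 * np.sum((theta - y / lam) ** 2)
            gap = max(primal - dual, 0.0)
            radius = np.sqrt(2.0 * gap) / lam
            # Gap-safe rule: |x_j^T theta| + ||x_j|| * radius < 1  =>  beta_j* = 0
            keep = np.abs(X.T @ theta) + col_norms * radius >= 1.0
            beta[active & ~keep] = 0.0        # safely fix screened coefficients at 0
            active &= keep
    return beta, active
```

As the iterates converge, the duality gap (and hence the safe radius) shrinks, so more features are discarded and each subsequent gradient step touches fewer columns; the distributed setting studied in the paper additionally has to maintain such certificates across workers.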
