Paper Title
OptG: Optimizing Gradient-driven Criteria in Network Sparsity
Paper Authors
Paper Abstract
Network sparsity has gained popularity mostly for its ability to reduce network complexity. Extensive studies have explored gradient-driven sparsity. Typically, these methods are built on the premise of weight independence, which, however, contradicts the fact that weights mutually influence one another. Thus, their performance remains to be improved. In this paper, we propose to optimize gradient-driven sparsity (OptG) by solving this independence paradox. Our motivation comes from recent advances in supermask training, which show that high-performing sparse subnetworks can be located by simply updating mask values without modifying any weight. We prove that supermask training accumulates the criteria of gradient-driven sparsity for both removed and preserved weights, and that it can partly solve the independence paradox. Consequently, OptG integrates supermask training into gradient-driven sparsity, and a novel supermask optimizer is further proposed to comprehensively mitigate the independence paradox. Experiments show that OptG surpasses many existing state-of-the-art competitors, especially at ultra-high sparsity levels. Our code is available at \url{https://github.com/zyxxmu/OptG}.
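To make the supermask idea concrete, the following is a minimal PyTorch sketch of generic supermask training in the common top-k / straight-through-estimator formulation: weights stay frozen while a learnable score per weight is updated, and the binary mask keeps the weights with the largest scores. This is an illustration of the general technique, not the authors' OptG implementation or their proposed optimizer; names such as `MaskedLinear`, `TopKMask`, and `sparsity` are assumptions for this example.

```python
# Minimal sketch of supermask training (illustrative, not the OptG code):
# weights are frozen; only per-weight scores are optimized, and the binary
# mask keeps the top-k scores. Gradients reach the scores through a
# straight-through estimator.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMask(torch.autograd.Function):
    @staticmethod
    def forward(ctx, scores, sparsity):
        # Keep the (1 - sparsity) fraction of weights with the largest scores.
        k = int((1.0 - sparsity) * scores.numel())
        mask = torch.zeros_like(scores)
        _, idx = scores.flatten().topk(k)
        mask.view(-1)[idx] = 1.0
        return mask

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pass gradients to the scores unchanged;
        # no gradient for the sparsity argument.
        return grad_output, None


class MaskedLinear(nn.Module):
    def __init__(self, in_features, out_features, sparsity=0.9):
        super().__init__()
        # Frozen weights: supermask training never modifies them.
        self.weight = nn.Parameter(0.01 * torch.randn(out_features, in_features),
                                   requires_grad=False)
        # Learnable scores that determine which weights survive.
        self.scores = nn.Parameter(torch.rand(out_features, in_features))
        self.sparsity = sparsity

    def forward(self, x):
        mask = TopKMask.apply(self.scores, self.sparsity)
        return F.linear(x, self.weight * mask)


# Usage: only the mask scores are optimized; the weights are never touched.
layer = MaskedLinear(16, 8, sparsity=0.5)
opt = torch.optim.SGD([layer.scores], lr=0.1)
loss = layer(torch.randn(4, 16)).sum()
loss.backward()
opt.step()
```

The straight-through estimator is what lets the non-differentiable top-k masking be trained at all; it is also why, as the abstract notes, score updates end up accumulating gradient-based criteria for both removed and preserved weights.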