Paper Title
ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution
Paper Authors
Paper Abstract
Convolutional Neural Networks (CNNs) are known to be significantly over-parameterized, and difficult to interpret, train and adapt. In this paper, we introduce a structural regularization across convolutional kernels in a CNN. In our approach, each convolution kernel is first decomposed as 2D dictionary atoms linearly combined by coefficients. The widely observed correlation and redundancy in a CNN hint at a common low-rank structure among the decomposed coefficients, which is here further supported by our empirical observations. We then explicitly regularize CNN kernels by enforcing decomposed coefficients to be shared across sub-structures, while leaving each sub-structure only its own dictionary atoms, typically a few hundred parameters, which leads to dramatic model reductions. We explore models with sharing across different sub-structures to cover a wide range of trade-offs between parameter reduction and expressiveness. Our proposed regularized network structures open the door to better interpreting, training and adapting deep models. We validate the flexibility and compatibility of our method by image classification experiments on multiple datasets and underlying network structures, and show that CNNs maintain performance with dramatic reductions in parameters and computations, e.g., a ResNet-18 using only 5% of its parameters achieves comparable performance. Further experiments on few-shot classification show faster and more robust task adaptation compared with models using standard convolutions.
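The core construction described in the abstract, each kernel assembled as a coefficient-weighted sum of small 2D dictionary atoms, with the coefficients shared across sub-structures while each sub-structure keeps its own atoms, can be sketched in a few lines of PyTorch. The sketch below is illustrative rather than the authors' implementation: the class name DecomposedConv2d, the sharing granularity (coefficients shared across two layers), and all tensor sizes are assumptions for demonstration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecomposedConv2d(nn.Module):
    """Convolution whose kernels are linear combinations of a small
    per-layer dictionary of 2D atoms; the combination coefficients are
    passed in so they can be shared across sub-structures (hypothetical
    sketch, not the paper's reference code)."""

    def __init__(self, num_atoms: int, kernel_size: int, shared_coeffs: nn.Parameter):
        super().__init__()
        # Each layer keeps only its own atoms (num_atoms x k x k):
        # a few hundred parameters, as the abstract describes.
        self.atoms = nn.Parameter(0.1 * torch.randn(num_atoms, kernel_size, kernel_size))
        # Coefficients of shape (out_ch, in_ch, num_atoms), created once
        # outside this module and reused by several layers.
        self.coeffs = shared_coeffs

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assemble kernels: K[o, i] = sum_m coeffs[o, i, m] * atoms[m]
        weight = torch.einsum("oim,mhw->oihw", self.coeffs, self.atoms)
        return F.conv2d(x, weight, padding=weight.shape[-1] // 2)

# Two layers share one coefficient tensor while keeping private atom dictionaries.
coeffs = nn.Parameter(0.1 * torch.randn(64, 64, 6))  # (out_ch, in_ch, num_atoms)
conv_a = DecomposedConv2d(num_atoms=6, kernel_size=3, shared_coeffs=coeffs)
conv_b = DecomposedConv2d(num_atoms=6, kernel_size=3, shared_coeffs=coeffs)

x = torch.randn(1, 64, 32, 32)
print(conv_a(x).shape, conv_b(x).shape)  # both torch.Size([1, 64, 32, 32])
```

In this toy configuration the shared coefficient tensor (64 x 64 x 6) dominates the parameter count, while each extra layer adds only its 6 x 3 x 3 atom dictionary (54 parameters), which is the source of the dramatic reduction the abstract reports; where the paper places the sharing boundaries (which "sub-structures" reuse one coefficient set) is exactly the parameter-reduction vs. expressiveness trade-off it explores.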