Paper Title
Restructuring, Pruning, and Adjustment of Deep Models for Parallel Distributed Inference
Paper Authors
Paper Abstract
Using multiple nodes and parallel computing algorithms has become a principal tool to improve the training and execution times of deep neural networks, as well as to enable effective collective intelligence in sensor networks. In this paper, we consider the parallel implementation of an already-trained deep model on multiple processing nodes (a.k.a. workers), where the deep model is divided into several parallel sub-models, each of which is executed by a worker. Since the latency due to synchronization and data transfer among workers negatively impacts the performance of the parallel implementation, it is desirable to minimize the interdependency among the parallel sub-models. To achieve this goal, we propose to rearrange the neurons in the neural network and partition them (without changing the general topology of the neural network) such that the interdependency among sub-models is minimized under the computation and communication constraints of the workers. We propose RePurpose, a layer-wise model restructuring and pruning technique that guarantees the performance of the overall parallelized model. To efficiently apply RePurpose, we propose an approach based on $\ell_0$ optimization and the Munkres assignment algorithm. We show that, compared to existing methods, RePurpose significantly improves the efficiency of distributed inference via parallel implementation, both in terms of communication and computational complexity.
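To make the restructuring idea concrete, below is a minimal Python sketch, not the paper's actual RePurpose implementation, of one simplified layer-wise step: given a fixed assignment of a layer's inputs to workers, it assigns the layer's output neurons to workers so that the number of nonzero cross-worker weights (an $\ell_0$-style surrogate for interdependency) is minimized. The balanced assignment is solved with the Munkres (Hungarian) algorithm via scipy.optimize.linear_sum_assignment; the function name partition_layer, the tolerance tol, and the toy dimensions are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def partition_layer(W, in_owner, num_workers, tol=1e-3):
    """Assign the neurons of one layer to workers so that the count of
    cross-worker incoming weights (an l0-style interdependency surrogate)
    is minimized, using the Munkres/Hungarian algorithm.

    W           : (n_out, n_in) weight matrix of the layer.
    in_owner    : (n_in,) worker index owning each input neuron.
    num_workers : number of workers; n_out must be divisible by it.
    Returns     : (n_out,) worker index assigned to each output neuron.
    """
    n_out, n_in = W.shape
    per_worker = n_out // num_workers
    nonzero = np.abs(W) > tol  # treat near-zero weights as prunable

    # cost[j, p] = number of nonzero weights into neuron j coming from
    # inputs NOT owned by worker p (these would require communication).
    cost = np.empty((n_out, num_workers))
    for p in range(num_workers):
        cost[:, p] = nonzero[:, in_owner != p].sum(axis=1)

    # Balanced assignment: duplicate each worker column once per neuron
    # slot, then solve the resulting square assignment problem.
    expanded = np.repeat(cost, per_worker, axis=1)   # (n_out, n_out)
    rows, cols = linear_sum_assignment(expanded)
    owner = np.empty(n_out, dtype=int)
    owner[rows] = cols // per_worker                 # map slot back to worker id
    return owner

# Toy usage: 8 hidden neurons, 8 inputs, 2 workers.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)) * (rng.random((8, 8)) > 0.6)  # sparse-ish weights
in_owner = np.repeat(np.arange(2), 4)                         # inputs split evenly
print(partition_layer(W, in_owner, num_workers=2))
```

Replicating each worker column once per available neuron slot turns the balanced partition into a standard square assignment problem, which is what allows the Munkres algorithm to be applied directly; the full RePurpose method additionally restructures and prunes the weights layer by layer so that the accuracy of the overall parallelized model is preserved.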