Paper Title
An optimal scheduling architecture for accelerating batch algorithms on Neural Network processor architectures
Paper Authors
Paper Abstract
In neural network topologies, algorithms operate on batches of data tensors. These batches are typically scheduled onto computing cores that execute in parallel. For algorithms running on batches of data, an optimal batch scheduling architecture is needed to utilize hardware resources effectively, yielding a significant reduction in training and inference time. In this paper, we propose to accelerate batch algorithms for neural networks through a scheduling architecture that enables optimal utilization of compute power. The proposed optimal scheduling architecture can be built into hardware (HW) or implemented in software (SW) alone, and either form can be leveraged to accelerate batch algorithms. The results demonstrate that the proposed architecture speeds up batch algorithms compared to previous solutions. The proposed idea applies to any HPC architecture intended for neural networks.
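As an illustration of the kind of batch-to-core scheduling the abstract describes, the sketch below assigns batches to parallel cores with a greedy longest-processing-time (LPT) heuristic, always placing the next batch on the least-loaded core. This is a generic load-balancing sketch under assumed per-batch costs; the function name, cost model, and heuristic are illustrative assumptions, not the paper's actual scheduling architecture.

```python
import heapq

def schedule_batches(batch_costs, num_cores):
    """Greedy LPT scheduling sketch (illustrative, not the paper's method).

    batch_costs: assumed per-batch processing cost (e.g., tensor size).
    Returns (assignment, makespan), where assignment[c] lists the batch
    indices given to core c, and makespan is the busiest core's total
    load, a common proxy for batch completion time.
    """
    # Place large batches first so they can be balanced by smaller ones.
    order = sorted(range(len(batch_costs)), key=lambda i: -batch_costs[i])
    # Min-heap of (current_load, core_id) to find the least-loaded core.
    loads = [(0, c) for c in range(num_cores)]
    heapq.heapify(loads)
    assignment = [[] for _ in range(num_cores)]
    for i in order:
        load, core = heapq.heappop(loads)
        assignment[core].append(i)
        heapq.heappush(loads, (load + batch_costs[i], core))
    makespan = max(load for load, _ in loads)
    return assignment, makespan
```

For example, five batches with costs [7, 5, 4, 3, 1] spread over two cores yield a makespan of 10, matching the ideal even split of the total cost of 20.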