Paper Title
Optimizing Memory-Access Patterns for Deep Learning Accelerators
Paper Authors
Paper Abstract
Deep learning (DL) workloads are moving towards accelerators for faster processing and lower cost. Modern DL accelerators are good at handling the large-scale multiply-accumulate operations that dominate DL workloads; however, it is challenging to make full use of the compute power of an accelerator since the data must be properly staged in a software-managed scratchpad memory. Failing to do so can result in significant performance loss. This paper proposes a systematic approach which leverages the polyhedral model to analyze all operators of a DL model together to minimize the number of memory accesses. Experiments show that our approach can substantially reduce the impact of memory accesses required by common neural-network models on a homegrown AWS machine-learning inference chip named Inferentia, which is available through Amazon EC2 Inf1 instances.
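The cross-operator optimization the abstract describes can be made concrete with a small sketch. The Python snippet below is illustrative only: the TILE size, the scale-then-ReLU pipeline, and the access counters are assumptions made for the example, not the paper's algorithm or Inferentia's programming model. It contrasts running two operators as separate passes over main memory with running both on each tile while it is resident in a scratchpad-sized buffer:

```python
import numpy as np

# Hypothetical tile size standing in for a software-managed scratchpad;
# real capacities depend on the accelerator (this sketch is not tied to
# Inferentia).
TILE = 1024

def unfused(x):
    """Run scale then ReLU as separate passes: each operator reads its
    input from and writes its output to main memory."""
    dram_accesses = 0
    y = np.empty_like(x)
    for i in range(0, x.size, TILE):           # operator 1: scale
        t = x[i:i + TILE] * 2.0
        y[i:i + TILE] = t
        dram_accesses += 2 * t.size            # one read + one write per tile
    z = np.empty_like(x)
    for i in range(0, x.size, TILE):           # operator 2: ReLU
        t = np.maximum(y[i:i + TILE], 0.0)
        z[i:i + TILE] = t
        dram_accesses += 2 * t.size
    return z, dram_accesses

def fused(x):
    """Run both operators on each tile while it is resident in the
    scratchpad, so the intermediate never touches main memory."""
    dram_accesses = 0
    z = np.empty_like(x)
    for i in range(0, x.size, TILE):
        t = x[i:i + TILE] * 2.0                # intermediate stays "on chip"
        z[i:i + TILE] = np.maximum(t, 0.0)
        dram_accesses += 2 * t.size            # one read + one write per tile
    return z, dram_accesses

x = np.random.randn(1 << 20).astype(np.float32)
z1, a1 = unfused(x)
z2, a2 = fused(x)
assert np.allclose(z1, z2)
print(f"unfused: {a1} accesses, fused: {a2} accesses")  # fused halves traffic
```

Counting one read and one write per tile, the fused schedule cuts main-memory traffic from 4N to 2N element accesses for an N-element tensor; the polyhedral analysis described in the abstract applies this kind of cross-operator reasoning systematically, across all operators of a model rather than one adjacent pair.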