Paper Title
Optimizing Memory-Access Patterns for Deep Learning Accelerators
Paper Authors
Paper Abstract
Deep learning (DL) workloads are moving towards accelerators for faster processing and lower cost. Modern DL accelerators are good at handling the large-scale multiply-accumulate operations that dominate DL workloads; however, it is challenging to make full use of the compute power of an accelerator since the data must be properly staged in a software-managed scratchpad memory. Failing to do so can result in significant performance loss. This paper proposes a systematic approach which leverages the polyhedral model to analyze all operators of a DL model together to minimize the number of memory accesses. Experiments show that our approach can substantially reduce the impact of memory accesses required by common neural-network models on a homegrown AWS machine-learning inference chip named Inferentia, which is available through Amazon EC2 Inf1 instances.
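The cross-operator optimization the abstract describes can be made concrete with a small sketch. The Python snippet below is illustrative only: the TILE size, the scale-then-ReLU pipeline, and the access counters are assumptions made for the example, not the paper's algorithm or Inferentia's programming model. It contrasts running two operators as separate passes over main memory with running both on each tile while it is resident in a scratchpad-sized buffer:

```python
import numpy as np

# Hypothetical tile size standing in for a software-managed scratchpad;
# real capacities depend on the accelerator (this sketch is not tied to
# Inferentia).
TILE = 1024

def unfused(x):
    """Run scale then ReLU as separate passes: each operator reads its
    input from and writes its output to main memory."""
    dram_accesses = 0
    y = np.empty_like(x)
    for i in range(0, x.size, TILE):           # operator 1: scale
        t = x[i:i + TILE] * 2.0
        y[i:i + TILE] = t
        dram_accesses += 2 * t.size            # one read + one write per tile
    z = np.empty_like(x)
    for i in range(0, x.size, TILE):           # operator 2: ReLU
        t = np.maximum(y[i:i + TILE], 0.0)
        z[i:i + TILE] = t
        dram_accesses += 2 * t.size
    return z, dram_accesses

def fused(x):
    """Run both operators on each tile while it is resident in the
    scratchpad, so the intermediate never touches main memory."""
    dram_accesses = 0
    z = np.empty_like(x)
    for i in range(0, x.size, TILE):
        t = x[i:i + TILE] * 2.0                # intermediate stays "on chip"
        z[i:i + TILE] = np.maximum(t, 0.0)
        dram_accesses += 2 * t.size            # one read + one write per tile
    return z, dram_accesses

x = np.random.randn(1 << 20).astype(np.float32)
z1, a1 = unfused(x)
z2, a2 = fused(x)
assert np.allclose(z1, z2)
print(f"unfused: {a1} accesses, fused: {a2} accesses")  # fused halves traffic
```

Counting one read and one write per tile, the fused schedule cuts main-memory traffic from 4N to 2N element accesses for an N-element tensor; the polyhedral analysis described in the abstract applies this kind of cross-operator reasoning systematically, across all operators of a model rather than one adjacent pair.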