Paper Title
Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights
Paper Authors
Paper Abstract
Machine learning (ML) models are widely used in many important domains. For efficiently processing these computational- and memory-intensive applications, tensors of these over-parameterized models are compressed by leveraging sparsity, size reduction, and quantization of tensors. Unstructured sparsity and tensors with varying dimensions yield irregular computation, communication, and memory access patterns; processing them on hardware accelerators in a conventional manner does not inherently leverage acceleration opportunities. This paper provides a comprehensive survey on the efficient execution of sparse and irregular tensor computations of ML models on hardware accelerators. In particular, it discusses enhancement modules in the architecture design and the software support; categorizes different hardware designs and acceleration techniques and analyzes them in terms of hardware and execution costs; analyzes achievable accelerations for recent DNNs; highlights further opportunities in terms of hardware/software/model co-design optimizations (inter/intra-module). The takeaways from this paper include: understanding the key challenges in accelerating sparse, irregular-shaped, and quantized tensors; understanding enhancements in accelerator systems for supporting their efficient computations; analyzing trade-offs in opting for a specific design choice for encoding, storing, extracting, communicating, computing, and load-balancing the non-zeros; understanding how structured sparsity can improve storage efficiency and balance computations; understanding how to compile and map models with sparse tensors on the accelerators; understanding recent design trends for efficient accelerations and further opportunities.
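To make concrete what "encoding, storing, extracting, and computing the non-zeros" of a sparse tensor involves, here is a minimal sketch (not taken from the paper) of a CSR-style compressed encoding of a sparse weight matrix and a matrix-vector product that operates only on the stored non-zeros; the function and variable names are illustrative assumptions, not an accelerator's actual dataflow.

```python
import numpy as np

def csr_encode(dense):
    """Encode a dense 2D weight matrix in a CSR-like format:
    only non-zero values are stored, plus their column indices
    and per-row offsets into the value array."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        nz = np.nonzero(row)[0]
        values.extend(row[nz])
        col_idx.extend(nz)
        row_ptr.append(len(values))
    return np.array(values), np.array(col_idx), np.array(row_ptr)

def csr_matvec(values, col_idx, row_ptr, x):
    """Sparse matrix-vector product: each output element accumulates
    only the stored non-zeros, so multiply-accumulates on zeros
    (the ineffectual computations) are skipped entirely."""
    y = np.zeros(len(row_ptr) - 1)
    for r in range(len(y)):
        start, end = row_ptr[r], row_ptr[r + 1]
        y[r] = np.dot(values[start:end], x[col_idx[start:end]])
    return y

# Example: a 4x6 weight matrix with unstructured sparsity.
W = np.array([[0, 0, 3, 0, 0, 1],
              [0, 0, 0, 0, 0, 0],
              [2, 0, 0, 0, 5, 0],
              [0, 4, 0, 0, 0, 0]], dtype=float)
x = np.arange(6, dtype=float)
vals, cols, ptrs = csr_encode(W)
assert np.allclose(csr_matvec(vals, cols, ptrs, x), W @ x)
```

Note how the per-row non-zero counts differ (row 1 stores none, row 0 stores two): with unstructured sparsity this irregularity leads to the load imbalance and irregular memory accesses the survey analyzes, and it is what structured sparsity patterns aim to even out across processing elements.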