Paper Title

Sparse Periodic Systolic Dataflow for Lowering Latency and Power Dissipation of Convolutional Neural Network Accelerators

Authors

Jung Hwan Heo, Arash Fayyazi, Amirhossein Esmaili, Massoud Pedram

Abstract

This paper introduces the sparse periodic systolic (SPS) dataflow, which advances the state-of-the-art hardware accelerator for supporting lightweight neural networks. Specifically, the SPS dataflow enables a novel hardware design approach unlocked by an emergent pruning scheme, periodic pattern-based sparsity (PPS). By exploiting the regularity of PPS, our sparsity-aware compiler optimally reorders the weights and uses a simple indexing unit in hardware to create matches between the weights and activations. Through the compiler-hardware codesign, SPS dataflow enjoys higher degrees of parallelism while being free of the high indexing overhead and without model accuracy loss. Evaluated on popular benchmarks such as VGG and ResNet, the SPS dataflow and accompanying neural network compiler outperform prior work in convolutional neural network (CNN) accelerator designs targeting FPGA devices. Against other sparsity-supporting weight storage formats, SPS results in 4.49x energy efficiency gain while lowering storage requirements by 3.67x for total weight storage (non-pruned weights plus indexing) and 22,044x for indexing memory.
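To make the core idea concrete, the following is a minimal, hypothetical sketch (not the paper's actual compiler or accelerator) of how a periodic pattern-based sparsity scheme allows non-zero weights to be packed contiguously at compile time and matched to activations at run time with only a lightweight index. The period length P, the pattern offsets, and all function names here are illustrative assumptions.

```python
import numpy as np

# Illustrative PPS-style setup: within every period of P consecutive weights,
# only the positions listed in `pattern` survive pruning. Because the pattern
# repeats every P entries, no per-weight index must be stored -- just (P, pattern).
P = 4                # assumed period length
pattern = (0, 2)     # assumed surviving offsets within each period

def pack_weights(w):
    """Compile-time step: keep only weights at the periodic offsets, stored contiguously."""
    keep = [i for i in range(len(w)) if i % P in pattern]
    return w[keep]

def sparse_dot(packed_w, activations):
    """Runtime step: a simple indexing unit regenerates each activation address
    from the period and pattern alone, matching packed weights to activations."""
    acc = 0.0
    k = 0
    n_periods = len(packed_w) // len(pattern)
    for p in range(n_periods):
        for off in pattern:
            acc += packed_w[k] * activations[p * P + off]
            k += 1
    return acc

# Sanity check: the packed sparse computation matches a dense dot product
# on a weight vector pruned with the same periodic mask.
rng = np.random.default_rng(0)
w = rng.standard_normal(16)
for i in range(len(w)):
    if i % P not in pattern:
        w[i] = 0.0       # apply the periodic pruning mask
x = rng.standard_normal(16)
packed = pack_weights(w)
assert np.isclose(np.dot(w, x), sparse_dot(packed, x))
print("packed weights:", len(packed), "of", len(w))
```

In this sketch, the storage saving comes from keeping only the non-pruned weights plus the small (P, pattern) descriptor, which is the intuition behind the abstract's claim of reduced weight and indexing storage; the actual SPS compiler and indexing hardware are more elaborate.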
