Paper Title
FastWave: Accelerating Autoregressive Convolutional Neural Networks on FPGA
Paper Authors
Paper Abstract
Autoregressive convolutional neural networks (CNNs) have been widely exploited for sequence generation tasks such as audio synthesis, language modeling, and neural machine translation. WaveNet is a deep autoregressive CNN composed of several stacked layers of dilated convolution that is used for sequence generation. While WaveNet produces state-of-the-art audio generation results, the naive inference implementation is quite slow; it takes a few minutes to generate just one second of audio on a high-end GPU. In this work, we develop \textit{FastWave}, the first accelerator platform for autoregressive convolutional neural networks, and address the associated design challenges. We design the Fast-Wavenet inference model in Vivado HLS and perform a wide range of optimizations including fixed-point implementation, array partitioning, and pipelining. Our model uses a fully parameterized parallel architecture for fast matrix-vector multiplication that enables per-layer customized latency fine-tuning for further throughput improvement. Our experiments comparatively assess the trade-off between throughput and resource utilization for various optimizations. Our best WaveNet design on the Xilinx XCVU13P FPGA, which uses only on-chip memory, achieves 66x faster generation than a CPU implementation and 11x faster generation than a GPU implementation.
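To make the abstract's contrast between naive and Fast-Wavenet inference concrete, here is a minimal sketch (not the authors' implementation) of the queue-based caching trick for one dilated causal convolution layer with kernel size 2: the layer caches its past inputs in a queue of length equal to its dilation, so each generated sample costs O(1) work per layer instead of recomputing the whole receptive field. The struct name, the std::deque-based cache, and the kernel-size-2 restriction are all illustrative assumptions.

    // One dilated causal conv layer with a Fast-Wavenet-style input cache.
    #include <deque>
    #include <iostream>

    struct DilatedLayer {
        int dilation;
        float w_past, w_curr;     // the two taps of a kernel-size-2 filter
        std::deque<float> cache;  // last `dilation` inputs, oldest in front

        DilatedLayer(int d, float wp, float wc)
            : dilation(d), w_past(wp), w_curr(wc), cache(d, 0.0f) {}

        // One autoregressive step: combine the input from `dilation` steps
        // ago with the current input, then cache the current input.
        float step(float x) {
            float past = cache.front();
            cache.pop_front();
            cache.push_back(x);
            return w_past * past + w_curr * x;
        }
    };

    int main() {
        DilatedLayer layer(4, 0.5f, 0.5f);  // dilation 4, averaging taps
        for (int t = 0; t < 8; ++t)
            std::cout << layer.step(1.0f) << "\n";  // 0.5 until the cache fills
    }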
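The abstract also names the HLS optimizations applied to the matrix-vector multiply at the core of each layer. The following is a minimal Vivado HLS sketch, an illustration rather than the paper's kernel, showing fixed-point arithmetic, array partitioning, and pipelining together. The dimensions N and M, the ap_fixed<16,6> format, and the function name matvec are assumptions for the example; compiling it requires the Vivado HLS ap_fixed.h header.

    // Pipelined, array-partitioned fixed-point matrix-vector multiply.
    #include "ap_fixed.h"

    typedef ap_fixed<16, 6> dtype;  // 16-bit fixed point, 6 integer bits (assumed)

    const int N = 64;  // number of output neurons (illustrative)
    const int M = 64;  // input vector length (illustrative)

    void matvec(const dtype W[N][M], const dtype x[M], dtype y[N]) {
    // Partition the weight columns and input vector into registers so all
    // M multiply-accumulates of a row can issue in the same cycle.
    #pragma HLS ARRAY_PARTITION variable=W complete dim=2
    #pragma HLS ARRAY_PARTITION variable=x complete dim=1
    rows:
        for (int i = 0; i < N; ++i) {
    // Pipelining the row loop with II=1 unrolls the inner column loop,
    // producing one output element per clock once the pipeline fills.
    #pragma HLS PIPELINE II=1
            dtype acc = 0;
        cols:
            for (int j = 0; j < M; ++j)
                acc += W[i][j] * x[j];
            y[i] = acc;
        }
    }

In the fully parameterized design the abstract describes, the partition and unroll factors would presumably be per-layer knobs rather than the complete partitioning shown here, which is how latency can be fine-tuned against resource utilization layer by layer.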