Paper Title
Improving Performance Estimation for FPGA-based Accelerators for Convolutional Neural Networks
Authors
Abstract
Field-programmable gate array (FPGA) based accelerators are widely used to accelerate convolutional neural networks (CNNs) because of their potential to improve performance and their reconfigurability for specific application instances. Determining the optimal configuration of an FPGA-based accelerator requires exploring the design space, and accurate performance prediction plays an important role in that exploration. This work introduces a novel method for fast and accurate latency estimation based on a Gaussian process parametrised by an analytic approximation and coupled with runtime data. Experiments on three different CNNs running on an FPGA-based accelerator on an Intel Arria 10 GX 1150 demonstrated a 30.7% improvement in accuracy, measured by mean absolute error, over a standard analytic method in leave-one-out cross-validation.
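The idea summarised in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that "parametrised by an analytic approximation" means the analytic latency estimate serves as the prior mean of the Gaussian process, so the process only has to learn the residual observed in runtime measurements. The analytic model (`analytic_latency_ms`), the 200 MHz clock, the kernel settings and all design points below are hypothetical, and the "measured" latencies are synthetic values made up for illustration, not results from the paper.

```python
import numpy as np


def analytic_latency_ms(ops, parallelism, freq_hz=200e6):
    """Hypothetical analytic model: one operation per parallel unit per cycle."""
    return ops / (parallelism * freq_hz) * 1e3


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_predict(x_train, y_train, mean_train, x_test, mean_test, noise=1e-4):
    """GP posterior mean, using the analytic estimate as the prior mean (assumption)."""
    k_train = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_test = rbf_kernel(x_test, x_train)
    # The GP only models the residual between measurement and analytic estimate.
    weights = np.linalg.solve(k_train, y_train - mean_train)
    return mean_test + k_test @ weights


def loo_mae(x, y, mean, use_gp):
    """Leave-one-out mean absolute error of the analytic or GP-corrected predictor."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        if use_gp:
            pred = gp_predict(x[keep], y[keep], mean[keep], x[i:i + 1], mean[i:i + 1])[0]
        else:
            pred = mean[i]
        errors.append(abs(pred - y[i]))
    return float(np.mean(errors))


# Synthetic design points (NOT data from the paper): 4 workloads x 4 parallelism levels.
ops, par = np.meshgrid([1e8, 2e8, 4e8, 8e8], [64, 128, 256, 512], indexing="ij")
ops, par = ops.ravel(), par.ravel()
features = np.column_stack([np.log2(ops / 1e8), np.log2(par / 64)])

analytic = analytic_latency_ms(ops, par)
# Made-up "measured" latency: analytic estimate plus a smooth overhead the model misses.
measured = analytic + 0.5 + 0.3 * features[:, 1]

analytic_mae = loo_mae(features, measured, analytic, use_gp=False)
gp_mae = loo_mae(features, measured, analytic, use_gp=True)
print(f"analytic MAE: {analytic_mae:.3f} ms, GP-corrected MAE: {gp_mae:.3f} ms")
```

On this toy data the GP-corrected predictor should show a lower leave-one-out error than the analytic model alone, because the synthetic overhead varies smoothly with the design parameters. The size of the gap depends entirely on the made-up overhead and kernel settings, so it says nothing about the 30.7% figure reported in the abstract.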