Paper Title

CNN2Gate: Toward Designing a General Framework for Implementation of Convolutional Neural Networks on FPGA

Authors

Alireza Ghaffari, Yvon Savaria

Abstract

Convolutional Neural Networks (CNNs) have a major impact on our society because of the numerous services they provide. On the other hand, they require considerable computing power. Graphics processing units (GPUs) can satisfy these requirements, but their high power consumption and limited external I/O constrain their usability and suitability in industrial and mission-critical scenarios. Recently, the number of studies that use FPGAs to implement CNNs has been increasing rapidly, owing to the lower power consumption and easy reconfigurability these platforms offer. As a result of the research effort put into topics such as architecture, synthesis, and optimization, new challenges are arising in integrating such hardware solutions into high-level machine learning software libraries. This paper introduces an integrated framework (CNN2Gate) that supports compilation of a CNN model for an FPGA target. CNN2Gate exploits the OpenCL synthesis workflow for FPGAs offered by commercial vendors. It can parse CNN models from several popular high-level machine learning libraries, such as Keras, PyTorch, and Caffe2. CNN2Gate extracts the computation flow of the layers, together with the weights and biases, and applies a "given" fixed-point quantization. It then writes this information in the format expected by the OpenCL synthesis tools, which are used to build and run the project on the FPGA. CNN2Gate performs design-space exploration using a reinforcement learning agent and automatically fits the design on different FPGAs with limited logic resources. This paper reports results of automatic synthesis and design-space exploration of AlexNet and VGG-16 on various Intel FPGA platforms. CNN2Gate achieves a latency of 205 ms for VGG-16 and 18 ms for AlexNet on the FPGA.
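As a rough illustration of the "given" fixed-point quantization step mentioned in the abstract, the sketch below rounds floating-point weights to signed fixed-point codes and saturates values that fall outside the representable range. The abstract does not state the bit widths or the rounding scheme, so the function names, the 8-bit word length, and the 6 fractional bits are all assumptions chosen for illustration, not CNN2Gate's actual implementation.

```python
def quantize_fixed_point(values, total_bits=8, frac_bits=6):
    """Convert floats to signed fixed-point integer codes.

    total_bits and frac_bits are illustrative assumptions; the abstract
    only says that a user-supplied ("given") format is applied.
    """
    scale = 1 << frac_bits                   # 2**frac_bits steps per unit
    q_min = -(1 << (total_bits - 1))         # most negative code
    q_max = (1 << (total_bits - 1)) - 1      # most positive code
    codes = []
    for v in values:
        code = int(round(v * scale))         # round to the nearest step
        codes.append(max(q_min, min(q_max, code)))  # saturate on overflow
    return codes


def dequantize_fixed_point(codes, frac_bits=6):
    """Map integer codes back to the real values they represent."""
    scale = 1 << frac_bits
    return [c / scale for c in codes]


weights = [0.5, -0.25, 3.0, -3.0]
codes = quantize_fixed_point(weights)
approx = dequantize_fixed_point(codes)
```

With these assumed settings the representable range is roughly -2.0 to 1.98, so the weights 3.0 and -3.0 saturate while 0.5 and -0.25 are represented exactly. Choosing the split between integer and fractional bits is a trade-off between range and precision.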
