Real-time semantic segmentation on FPGAs for autonomous vehicles with hls4ml

Paper title

Real-time semantic segmentation on FPGAs for autonomous vehicles with hls4ml

Authors

Nicolò Ghielmetti, Vladimir Loncar, Maurizio Pierini, Marcel Roed, Sioni Summers, Thea Aarrestad, Christoffer Petersson, Hampus Linander, Jennifer Ngadiuba, Kelvin Lin, Philip Harris

Abstract

In this paper, we investigate how field programmable gate arrays can serve as hardware accelerators for real-time semantic segmentation tasks relevant for autonomous driving. Considering compressed versions of the ENet convolutional neural network architecture, we demonstrate a fully-on-chip deployment with a latency of 4.9 ms per image, using less than 30% of the available resources on a Xilinx ZCU102 evaluation board. The latency is reduced to 3 ms per image when increasing the batch size to ten, corresponding to the use case where the autonomous vehicle receives inputs from multiple cameras simultaneously. We show, through aggressive filter reduction and heterogeneous quantization-aware training, and an optimized implementation of convolutional layers, that the power consumption and resource utilization can be significantly reduced while maintaining accuracy on the Cityscapes dataset.
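The latency figures quoted in the abstract can be turned into rough throughput numbers. The sketch below is only back-of-the-envelope arithmetic, not part of the paper: it assumes images are processed back to back, so that the sustained rate is the reciprocal of the per-image latency, and that the total time for a batch of ten is ten times the quoted 3 ms per image. The function name `images_per_second` is hypothetical.

```python
def images_per_second(latency_ms_per_image: float) -> float:
    """Convert a per-image latency in milliseconds to images per second.

    Assumes back-to-back processing with no extra overhead between images.
    """
    return 1000.0 / latency_ms_per_image


# Figures taken from the abstract.
single_image_latency_ms = 4.9   # batch size 1
batched_latency_ms = 3.0        # per image, batch size 10
batch_size = 10

single_rate = images_per_second(single_image_latency_ms)
batched_rate = images_per_second(batched_latency_ms)

# Assumption: a full batch takes batch_size times the per-image latency.
batch_total_ms = batch_size * batched_latency_ms

print(f"batch 1:  about {single_rate:.0f} images/s")
print(f"batch 10: about {batched_rate:.0f} images/s, "
      f"{batch_total_ms:.0f} ms per batch")
```

Under these assumptions, both configurations sit well above typical camera frame rates, which is consistent with the multi-camera use case the abstract describes.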
