Paper Title


Design and Prototyping Distributed CNN Inference Acceleration in Edge Computing

Authors

Zhongtian Dong, Nan Li, Alexandros Iosifidis, Qi Zhang

Abstract


For time-critical IoT applications using deep learning, inference acceleration through distributed computing is a promising approach to meeting stringent deadlines. In this paper, we implement a working prototype of a new distributed inference acceleration method, HALP, using three Raspberry Pi 4 boards. HALP accelerates inference through seamless collaboration among edge devices (EDs) in edge computing. We maximize the parallelization between communication and computation among the collaborating EDs by optimizing the task partitioning ratio based on segment-based partitioning. Experimental results show that distributed inference with HALP achieves a 1.7x speed-up for VGG-16. We then combine distributed inference with conventional neural network model compression by setting different shrinking hyperparameters for MobileNet-V1. In this way, inference can be accelerated further, at the cost of some loss in inference accuracy. To strike a balance between latency and accuracy, we propose dynamic model selection, which selects the model that provides the highest accuracy within the latency constraint. We show that model selection combined with distributed inference HALP significantly improves service reliability compared to conventional stand-alone computation.
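The abstract's idea of optimizing the task partitioning ratio so that communication overlaps with computation can be illustrated with a simplified sketch. This is not the paper's actual HALP formulation: the device speeds, the single shared link, and the "equalize effective finish rates" rule below are all illustrative assumptions.

```python
# Hypothetical sketch of choosing partitioning ratios across collaborating
# edge devices (EDs). Assumptions (not from the paper): each ED i handles a
# fraction r_i of a layer's workload; a helper's effective rate is reduced
# by the time spent receiving its input segment over one shared link; the
# ratios are chosen so all EDs would finish at the same time.

def partition_ratios(speeds, link_rate, seg_bytes_per_unit):
    """Split work across EDs in proportion to their effective compute rates.

    speeds             : compute rates (work units/s); speeds[0] is the host
    link_rate          : host->helper link throughput in bytes/s (assumed)
    seg_bytes_per_unit : bytes transmitted per unit of offloaded work
    """
    # Seconds per work unit: compute time plus (for helpers) transfer time.
    # Real HALP pipelines these; adding them is a deliberate simplification.
    eff = [speeds[0]] + [
        1.0 / (1.0 / s + seg_bytes_per_unit / link_rate)
        for s in speeds[1:]
    ]
    total = sum(eff)
    return [e / total for e in eff]
```

With equal helper speeds and a fast link, the host (which pays no transfer cost in this model) receives the largest share, and the fractions always sum to one.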

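The dynamic model selection described above reduces to a simple rule: among candidate models (e.g. MobileNet-V1 variants with different shrinking hyperparameters), pick the most accurate one whose estimated latency meets the deadline. A minimal sketch, with placeholder model names and numbers that are not the paper's measurements:

```python
# Sketch of dynamic model selection: return the highest-accuracy model
# whose predicted latency fits within the deadline. The fallback to the
# fastest model when nothing is feasible is an assumption for illustration.

def select_model(candidates, deadline_ms):
    """candidates: iterable of (name, accuracy, latency_ms) tuples."""
    feasible = [c for c in candidates if c[2] <= deadline_ms]
    if not feasible:
        # No model meets the deadline; degrade gracefully to the fastest.
        return min(candidates, key=lambda c: c[2])
    return max(feasible, key=lambda c: c[1])
```

As the latency budget tightens, the selection slides toward smaller, less accurate models, which is the latency/accuracy trade-off the abstract describes.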