Paper Title
PICO: Pipeline Inference Framework for Versatile CNNs on Diverse Mobile Devices
Paper Authors
Paper Abstract
Distributing the inference of a convolutional neural network (CNN) across multiple mobile devices has been studied in recent years as a way to achieve real-time inference without losing accuracy. However, how to map a CNN onto the devices remains a challenge. On the one hand, scheduling the workload of state-of-the-art CNNs across multiple devices is NP-hard, because the structures of modern CNNs are directed acyclic graphs (DAGs) rather than simple chains. On the other hand, distributing the inference workload suffers from expensive communication and unbalanced computation due to the wireless environment and heterogeneous devices. This paper presents PICO, a pipeline cooperation framework that accelerates the inference of versatile CNNs on diverse mobile devices. At its core, PICO features: (1) a generic graph partition algorithm that considers the characteristics of any given CNN and orchestrates it into a list of model pieces with suitable granularity, and (2) a many-to-many mapping algorithm that produces the best pipeline configuration for heterogeneous devices. In our experiments with 2–8 Raspberry Pi devices, throughput is improved by 1.8–6.8× under different CPU frequencies.
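To illustrate why partition granularity matters for pipeline throughput, the sketch below shows the simplified chain case: steady-state throughput is limited by the slowest stage, so a partitioner should minimize the bottleneck stage cost. This is a minimal illustration, not PICO's algorithm: PICO handles DAG-structured CNNs and heterogeneous devices, whereas this sketch assumes a simple chain of layers on identical devices, with hypothetical per-layer latencies.

```python
# Minimal sketch (assumption: chain-structured model, identical devices).
# Partition a chain of per-layer costs into k contiguous pipeline stages
# so that the slowest stage (the pipeline bottleneck) is as fast as possible.
from functools import lru_cache

def partition_chain(costs, k):
    """Return (bottleneck cost, stage end indices) minimizing the max stage cost."""
    n = len(costs)
    prefix = [0] * (n + 1)            # prefix sums for O(1) stage-cost queries
    for i, c in enumerate(costs):
        prefix[i + 1] = prefix[i] + c

    @lru_cache(maxsize=None)
    def best(i, stages):
        # Minimum achievable bottleneck for layers i..n-1 using `stages` stages.
        if stages == 1:
            return prefix[n] - prefix[i], (n,)
        result = (float("inf"), ())
        # Leave at least one layer for each of the remaining stages.
        for j in range(i + 1, n - stages + 2):
            head = prefix[j] - prefix[i]          # cost of this stage
            tail, cuts = best(j, stages - 1)      # best split of the rest
            result = min(result, (max(head, tail), (j,) + cuts))
        return result

    return best(0, k)

# Steady-state pipeline throughput ≈ 1 / bottleneck.
layer_costs = [4, 2, 3, 7, 1, 5]      # hypothetical per-layer latencies (ms)
bottleneck, cuts = partition_chain(layer_costs, 3)
# Stages: layers [0:3], [3:4], [4:6] -> costs 9, 7, 6; bottleneck = 9 ms.
```

Balancing stage costs this way is what the abstract's "suitable granularity" refers to in spirit: pieces that are too coarse create a dominant bottleneck stage, while pieces that are too fine inflate inter-device communication.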