Paper Title

Calibration-Aided Edge Inference Offloading via Adaptive Model Partitioning of Deep Neural Networks

Paper Authors

Roberto G. Pacheco, Rodrigo S. Couto, Osvaldo Simeone

Paper Abstract

Mobile devices can offload deep neural network (DNN)-based inference to the cloud, overcoming local hardware and energy limitations. However, offloading adds communication delay, thus increasing the overall inference time, and hence it should be used only when needed. One approach to this problem is adaptive model partitioning based on early-exit DNNs. Accordingly, inference starts at the mobile device, and an intermediate layer estimates the accuracy: if the estimated accuracy is sufficient, the device makes the inference decision; otherwise, the remaining layers of the DNN run in the cloud. Thus, the device offloads the inference to the cloud only if it cannot classify a sample with high confidence. This offloading requires a correct accuracy prediction at the device. Nevertheless, DNNs are typically miscalibrated, providing overconfident decisions. This work shows that employing a miscalibrated early-exit DNN for offloading via model partitioning can significantly decrease inference accuracy. In contrast, we argue that applying a calibration algorithm before deployment can solve this problem, allowing for more reliable offloading decisions.
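
To make the offloading rule concrete, the minimal Python sketch below implements the confidence-threshold early-exit decision described in the abstract, using temperature scaling as the calibration step. Note that temperature scaling is a standard calibration algorithm but is an assumption here (the abstract does not name the paper's specific method), and the logits, threshold, and temperature values are hypothetical, for illustration only.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; temperature > 1 softens overconfident scores."""
    z = logits / temperature
    z = z - z.max()          # subtract max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

def should_offload(branch_logits, threshold, temperature=1.0):
    """Early-exit rule: keep inference on the device if the calibrated
    confidence at the side branch reaches the threshold; otherwise offload
    the remaining layers' computation to the cloud."""
    confidence = softmax(branch_logits, temperature).max()
    return confidence < threshold

# Hypothetical side-branch logits for one sample.
logits = np.array([4.0, 1.0, 0.0])

# Uncalibrated (T = 1): confidence ~0.94, so the device exits early,
# even though the prediction may be an overconfident mistake.
print(should_offload(logits, threshold=0.8))                   # False

# Calibrated (T = 2; in practice the temperature is fitted on a
# validation set before deployment): confidence ~0.74, so the
# sample is offloaded to the cloud.
print(should_offload(logits, threshold=0.8, temperature=2.0))  # True
```

The point of the example is the abstract's central claim: calibration lowers overconfident on-device confidence estimates below the exit threshold, so hard samples are routed to the cloud instead of being misclassified locally.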
