Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications

Paper Title

Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications

Authors

Han Cai, Ji Lin, Yujun Lin, Zhijian Liu, Haotian Tang, Hanrui Wang, Ligeng Zhu, Song Han

Abstract

Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI), including computer vision, natural language processing, and speech recognition. However, their superior performance comes at the considerable cost of computational complexity, which greatly hinders their application on many resource-constrained devices, such as mobile phones and Internet of Things (IoT) devices. Therefore, methods and techniques that are able to lift the efficiency bottleneck while preserving the high accuracy of DNNs are in great demand in order to enable numerous edge AI applications. This paper provides an overview of efficient deep learning methods, systems, and applications. We start by introducing popular model compression methods, including pruning, factorization, quantization, and compact model design. To reduce the large design cost of these manual solutions, we discuss the AutoML framework for each of them, such as neural architecture search (NAS) and automated pruning and quantization. We then cover efficient on-device training to enable user customization based on the local data on mobile devices. Apart from general acceleration techniques, we also showcase several task-specific accelerations for point cloud, video, and natural language processing by exploiting their spatial sparsity and temporal/token redundancy. Finally, to support all these algorithmic advancements, we introduce efficient deep learning system design from both software and hardware perspectives.
