Paper Title

CLAN: Continuous Learning using Asynchronous Neuroevolution on Commodity Edge Devices

Authors

Parth Mannan, Ananda Samajdar, Tushar Krishna

Abstract

Recent advancements in machine learning algorithms, especially the development of Deep Neural Networks (DNNs), have transformed the landscape of Artificial Intelligence (AI). With every passing day, deep learning based methods are applied to solve new problems with exceptional results. The portal to the real world is the edge. The true impact of AI can only be fully realized if we can have AI agents continuously interacting with the real world and solving everyday problems. Unfortunately, the high compute and memory requirements of DNNs act as a huge barrier to this vision. Today we circumvent this problem by deploying special-purpose inference hardware on the edge while procuring trained models from the cloud. This approach, however, relies on constant interaction with the cloud for transmitting all the data, training on massive GPU clusters, and downloading updated models. This raises bandwidth, privacy, and constant-connectivity challenges for autonomous agents. In this paper we evaluate techniques for enabling adaptive intelligence on edge devices with zero interaction with any high-end cloud/server. We build a prototype distributed system of Raspberry Pis, communicating via WiFi, that runs NeuroEvolutionary (NE) learning and inference. We evaluate the performance of such a collaborative system and detail the compute/communication characteristics of different arrangements of the system that trade off parallelism against communication. Using insights from our analysis, we also propose algorithmic modifications that reduce communication by up to 3.6x during the learning phase, further enhancing scalability and matching the performance of higher-end computing devices at scale. We believe that these insights will enable algorithm-hardware co-design efforts for continuous learning on the edge.
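The abstract mentions algorithmic modifications that cut learning-phase communication by up to 3.6x but does not spell them out. Below is a minimal sketch of one generic way to reduce communication in distributed neuroevolution: exchanging (seed, fitness) pairs instead of full weight vectors, a well-known trick from distributed evolution strategies. This is an illustration under stated assumptions, not CLAN's actual algorithm; the genome size, fitness function, and hyperparameters are hypothetical placeholders.

```python
# Sketch: evolution-strategies-style neuroevolution where nodes exchange
# only (seed, fitness) pairs and reconstruct perturbations locally from
# the seed. This is a generic communication-reduction pattern, NOT the
# paper's exact CLAN scheme; all constants below are hypothetical.
import numpy as np

GENOME_DIM = 64       # hypothetical parameter count of the evolved network
POP_SIZE = 32
GENERATIONS = 50
SIGMA = 0.1           # perturbation scale
LR = 0.02             # update step size

def perturbation(seed: int) -> np.ndarray:
    """Deterministically reconstruct a perturbation from its seed.
    Sending one integer instead of GENOME_DIM floats is what cuts
    inter-node communication."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(GENOME_DIM)

def fitness(params: np.ndarray) -> float:
    """Placeholder objective: match a fixed target vector. On a real
    edge node this would score the network on locally sensed data."""
    target = np.linspace(-1.0, 1.0, GENOME_DIM)
    return -float(np.sum((params - target) ** 2))

base = np.zeros(GENOME_DIM)  # shared starting point on every node
for gen in range(GENERATIONS):
    # In the distributed setting these seeds would be partitioned across
    # Raspberry Pis; each node evaluates its share and broadcasts only
    # the (seed, fitness) pairs over WiFi.
    seeds = np.random.randint(0, 2**31 - 1, size=POP_SIZE)
    scores = np.array([fitness(base + SIGMA * perturbation(s)) for s in seeds])

    # Every node applies the same update after gathering all pairs,
    # keeping the population consistent without shipping weights.
    norm = (scores - scores.mean()) / (scores.std() + 1e-8)
    grad = sum(n * perturbation(s) for n, s in zip(norm, seeds))
    base += LR * grad / (POP_SIZE * SIGMA)

print(f"final fitness: {fitness(base):.3f}")
```

Per genome, this exchanges one integer and one float rather than GENOME_DIM floats, which is the spirit of the paper's communication reduction; the actual 3.6x figure comes from the paper's own modifications and measurements, not from this sketch.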
