Paper Title
Low-latency Federated Learning with DNN Partition in Distributed Industrial IoT Networks
Paper Authors
Paper Abstract
Federated Learning (FL) empowers the Industrial Internet of Things (IIoT) with distributed intelligence for industrial automation, thanks to its capability of distributed machine learning without any raw data exchange. However, it is rather challenging for lightweight IIoT devices to perform computation-intensive local model training over large-scale deep neural networks (DNNs). Driven by this issue, we develop a communication-computation efficient FL framework for resource-limited IIoT networks that integrates the DNN partition technique into the standard FL mechanism, wherein IIoT devices perform local model training over the bottom layers of the objective DNN and offload the top layers to the edge gateway side. Considering imbalanced data distribution, we derive a device-specific participation rate to involve devices with better data distribution in more communication rounds. Upon deriving the device-specific participation rate, we propose to minimize the training delay under constraints on the device-specific participation rate, energy consumption, and memory usage. To this end, we formulate a joint optimization problem of device scheduling and resource allocation (i.e., DNN partition point, channel assignment, transmit power, and computation frequency), and solve the resulting long-term min-max mixed-integer non-linear program based on the Lyapunov technique. In particular, the proposed dynamic device scheduling and resource allocation (DDSRA) algorithm achieves a trade-off between training delay minimization and FL performance. We also provide an FL convergence bound for the DDSRA algorithm under both convex and non-convex settings. Experimental results demonstrate the feasibility of the derived device-specific participation rate, and show that the DDSRA algorithm outperforms baselines in terms of test accuracy and convergence time.
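To make the DNN partition idea concrete, below is a minimal sketch in PyTorch of how one forward/backward pass could be split at a partition point, with the bottom layers kept on the IIoT device and the top layers offloaded to the edge gateway. The toy network and all names (layers, partition_point, device_side, gateway_side, smashed) are illustrative assumptions, not the paper's implementation; in DDSRA the partition point is a decision variable optimized jointly with channel assignment, transmit power, and computation frequency.

```python
# Illustrative sketch of layer-wise DNN partition between an IIoT device
# and an edge gateway. PyTorch is an assumed framework; the paper does not
# specify one. The toy network and partition choice are hypothetical.
import torch
import torch.nn as nn

# Objective DNN as a stack of layers; partition_point splits it into
# "bottom" layers (trained locally on the device) and "top" layers
# (offloaded to the edge gateway).
layers = [
    nn.Flatten(),
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),
]
partition_point = 3  # hypothetical fixed choice; DDSRA would optimize this

device_side = nn.Sequential(*layers[:partition_point])   # on the IIoT device
gateway_side = nn.Sequential(*layers[partition_point:])  # on the edge gateway

x = torch.randn(32, 1, 28, 28)       # dummy mini-batch of local data
y = torch.randint(0, 10, (32,))      # dummy labels

# Device computes bottom-layer activations ("smashed" data) and would
# transmit them over the wireless channel to the gateway.
smashed = device_side(x)

# Gateway finishes the forward pass and computes the loss. In a real split
# setting, the gateway would send the gradient w.r.t. `smashed` back to the
# device; calling backward() on the connected graph emulates that exchange,
# populating gradients for both the top and bottom layers.
loss = nn.functional.cross_entropy(gateway_side(smashed), y)
loss.backward()
```

The size of `smashed` and of its returned gradient is what the partition point controls: a deeper split means more on-device computation but (for narrowing architectures) less data over the air, which is the communication-computation trade-off the joint optimization in the abstract balances.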