Paper Title
SemiPFL: Personalized Semi-Supervised Federated Learning Framework for Edge Intelligence
Paper Authors
Abstract
Recent advances in wearable devices and the Internet of Things (IoT) have led to massive growth in the sensor data generated at edge devices. Labeling such massive data for classification tasks has proven challenging. In addition, data generated by different users carry distinct personal attributes and edge heterogeneity, making it impractical to develop a single global model that adapts well to all users. Concerns over data privacy and communication costs also prohibit centralized data accumulation and training. We propose SemiPFL, a framework that supports edge users who have no labels or only limited labeled datasets, together with a sizable amount of unlabeled data that is insufficient to train a well-performing model. In this framework, edge users collaborate to train a hyper-network on the server that generates a personalized autoencoder for each user. After receiving updates from edge users, the server produces a set of base models for each user, which the user aggregates locally using their own labeled dataset. We comprehensively evaluate the proposed framework on public datasets spanning a wide range of application scenarios, from wearable health to IoT, and demonstrate that SemiPFL outperforms state-of-the-art federated learning frameworks under the same assumptions in terms of user performance, network footprint, and computational consumption. We also show that the solution performs well for users with no labels or limited labeled datasets, and that performance improves as the amount of labeled data and the number of users increase, indicating the effectiveness of SemiPFL in handling data heterogeneity and limited annotation. Finally, we demonstrate the stability of SemiPFL in handling heterogeneity in user hardware resources in three real-time scenarios.
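The two mechanisms named in the abstract, a server-side hyper-network that emits a personalized autoencoder per user and a local aggregation of base models on the user's labeled data, can be illustrated with a minimal sketch. This is not the authors' implementation. The linear hyper-network, the bias-free linear autoencoder, the accuracy-proportional weighting rule, the uniform fallback for users without labels, and all function names (`make_hypernetwork`, `generate_autoencoder`, `aggregation_weights`, `ensemble_predict`) are assumptions made purely for illustration, and training of the hyper-network is omitted.

```python
import random


def make_hypernetwork(embed_dim, in_dim, hidden_dim, seed=0):
    """Hypothetical linear hyper-network: one row per generated autoencoder weight."""
    rng = random.Random(seed)
    n_params = 2 * in_dim * hidden_dim  # encoder + decoder weights, no biases (assumption)
    return [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)] for _ in range(n_params)]


def generate_autoencoder(hyper, embedding, in_dim, hidden_dim):
    """Map a user embedding to the weights of that user's personalized autoencoder."""
    flat = [sum(w * e for w, e in zip(row, embedding)) for row in hyper]
    split = in_dim * hidden_dim
    # Encoder: hidden_dim x in_dim, decoder: in_dim x hidden_dim.
    enc = [flat[i * in_dim:(i + 1) * in_dim] for i in range(hidden_dim)]
    dec = [flat[split + i * hidden_dim:split + (i + 1) * hidden_dim]
           for i in range(in_dim)]
    return enc, dec


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def reconstruct(enc, dec, x):
    """Encode x into the hidden space, then decode it back to the input space."""
    return matvec(dec, matvec(enc, x))


def aggregation_weights(base_models, labeled_data):
    """Weight each base model by its accuracy on the user's labeled data.

    Assumed rule: weights are proportional to accuracy. Users with no labels
    (or when every model scores zero) fall back to uniform weights.
    """
    uniform = [1.0 / len(base_models)] * len(base_models)
    if not labeled_data:
        return uniform
    accuracies = [
        sum(1 for x, y in labeled_data if model(x) == y) / len(labeled_data)
        for model in base_models
    ]
    total = sum(accuracies)
    return uniform if total == 0 else [a / total for a in accuracies]


def ensemble_predict(base_models, weights, x):
    """Weighted vote over the labels predicted by the base models."""
    votes = {}
    for model, weight in zip(base_models, weights):
        label = model(x)
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)
```

In this sketch, two users with different embeddings obtain different autoencoder weights from the same shared hyper-network, which is the sense in which the generated models are personalized. The aggregation step uses only the user's local labeled data, so no labels leave the device.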