Paper Title

Federated Geometric Monte Carlo Clustering to Counter Non-IID Datasets

Authors

Federico Lucchetti, Jérémie Decouchant, Maria Fernandes, Lydia Y. Chen, Marcus Völp

Abstract

Federated learning allows clients to collaboratively train models on datasets that are acquired in different locations and that cannot be exchanged because of their size or regulations. Such collected data is increasingly non-independent and non-identically distributed (non-IID), which negatively affects training accuracy. Previous works tried to mitigate the effects of non-IID datasets on training accuracy, focusing mainly on non-IID labels; however, practical datasets often also contain non-IID features. To address both non-IID labels and features, we propose FedGMCC, a novel framework where a central server aggregates client models that it can cluster together. FedGMCC clustering relies on a Monte Carlo procedure that samples the output space of client models, infers their position in the weight space on a loss manifold, and computes their geometric connection via an affine curve parametrization. FedGMCC aggregates connected models along their path connectivity to produce a richer global model that incorporates the knowledge of all connected client models. FedGMCC outperforms FedAvg and FedProx in terms of convergence rates on the EMNIST62 dataset and a genomic sequence classification dataset (by up to +63%). FedGMCC also yields an improved accuracy (+4%) on the genomic dataset with respect to CFL, in settings with highly non-IID feature spaces and label incongruency.
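
The abstract does not spell out the clustering procedure, so the following is only a rough sketch of the general idea of connecting models along an affine path, not the FedGMCC algorithm itself. Everything in it is an assumption made for illustration: the toy two-unit `model`, the `path_barrier` score (how far outputs along the straight line between two weight vectors deviate from the interpolated endpoint outputs, measured on randomly sampled probe inputs), the `threshold` value, and the plain weight averaging within each cluster.

```python
import numpy as np

def model(theta, x):
    """Hypothetical toy network: a1*tanh(b1*x) + a2*tanh(b2*x), theta = [b1, b2, a1, a2]."""
    b1, b2, a1, a2 = theta
    return a1 * np.tanh(b1 * x) + a2 * np.tanh(b2 * x)

def path_barrier(theta_a, theta_b, probes, n_points=11):
    """Worst-case output deviation along the affine path between two weight vectors.

    Assumed score (not taken from the paper): at each t, compare the output of the
    interpolated weights with the interpolation of the two endpoint outputs.
    """
    out_a, out_b = model(theta_a, probes), model(theta_b, probes)
    worst = 0.0
    for t in np.linspace(0.0, 1.0, n_points):
        theta_t = (1.0 - t) * theta_a + t * theta_b   # affine path in weight space
        target = (1.0 - t) * out_a + t * out_b        # interpolated endpoint outputs
        deviation = float(np.mean((model(theta_t, probes) - target) ** 2))
        worst = max(worst, deviation)
    return worst

def cluster_models(thetas, probes, threshold=0.01):
    """Group models whose pairwise path barrier is below the (assumed) threshold."""
    labels = list(range(len(thetas)))
    for i in range(len(thetas)):
        for j in range(i + 1, len(thetas)):
            if path_barrier(thetas[i], thetas[j], probes) < threshold:
                old, new = labels[j], labels[i]
                labels = [new if lab == old else lab for lab in labels]
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    return sorted(clusters.values())

# Monte Carlo probe inputs drawn by the server (no client data involved).
rng = np.random.default_rng(0)
probes = rng.uniform(-2.0, 2.0, size=2000)

# Three made-up client models: 0 and 1 are close in weight space; 2 computes the
# same function as 0 but with its hidden units swapped, so the straight path
# between them passes through weights that behave differently.
thetas = [
    np.array([2.0, 0.0, 1.0, 0.0]),
    np.array([2.1, 0.0, 0.95, 0.0]),
    np.array([0.0, 2.0, 0.0, 1.0]),
]

clusters = cluster_models(thetas, probes)
# One aggregated model per cluster (plain averaging, as in FedAvg-style aggregation).
global_models = [np.mean([thetas[i] for i in c], axis=0) for c in clusters]
```

In this toy setup, models 0 and 1 end up in one cluster and are averaged, while model 2 stays on its own even though it produces the same outputs as model 0. That illustrates why a path-based criterion can group models differently from a pure output-similarity criterion.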
