Paper Title
Byzantine-Robust Variance-Reduced Federated Learning over Distributed Non-i.i.d. Data
Paper Authors
Paper Abstract
We consider the federated learning problem where the data on different workers are not independent and identically distributed (i.i.d.). During the learning process, an unknown number of Byzantine workers may send malicious messages to the central node, leading to significant learning error. Most Byzantine-robust methods address this issue by aggregating the received messages with robust aggregation rules, but they rely on the assumption that all regular workers have i.i.d. data, which does not hold in many federated learning applications. In light of the significance of reducing stochastic gradient noise for mitigating the effect of Byzantine attacks, we use a resampling strategy to reduce the impact of both inner variation (which describes the sample heterogeneity on each regular worker) and outer variation (which describes the sample heterogeneity among the regular workers), together with a stochastic average gradient algorithm to gradually eliminate the inner variation. The variance-reduced messages are then aggregated with a robust geometric median operator. We prove that the proposed method reaches a neighborhood of the optimal solution at a linear convergence rate, with a learning error determined by the number of Byzantine workers. Numerical experiments corroborate the theoretical results and show that the proposed method outperforms state-of-the-art alternatives in the non-i.i.d. setting.
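The abstract names two building blocks whose mechanics a short sketch may help make concrete: a SAGA-style stochastic average gradient update on each regular worker, and geometric median aggregation at the central node. The Python sketch below is a minimal illustration under our own assumptions, not the paper's implementation: the function names `geometric_median` and `saga_message` are hypothetical, the paper's resampling strategy is not detailed in the abstract and is omitted, and the geometric median is approximated with Weiszfeld's classical fixed-point iteration.

```python
import numpy as np

def geometric_median(messages, max_iter=100, tol=1e-6):
    """Approximate the geometric median of worker messages with
    Weiszfeld's algorithm. `messages` is an (m, d) array holding one
    flattened gradient message per worker."""
    messages = np.asarray(messages, dtype=float)
    z = messages.mean(axis=0)  # initialize at the coordinate-wise mean
    for _ in range(max_iter):
        dist = np.maximum(np.linalg.norm(messages - z, axis=1), 1e-12)
        w = 1.0 / dist  # Weiszfeld weights: inverse distances to current iterate
        z_new = (w[:, None] * messages).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z

def saga_message(grad_table, fresh_grad, idx):
    """SAGA-style variance-reduced message for one regular worker.
    `grad_table` is an (n, d) array storing the last gradient seen for
    each of the worker's n local samples; `fresh_grad` is the newly
    computed stochastic gradient for sample `idx`."""
    msg = fresh_grad - grad_table[idx] + grad_table.mean(axis=0)
    grad_table[idx] = fresh_grad  # refresh the stored gradient for this sample
    return msg
```

In this sketch, each round the central node would stack the received messages (from regular and Byzantine workers alike) and take a step x ← x − η · geometric_median(messages). The geometric median is insensitive to a minority of arbitrarily corrupted messages, which is the robustness property the abstract relies on, while the SAGA correction drives the inner variation of each regular worker's message toward zero.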