Paper Title


Uncertainty Principle for Communication Compression in Distributed and Federated Learning and the Search for an Optimal Compressor

Paper Authors

Safaryan, Mher, Shulgin, Egor, Richtárik, Peter

Paper Abstract


In order to mitigate the high communication cost in distributed and federated learning, various vector compression schemes, such as quantization, sparsification and dithering, have become very popular. In designing a compression method, one aims to communicate as few bits as possible, which minimizes the cost per communication round, while at the same time attempting to impart as little distortion (variance) to the communicated messages as possible, which minimizes the adverse effect of the compression on the overall number of communication rounds. However, intuitively, these two goals are fundamentally in conflict: the more compression we allow, the more distorted the messages become. We formalize this intuition and prove an {\em uncertainty principle} for randomized compression operators, thus quantifying this limitation mathematically, and {\em effectively providing asymptotically tight lower bounds on what might be achievable with communication compression}. Motivated by these developments, we call for the search for an optimal compression operator. In an attempt to take a first step in this direction, we consider an unbiased compression method inspired by the Kashin representation of vectors, which we call {\em Kashin compression (KC)}. In contrast to all previously proposed compression mechanisms, KC enjoys a {\em dimension-independent} variance bound, for which we derive an explicit formula even in the regime when only a few bits need to be communicated per vector entry.
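The bits-versus-variance tradeoff the abstract formalizes can be illustrated with the simplest unbiased compressor, random-k sparsification (note: this is a standard textbook compressor, not the paper's KC operator). Rand-k keeps $k$ of the $d$ coordinates and rescales them by $d/k$, so it is unbiased with variance $\mathbb{E}\|\mathcal{C}(x)-x\|^2 = (d/k - 1)\|x\|^2$, a bound that grows with the dimension $d$ — exactly the dependence KC is claimed to avoid. A minimal sketch:

```python
import numpy as np

def rand_k_compress(x, k, rng=None):
    """Unbiased random-k sparsification: keep k of the d coordinates
    (chosen uniformly without replacement), scaled by d/k so that
    E[C(x)] = x."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x, dtype=float)
    out[idx] = (d / k) * x[idx]
    return out

# Empirically check unbiasedness and the standard rand-k variance
# identity E||C(x) - x||^2 = (d/k - 1) * ||x||^2.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)
samples = np.stack([rand_k_compress(x, k=8, rng=rng) for _ in range(20000)])
mean_err = np.linalg.norm(samples.mean(axis=0) - x)          # should be near 0
emp_var = np.mean(np.sum((samples - x) ** 2, axis=1))        # empirical variance
theory_var = (x.size / 8 - 1) * np.sum(x ** 2)               # (d/k - 1) ||x||^2
```

Sending fewer coordinates (smaller $k$) costs fewer bits per round but inflates the $(d/k-1)$ variance factor, which is the conflict the paper's uncertainty principle makes precise.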
