Paper Title

Multi-Access Distributed Computing

Authors

Federico Brunero, Petros Elia

Abstract

Coded distributed computing (CDC) is a new technique proposed with the purpose of decreasing the intense data exchange required for parallelizing distributed computing systems. Under the famous MapReduce paradigm, this coded approach has been shown to decrease this communication overhead by a factor that is linearly proportional to the overall computation load during the mapping phase. Nevertheless, it is widely accepted that this overhead remains a main bottleneck in distributed computing. To address this, we take a new approach and we explore a new system model which, for the same aforementioned overall computation load of the mapping phase, manages to provide astounding reductions of the communication overhead and, perhaps counterintuitively, a substantial increase of the computational parallelization. In particular, we propose multi-access distributed computing (MADC) as a novel generalization of the original CDC model, where now mappers and reducers are distinct computing nodes that are connected through a multi-access network topology. Focusing on the MADC setting with combinatorial topology, which implies $Λ$ mappers and $K$ reducers such that there is a unique reducer connected to any $α$ mappers, we propose a novel coded scheme and a novel information-theoretic converse, which jointly identify the optimal inter-reducer communication load to within a constant gap of $1.5$. Additionally, a modified coded scheme and converse identify the optimal max-link communication load across all existing links to within a gap of $4$. The unparalleled coding gains reported here should not be simply credited to having access to more mapped data, but rather to the powerful role of topology in effectively aligning mapping outputs. This realization raises the open question of which multi-access network topology guarantees the best possible performance in distributed computing.
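The combinatorial topology mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "a unique reducer connected to any $α$ mappers" means exactly one reducer per $α$-subset of the $Λ$ mappers, so that $K = \binom{Λ}{α}$. The function name `combinatorial_topology` and the example values $Λ = 4$, $α = 2$ are hypothetical, chosen only for illustration; the sketch builds the connectivity and does not implement the coded scheme itself.

```python
from itertools import combinations
from math import comb


def combinatorial_topology(num_mappers, alpha):
    """Map each reducer index to the tuple of mapper indices it connects to.

    Assumption: one reducer exists for every alpha-subset of mappers.
    """
    return dict(enumerate(combinations(range(num_mappers), alpha)))


# Hypothetical example values: 4 mappers, each reducer connected to 2 of them.
num_mappers, alpha = 4, 2
topology = combinatorial_topology(num_mappers, alpha)

# Under the assumption above, the number of reducers is C(num_mappers, alpha).
num_reducers = len(topology)

# Number of reducers that each mapper serves.
reducers_per_mapper = [
    sum(1 for mappers in topology.values() if m in mappers)
    for m in range(num_mappers)
]

for reducer, mappers in topology.items():
    print(f"reducer {reducer} <- mappers {mappers}")
```

In this sketch each mapper ends up serving $\binom{Λ-1}{α-1}$ reducers, which is one way to see that the reducers share mapped data in heavily overlapping patterns.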
