Paper Title
PARTIME: Scalable and Parallel Processing Over Time with Deep Neural Networks
Paper Authors
Paper Abstract
In this paper, we present PARTIME, a software library written in Python and based on PyTorch, designed specifically to speed up neural networks whenever data is continuously streamed over time, for both learning and inference. Existing libraries are designed to exploit data-level parallelism, assuming that samples are batched, a condition that is not naturally met in applications based on streamed data. In contrast, PARTIME starts processing each data sample at the time in which it becomes available from the stream. PARTIME wraps the code that implements a feed-forward multi-layer network and distributes the layer-wise processing among multiple devices, such as Graphics Processing Units (GPUs). Thanks to its pipeline-based computational scheme, PARTIME allows the devices to perform computations in parallel. At inference time, this results in scaling capabilities that are theoretically linear with respect to the number of devices. During the learning stage, PARTIME can leverage the non-i.i.d. nature of the streamed data, whose samples evolve smoothly over time, for efficient gradient computations. Experiments are performed to empirically compare PARTIME with classic non-parallel neural computations in online learning, distributing operations on up to 8 NVIDIA GPUs and showing significant speedups that are almost linear in the number of devices, mitigating the impact of data transfer overhead.
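The pipeline-based scheme described in the abstract can be illustrated with a minimal sketch (this is not the PARTIME API; the two-stage split, the `pipeline_step` helper and the device handling are assumptions made purely for illustration): each slice of a feed-forward network is placed on its own GPU and, at every time step, each stage consumes the activations produced by the previous stage at the previous step, so the devices operate on different stream samples in parallel.

```python
# Minimal sketch of layer-wise pipelining over a data stream (not the PARTIME API).
import torch
import torch.nn as nn

# Use up to two GPUs if available, otherwise fall back to the CPU.
if torch.cuda.is_available():
    devices = [torch.device(f"cuda:{i}") for i in range(min(torch.cuda.device_count(), 2))]
else:
    devices = [torch.device("cpu")]

# Hypothetical two-stage split of a small feed-forward network, one stage per device.
stages = [
    nn.Sequential(nn.Linear(64, 128), nn.ReLU()).to(devices[0]),
    nn.Sequential(nn.Linear(128, 10)).to(devices[-1]),
]
stage_devices = [devices[0], devices[-1]]

# Activations "in flight" between pipeline stages (buffers[0] is the pipeline input).
buffers = [None] * (len(stages) + 1)

def pipeline_step(x_t):
    """Advance the pipeline by one time step using the newly streamed sample x_t."""
    # Stages are visited back-to-front so that each one consumes the activations
    # computed at the previous time step; on distinct GPUs the asynchronous CUDA
    # kernel launches let these forward passes overlap.
    for i in reversed(range(len(stages))):
        if buffers[i] is not None:
            buffers[i + 1] = stages[i](buffers[i].to(stage_devices[i]))
    buffers[0] = x_t  # inject the freshest stream sample at the pipeline input
    return buffers[-1]  # output of the sample injected len(stages) steps earlier

# Toy stream: one sample per time step.
with torch.no_grad():
    for t in range(5):
        y = pipeline_step(torch.randn(1, 64))
        if y is not None:
            print(t, tuple(y.shape))
```

In this sketch, the output for a sample injected at step t only becomes available at step t plus the number of stages; in an online setting this pipeline delay is the price paid for keeping all devices busy at every step, which is where the near-linear scaling reported in the abstract comes from.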