Paper Title
Mixed-precision deep learning based on computational memory
Paper Authors
Paper Abstract
Deep neural networks (DNNs) have revolutionized the field of artificial intelligence and have achieved unprecedented success in cognitive tasks such as image and speech recognition. Training of large DNNs, however, is computationally intensive, and this has motivated the search for novel computing architectures targeting this application. A computational memory unit with nanoscale resistive memory devices organized in crossbar arrays could store the synaptic weights in their conductance states and perform the expensive weighted summations in place in a non-von Neumann manner. However, updating the conductance states in a reliable manner during the weight update process is a fundamental challenge that limits the training accuracy of such an implementation. Here, we propose a mixed-precision architecture that combines a computational memory unit performing the weighted summations and imprecise conductance updates with a digital processing unit that accumulates the weight updates in high precision. A combined hardware/software training experiment of a multilayer perceptron based on the proposed architecture using a phase-change memory (PCM) array achieves 97.73% test accuracy on the task of classifying handwritten digits (based on the MNIST dataset), within 0.6% of the software baseline. The architecture is further evaluated using accurate behavioral models of PCM on a wide class of networks, namely convolutional neural networks, long short-term memory (LSTM) networks, and generative adversarial networks (GANs). Accuracies comparable to those of floating-point implementations are achieved without being constrained by the non-idealities associated with the PCM devices. A system-level study demonstrates a 173× improvement in the energy efficiency of the architecture when used for training a multilayer perceptron, compared with a dedicated fully digital 32-bit implementation.
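To make the mixed-precision update rule described in the abstract concrete, below is a minimal sketch of how a digital processing unit might accumulate weight updates in high precision and transfer them to the computational memory only in integer multiples of the device update granularity. The names here (`chi`, `epsilon`, `pcm_program`) are illustrative assumptions, not identifiers from the paper, and `pcm_program` stands in for whatever interface issues programming pulses to the PCM crossbar.

```python
import numpy as np

def mixed_precision_update(chi, grad, lr, epsilon, pcm_program):
    """Sketch of one mixed-precision weight update step.

    chi:         high-precision accumulator, same shape as the weight array
    grad:        gradient of the loss w.r.t. the weights
    lr:          learning rate
    epsilon:     conductance update granularity of a single device pulse
    pcm_program: callback applying the given (signed) number of pulses
                 per device to the crossbar (hypothetical interface)
    """
    # High-precision accumulation in the digital processing unit.
    chi += -lr * grad

    # Transfer only integer multiples of the device granularity to the
    # computational memory; the conductance change itself is imprecise.
    num_pulses = np.trunc(chi / epsilon)
    pcm_program(num_pulses)

    # Keep the residual (sub-granularity) part in high precision.
    chi -= num_pulses * epsilon
    return chi
```

In this scheme the imprecision of individual conductance updates is tolerated because small gradient contributions are never lost: they accumulate in `chi` until they cross the pulse granularity, which is consistent with the accuracy results reported in the abstract.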