Paper Title
NITI: Training Integer Neural Networks Using Integer-only Arithmetic
Paper Authors
Paper Abstract
While integer arithmetic has been widely adopted to improve the performance of deep quantized neural network inference, training is still performed primarily with floating-point arithmetic, because both high dynamic range and numerical accuracy are central to the success of most modern training algorithms. However, owing to their potential computational, storage, and energy advantages in hardware accelerators, neural network training methods that can be implemented with low-precision, integer-only arithmetic remain an active research challenge. In this paper, we present NITI, an efficient deep neural network training framework that stores all parameters and intermediate values as integers and computes exclusively with integer arithmetic. A pseudo-stochastic rounding scheme that eliminates the need for external random number generation is proposed to facilitate the conversion of wider intermediate results to low-precision storage. Furthermore, a cross-entropy loss backpropagation scheme computed with integer-only arithmetic is proposed. A proof-of-concept open-source software implementation of NITI that uses native 8-bit integer operations in modern GPUs to achieve end-to-end training is presented. Compared with an equivalent training setup implemented with floating-point storage and arithmetic, NITI achieves negligible accuracy degradation on the MNIST and CIFAR10 datasets using 8-bit integer storage and computation. On ImageNet, 16-bit integers are needed for weight accumulation with an 8-bit datapath, achieving training results comparable to an all-floating-point implementation.
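To illustrate the rounding idea mentioned in the abstract, below is a minimal sketch (not the authors' implementation) of how wide integer accumulation results might be reduced to 8-bit storage without an external random number generator: the low-order bits that are about to be discarded are split in two, with the upper half serving as the rounding fraction and the lower half reused as a data-derived pseudo-random draw. The function name, bit layout, and NumPy-based datapath are assumptions for illustration only; the exact scheme in NITI may differ.

```python
import numpy as np

def pseudo_stochastic_round(acc, shift):
    """Reduce a wide int32 accumulator to int8 by discarding `shift` low bits,
    rounding stochastically without an external RNG.

    The discarded bit-field is split in half (assumes `shift` is even): the
    upper half is the rounding fraction, the lower half stands in for the
    random draw.  Hypothetical layout, not necessarily NITI's exact scheme.
    """
    acc = acc.astype(np.int32)
    kept = acc >> shift                        # arithmetic shift: high part to keep
    discarded = acc & ((1 << shift) - 1)       # low bits about to be dropped
    half = shift // 2
    frac = discarded >> half                   # rounding fraction in [0, 2**half)
    prand = discarded & ((1 << half) - 1)      # data-derived pseudo-random draw
    rounded = kept + (prand < frac)            # round up with "probability" ~ frac / 2**half
    return np.clip(rounded, -128, 127).astype(np.int8)  # saturate to int8 storage

# Example: bring 32-bit accumulation results back into the int8 range.
acc = np.array([1000, -1000, 4100, -4100], dtype=np.int32)
print(pseudo_stochastic_round(acc, shift=4))
```

Because the "random" bits come from the value itself, the conversion stays deterministic and fully integer-only, which is what allows the datapath to avoid a hardware or software random number generator.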