Paper Title

Breaking Barriers: Maximizing Array Utilization for Compute In-Memory Fabrics

Paper Authors

Brian Crafton, Samuel Spetalnick, Gauthaman Murali, Tushar Krishna, Sung-Kyu Lim, Arijit Raychowdhury

Abstract

Compute in-memory (CIM) is a promising technique that minimizes data transport, the primary performance bottleneck and energy cost of most data-intensive applications. This has found widespread adoption in accelerating neural networks for machine learning applications. Utilizing a crossbar architecture with emerging non-volatile memories (eNVM) such as dense resistive random access memory (RRAM) or phase change random access memory (PCRAM), various forms of neural networks can be implemented to greatly reduce power and increase on-chip memory capacity. However, compute in-memory faces its own limitations at both the circuit and the device levels. Although compute in-memory using the crossbar architecture can greatly reduce data transport, the rigid nature of these large fixed-weight matrices forfeits the flexibility of traditional CMOS- and SRAM-based designs. In this work, we explore the different synchronization barriers that arise from the CIM constraints. Furthermore, we propose a new allocation algorithm and data flow based on input data distributions to maximize utilization and performance for CIM-based designs. We demonstrate a 7.47$\times$ performance improvement over a naive allocation method for CIM accelerators on ResNet18.
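The abstract only summarizes the allocation idea, so the following is a minimal illustrative sketch, not the paper's actual algorithm: it shows how an input-aware allocator might distribute a fixed budget of duplicate crossbar arrays across layers so that the expected work per array is balanced, reducing synchronization stalls. The function name `allocate_arrays`, the layer names, and all per-layer workload numbers are hypothetical placeholders.

```python
# Hypothetical sketch: greedily duplicate crossbar arrays across layers so that
# expected per-array work is balanced. All numbers below are illustrative
# placeholders, not values or methods taken from the paper.

import heapq

def allocate_arrays(layers, total_arrays):
    """Greedily assign duplicate crossbar arrays to layers.

    layers: list of (name, base_arrays, expected_work) where
      - base_arrays: minimum arrays needed just to hold the layer's weights
      - expected_work: estimated MACs scaled by the layer's input activity
    total_arrays: total crossbar arrays available on chip
    """
    # Every layer first receives the arrays required to store its weights.
    alloc = {name: base for name, base, _ in layers}
    spare = total_arrays - sum(alloc.values())
    assert spare >= 0, "not enough arrays to map all weights"

    # Max-heap keyed on work per allocated array: always give the next spare
    # array to the layer that is currently the bottleneck.
    heap = [(-work / alloc[name], name, work) for name, _, work in layers]
    heapq.heapify(heap)
    for _ in range(spare):
        _, name, work = heapq.heappop(heap)
        alloc[name] += 1
        heapq.heappush(heap, (-work / alloc[name], name, work))
    return alloc

# Toy example with made-up per-layer workloads (not ResNet18 measurements).
layers = [("conv1", 2, 118.0), ("conv2_x", 4, 231.0), ("conv3_x", 8, 115.0)]
print(allocate_arrays(layers, total_arrays=32))
```

Under this toy heuristic, layers whose inputs are expected to generate more work receive more duplicated arrays, which is the general flavor of the utilization problem the abstract describes.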
