ERA-BS: Boosting the Efficiency of ReRAM-Based PIM Accelerator With Fine-Grained Bit-Level Sparsity

Cited by: 1
Authors
Liu, Fangxin [1 ,2 ]
Zhao, Wenbo [3 ]
Wang, Zongwu [1 ]
Chen, Yongbiao [1 ]
Liang, Xiaoyao [1 ]
Jiang, Li [1 ,2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[2] Shanghai Qi Zhi Inst, Shanghai 200232, Peoples R China
[3] Shanghai Jiao Tong Univ, Univ Michigan Shanghai Jiao Tong Univ Joint Inst, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Processing-in-memory; neural network; hardware accelerator; bit-level sparsity;
DOI
10.1109/TC.2023.3290869
CLC number
TP3 [Computing Technology, Computer Technology]
Discipline classification code
0812
Abstract
The Resistive Random-Access-Memory (ReRAM) crossbar is one of the most promising neural network accelerators, thanks to its in-memory, in-situ analog computation of Matrix Multiplication-and-Accumulations (MACs). Its key limitations are: 1) the number of ReRAM rows and columns that can execute MACs concurrently is constrained, which limits in-memory computing throughput; and 2) the cost of high-precision analog-to-digital (A/D) conversions can offset the efficiency and performance benefits of ReRAM-based Processing-In-Memory (PIM). Moreover, deploying Deep Neural Network (DNN) models with large model sizes on the crossbar is challenging because DNN sparsity, especially activation sparsity, cannot be effectively exploited by the crossbar structure. As a countermeasure, we develop a novel ReRAM-based PIM accelerator, ERA-BS, which exploits the correlation between bit-level sparsity (in both weights and activations) and the performance of the ReRAM crossbar. We propose a bit-flip scheme combined with exponent-based quantization that adaptively flips the bits of the mapped DNN to release redundant crossbar space, with little accuracy loss and little hardware overhead. We also design an architecture that integrates these techniques to shrink the crossbar footprint so that they can be applied at scale. We further propose a dynamic activation-sparsity exploitation scheme that fits the tightly coupled structure of the crossbar, including crossbar-aware activation pruning and ancillary run-time hardware support. In this way, we exploit fine-grained sparsity in weights (static) and activations (dynamic) to improve performance while reducing computation energy with negligible overhead. Experiments on a wide variety of networks show that, compared with the well-known ReRAM-based PIM accelerator "ISAAC", ERA-BS achieves up to 43x, 78x, and 73x improvements in energy efficiency, area efficiency, and throughput, respectively. Compared with the state-of-the-art ReRAM-based design "PIM-Prune", ERA-BS achieves 5.3x energy efficiency, 7.2x area efficiency, and 32x performance gains with similar or even higher accuracy.
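
To make the bit-flip idea concrete, the sketch below shows one plausible reading of how flipping weight codes can raise bit-level sparsity while keeping MAC results exact. The specific rule (complement any weight code that has more 1-bits than 0-bits and record a per-weight flip flag that is corrected after accumulation), together with the helper names flip_encode and mac_with_flip_correction, are illustrative assumptions and not the paper's actual ERA-BS algorithm; the exponent-based quantization step is omitted entirely.

import numpy as np

# Illustrative only: an assumed flip rule, not the paper's actual ERA-BS scheme.
NBITS = 8
MASK = (1 << NBITS) - 1

def popcount(x):
    # Number of set bits in each element of an unsigned integer array.
    return sum(((x >> b) & 1) for b in range(NBITS))

def flip_encode(weights):
    # Complement any weight code that has more 1s than 0s and remember it in a
    # per-weight flag, so every stored code carries at most NBITS/2 set bits,
    # i.e., higher bit-level sparsity across the mapped bit slices.
    flip = popcount(weights) > NBITS // 2
    stored = np.where(flip, MASK - weights, weights)
    return stored.astype(np.uint32), flip

def mac_with_flip_correction(stored, flip, activations):
    # Dot product using only the sparser stored codes plus a cheap correction:
    # for flipped entries w = MASK - stored, so their contribution equals
    # MASK * sum(a) - (stored . a) over the flipped subset.
    a = activations.astype(np.int64)
    psum_keep = np.dot(stored[~flip].astype(np.int64), a[~flip])
    psum_flip = np.dot(stored[flip].astype(np.int64), a[flip])
    return psum_keep + MASK * a[flip].sum() - psum_flip

# Self-check against the plain (unencoded) MAC.
rng = np.random.default_rng(0)
w = rng.integers(0, 1 << NBITS, size=64, dtype=np.uint32)
a = rng.integers(0, 16, size=64, dtype=np.uint32)
s, f = flip_encode(w)
assert mac_with_flip_correction(s, f, a) == np.dot(w.astype(np.int64), a.astype(np.int64))
print("mean set bits per weight:", popcount(w).mean(), "->", popcount(s).mean())

The self-check confirms that, under this assumed encoding, the corrected accumulation reproduces the exact MAC while the stored codes contain fewer set bits, which is the property the abstract attributes to the bit-flip scheme.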
Pages: 2320-2334 (15 pages)
Related papers (8 records)
  • [1] Bit-Transformer: Transforming Bit-level Sparsity into Higher Performance in ReRAM-based Accelerator
    Liu, Fangxin
    Zhao, Wenbo
    He, Zhezhi
    Wang, Zongwu
    Zhao, Yilong
    Chen, Yongbiao
    Jiang, Li
    2021 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN (ICCAD), 2021,
  • [2] ReRAM-Sharing: Fine-Grained Weight Sharing for ReRAM-Based Deep Neural Network Accelerator
    Song, Zhuoran
    Li, Dongyue
    He, Zhezhi
    Liang, Xiaoyao
    Jiang, Li
    2021 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2021,
  • [3] FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator
    Yuan, Geng
    Behnam, Payman
    Li, Zhengang
    Shafiee, Ali
    Lin, Sheng
    Ma, Xiaolong
    Liu, Hang
    Qian, Xuehai
    Bojnordi, Mahdi Nazm
    Wang, Yanzhi
    Ding, Caiwen
    2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021), 2021, : 265 - 278
  • [4] A Winograd-based CNN Accelerator with a Fine-grained Regular Sparsity Pattern
    Yang, Tao
    Liao, Yunkun
    Shi, Jianping
    Liang, Yun
    Jing, Naifeng
    Jiang, Li
    2020 30TH INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS (FPL), 2020, : 254 - 261
  • [5] A Multiplier-Free RNS-Based CNN Accelerator Exploiting Bit-Level Sparsity
    Sakellariou, Vasilis
    Paliouras, Vassilis
    Kouretas, Ioannis
    Saleh, Hani
    Stouraitis, Thanos
    2023 IEEE 30TH SYMPOSIUM ON COMPUTER ARITHMETIC, ARITH 2023, 2023, : 101 - 101
  • [6] A Multiplier-Free RNS-Based CNN Accelerator Exploiting Bit-Level Sparsity
    Sakellariou, Vasilis
    Paliouras, Vassilis
    Kouretas, Ioannis
    Saleh, Hani
    Stouraitis, Thanos
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, 2024, 12 (02) : 667 - 683
  • [7] BISWSRBS: A Winograd-based CNN Accelerator with a Fine-grained Regular Sparsity Pattern and Mixed Precision Quantization
    Yang, Tao
    He, Zhezhi
    Kou, Tengchuan
    Li, Qingzheng
    Han, Qi
    Yu, Haibao
    Liu, Fangxin
    Liang, Yun
    Jiang, Li
    ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, 2021, 14 (04)
  • [8] S-FLASH: A NAND Flash-Based Deep Neural Network Accelerator Exploiting Bit-Level Sparsity
    Kang, Myeonggu
    Kim, Hyeonuk
    Shin, Hyein
    Sim, Jaehyeong
    Kim, Kyeonghan
    Kim, Lee-Sup
    IEEE TRANSACTIONS ON COMPUTERS, 2022, 71 (06) : 1291 - 1304