ERA-BS: Boosting the Efficiency of ReRAM-Based PIM Accelerator With Fine-Grained Bit-Level Sparsity

Cited by: 1
Authors
Liu, Fangxin [1 ,2 ]
Zhao, Wenbo [3 ]
Wang, Zongwu [1 ]
Chen, Yongbiao [1 ]
Liang, Xiaoyao [1 ]
Jiang, Li [1 ,2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[2] Shanghai Qi Zhi Inst, Shanghai 200232, Peoples R China
[3] Shanghai Jiao Tong Univ, Univ Michigan Shanghai Jiao Tong Univ Joint Inst, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Processing-in-memory; neural network; hardware accelerator; bit-level sparsity;
DOI
10.1109/TC.2023.3290869
CLC number
TP3 [Computing Technology, Computer Technology]
Discipline classification code
0812
Abstract
The Resistive Random-Access-Memory (ReRAM) crossbar is one of the most promising neural network accelerators, thanks to its in-memory, in-situ analog computation of Matrix Multiplication-and-Accumulations (MACs). Its key limitations are: 1) the number of ReRAM rows and columns that can execute MACs concurrently is constrained, which limits in-memory computing throughput; and 2) the cost of high-precision analog-to-digital (A/D) conversions can offset the efficiency and performance benefits of ReRAM-based Processing-In-Memory (PIM). Moreover, deploying Deep Neural Network (DNN) models with large model sizes on the crossbar is challenging because DNN sparsity, especially activation sparsity, cannot be effectively exploited by the crossbar structure. As a countermeasure, we develop a novel ReRAM-based PIM accelerator, ERA-BS, which exploits the correlation between bit-level sparsity (in both weights and activations) and the performance of the ReRAM crossbar. We propose a bit-flip scheme combined with exponent-based quantization that adaptively flips the bits of the mapped DNN to release redundant crossbar space, with little accuracy loss and little hardware overhead. We also design an architecture that integrates these techniques to shrink the crossbar footprint so that they can be applied at scale. We further propose a dynamic activation-sparsity exploitation scheme that fits the tightly coupled structure of the crossbar, including crossbar-aware activation pruning and ancillary run-time hardware support. In this way, we exploit fine-grained sparsity in weights (static) and activations (dynamic) to improve performance while reducing computation energy with negligible overhead. Experiments on a wide variety of networks show that, compared with the well-known ReRAM-based PIM accelerator "ISAAC", ERA-BS achieves up to 43x, 78x, and 73x improvements in energy efficiency, area efficiency, and throughput, respectively. Compared with the state-of-the-art ReRAM-based design "PIM-Prune", ERA-BS achieves 5.3x energy efficiency, 7.2x area efficiency, and 32x performance gains with similar or even higher accuracy.
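
To make the bit-flip idea concrete, the sketch below shows one plausible reading of how flipping weight codes can raise bit-level sparsity while keeping MAC results exact. The specific rule (complement any weight code that has more 1-bits than 0-bits and record a per-weight flip flag that is corrected after accumulation), together with the helper names flip_encode and mac_with_flip_correction, are illustrative assumptions and not the paper's actual ERA-BS algorithm; the exponent-based quantization step is omitted entirely.

import numpy as np

# Illustrative only: an assumed flip rule, not the paper's actual ERA-BS scheme.
NBITS = 8
MASK = (1 << NBITS) - 1

def popcount(x):
    # Number of set bits in each element of an unsigned integer array.
    return sum(((x >> b) & 1) for b in range(NBITS))

def flip_encode(weights):
    # Complement any weight code that has more 1s than 0s and remember it in a
    # per-weight flag, so every stored code carries at most NBITS/2 set bits,
    # i.e., higher bit-level sparsity across the mapped bit slices.
    flip = popcount(weights) > NBITS // 2
    stored = np.where(flip, MASK - weights, weights)
    return stored.astype(np.uint32), flip

def mac_with_flip_correction(stored, flip, activations):
    # Dot product using only the sparser stored codes plus a cheap correction:
    # for flipped entries w = MASK - stored, so their contribution equals
    # MASK * sum(a) - (stored . a) over the flipped subset.
    a = activations.astype(np.int64)
    psum_keep = np.dot(stored[~flip].astype(np.int64), a[~flip])
    psum_flip = np.dot(stored[flip].astype(np.int64), a[flip])
    return psum_keep + MASK * a[flip].sum() - psum_flip

# Self-check against the plain (unencoded) MAC.
rng = np.random.default_rng(0)
w = rng.integers(0, 1 << NBITS, size=64, dtype=np.uint32)
a = rng.integers(0, 16, size=64, dtype=np.uint32)
s, f = flip_encode(w)
assert mac_with_flip_correction(s, f, a) == np.dot(w.astype(np.int64), a.astype(np.int64))
print("mean set bits per weight:", popcount(w).mean(), "->", popcount(s).mean())

The self-check confirms that, under this assumed encoding, the corrected accumulation reproduces the exact MAC while the stored codes contain fewer set bits, which is the property the abstract attributes to the bit-flip scheme.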
Pages: 2320-2334 (15 pages)
Related papers (8 records)
  • [1] Bit-Transformer: Transforming Bit-level Sparsity into Higher Performance in ReRAM-based Accelerator
    Liu, Fangxin
    Zhao, Wenbo
    He, Zhezhi
    Wang, Zongwu
    Zhao, Yilong
    Chen, Yongbiao
    Jiang, Li
    2021 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN (ICCAD), 2021,
  • [2] ReRAM-Sharing: Fine-Grained Weight Sharing for ReRAM-Based Deep Neural Network Accelerator
    Song, Zhuoran
    Li, Dongyue
    He, Zhezhi
    Liang, Xiaoyao
    Jiang, Li
    2021 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2021,
  • [3] FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator
    Yuan, Geng
    Behnam, Payman
    Li, Zhengang
    Shafiee, Ali
    Lin, Sheng
    Ma, Xiaolong
    Liu, Hang
    Qian, Xuehai
    Bojnordi, Mahdi Nazm
    Wang, Yanzhi
    Ding, Caiwen
    2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021), 2021, : 265 - 278
  • [4] A Winograd-based CNN Accelerator with a Fine-grained Regular Sparsity Pattern
    Yang, Tao
    Liao, Yunkun
    Shi, Jianping
    Liang, Yun
    Jing, Naifeng
    Jiang, Li
    2020 30TH INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS (FPL), 2020, : 254 - 261
  • [5] A Multiplier-Free RNS-Based CNN Accelerator Exploiting Bit-Level Sparsity
    Sakellariou, Vasilis
    Paliouras, Vassilis
    Kouretas, Ioannis
    Saleh, Hani
    Stouraitis, Thanos
    2023 IEEE 30TH SYMPOSIUM ON COMPUTER ARITHMETIC, ARITH 2023, 2023, : 101 - 101
  • [6] A Multiplier-Free RNS-Based CNN Accelerator Exploiting Bit-Level Sparsity
    Sakellariou, Vasilis
    Paliouras, Vassilis
    Kouretas, Ioannis
    Saleh, Hani
    Stouraitis, Thanos
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, 2024, 12 (02) : 667 - 683
  • [7] BISWSRBS: A Winograd-based CNN Accelerator with a Fine-grained Regular Sparsity Pattern and Mixed Precision Quantization
    Yang, Tao
    He, Zhezhi
    Kou, Tengchuan
    Li, Qingzheng
    Han, Qi
    Yu, Haibao
    Liu, Fangxin
    Liang, Yun
    Jiang, Li
    ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, 2021, 14 (04)
  • [8] S-FLASH: A NAND Flash-Based Deep Neural Network Accelerator Exploiting Bit-Level Sparsity
    Kang, Myeonggu
    Kim, Hyeonuk
    Shin, Hyein
    Sim, Jaehyeong
    Kim, Kyeonghan
    Kim, Lee-Sup
    IEEE TRANSACTIONS ON COMPUTERS, 2022, 71 (06) : 1291 - 1304