Software-Hardware Co-Optimization on Partial-Sum Problem for PIM-based Neural Network Accelerator

Cited by: 0
Authors
Wu, Qizhe [1]
Tao, Linfeng [1]
Liang, Huawen [1]
Yuan, Wei [1]
Tian, Teng [1]
Xue, Shuang [1]
Jin, Xi [1]
Affiliations
[1] Univ Sci & Technol China, Chinese Acad Sci, Dept Phys, State Key Lab Particle Detect & Elect, Inst Microe, Hefei 230026, Peoples R China
Keywords
processing-in-memory; partial sum; memristor; neural network accelerator;
DOI
10.1109/HPEC49654.2021.9622798
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology];
Discipline code
0812;
Abstract
The crossbar architecture, which is built from novel memristor devices, enables high-speed and energy-efficient processing-in-memory (PIM) for neural network computing. However, due to limitations of the manufacturing process, it is difficult to fabricate large arrays. As a consequence, the neural network's vector-matrix multiplication (VMM) must split its operands across several arrays to obtain partial sums and then accumulate the partial results. The neural network (NN) training process, which is often influenced by device variations and ADC quantization noise in the PIM system, does not perceive this partial-sum process. Consequently, when NN models are inferred directly on the PIM platform without taking partial sums into account, accuracy suffers significantly, which makes it difficult to apply PIM computing to large-scale neural networks. In particular, our work makes the following contributions: (i) We studied the partial-sum issue for the crossbar architecture when computing high-channel convolution (Conv) and distilled three lessons from it. (ii) To address this issue, we propose techniques for avoiding or minimizing partial sums at the software and hardware levels, respectively. At the software level, we use group Conv rather than conventional Conv; at the hardware level, we present a new architecture adapted to depthwise separable Conv. Experiments were conducted with the CIFAR-10 dataset and the VGG8 network on an RRAM crossbar architecture. Results show improvements of 15.53% and 14.55% in accuracy, and 0.28x and 0.94x in energy efficiency, at the software and hardware levels, respectively, compared with the conventional PIM scheme.
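To make the partial-sum issue concrete, the following minimal NumPy sketch (our illustration, not code from the paper) tiles a VMM across fixed-size crossbar arrays and passes each tile's partial sum through a per-tile ADC before accumulation. The 128-row tile size, the 6-bit ADC, and the helper names tiled_vmm and adc_quantize are assumptions made for illustration only.

# Minimal NumPy sketch of the partial-sum effect when a vector-matrix
# multiplication (VMM) is tiled across fixed-size crossbar arrays.
# Assumed, illustrative values (not taken from the paper): 128-row
# crossbar tiles and a uniform 6-bit ADC applied to each partial sum.
import numpy as np

CROSSBAR_ROWS = 128   # assumed number of wordlines per crossbar tile
ADC_BITS = 6          # assumed ADC resolution

def adc_quantize(x, bits=ADC_BITS):
    """Uniformly quantize a partial-sum vector, mimicking a per-tile ADC."""
    full_scale = np.max(np.abs(x)) + 1e-9
    step = 2.0 * full_scale / (2 ** bits - 1)
    return np.round(x / step) * step

def tiled_vmm(x, W):
    """VMM computed the PIM way: split the rows into crossbar-sized tiles,
    quantize every tile's partial sum with the ADC, then accumulate."""
    out = np.zeros(W.shape[1])
    for r in range(0, W.shape[0], CROSSBAR_ROWS):
        partial = x[r:r + CROSSBAR_ROWS] @ W[r:r + CROSSBAR_ROWS, :]
        out += adc_quantize(partial)   # quantization noise enters per tile
    return out

rng = np.random.default_rng(0)
# A high-channel Conv layer flattened to a VMM: 1152 = 3 x 3 x 128 input rows.
x = rng.standard_normal(1152)
W = rng.standard_normal((1152, 128))

exact = x @ W
pim = tiled_vmm(x, W)
print("partial sums per output:", int(np.ceil(1152 / CROSSBAR_ROWS)))
print("relative error from per-tile ADC quantization:",
      np.linalg.norm(pim - exact) / np.linalg.norm(exact))

# Group Conv with g groups shrinks each output's fan-in to 3*3*128/g rows,
# so fewer (or no) partial sums are needed; e.g. g = 16 gives 72 rows per
# group, which fits inside a single crossbar tile.

The sketch only illustrates why splitting a large fan-in across tiles injects extra quantization error that standard NN training never sees; the paper's actual software-level (group Conv) and hardware-level (depthwise separable Conv architecture) solutions are described in the abstract above.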
Pages: 7
Related papers
36 records in total
  • [1] Attar: RRAM-based in-memory attention accelerator with software-hardware co-optimization
    Li, Bing
    Qi, Ying
    Wang, Ying
    Han, Yinhe
    SCIENCE CHINA-INFORMATION SCIENCES, 2025, 68 (03) : 371 - 387
  • [2] Software-Hardware Co-Optimization for CNNs Based on Reconfigurable Devices
    Liu, Fang
    Fan, Zimeng
    He, Yanxiang
    Peng, Min
    19TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2021), 2021, : 1279 - 1286
  • [3] Software-Hardware Co-Optimization for Computational Chemistry on Superconducting Quantum Processors
    Li, Gushu
    Shi, Yunong
    Javadi-Abhari, Ali
    2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021), 2021, : 832 - 845
  • [4] Unified Hardware Software Co-Optimization for Robust Neural Network Acceleration
    Rashidi, Bahador
    Gao, Chao
    Lu, Shan
    Wang, Zhisheng
    Zhou, Chunhua
    Niu, Di
    Sun, Fengyu
    56TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, MICRO 2023, 2023, : 77 - 90
  • [5] FPGAN: An FPGA Accelerator for Graph Attention Networks With Software and Hardware Co-Optimization
    Yan, Weian
    Tong, Weiqin
    Zhi, Xiaoli
    IEEE ACCESS, 2020, 8 : 171608 - 171620
  • [6] Surrogate Model based Co-Optimization of Deep Neural Network Hardware Accelerators
    Woehrle, Hendrik
    Alvarez, Mariela De Lucas
    Schlenke, Fabian
    Walsemann, Alexander
    Karagounis, Michael
    Kirchner, Frank
    2021 IEEE INTERNATIONAL MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS (MWSCAS), 2021, : 40 - 45
  • [7] Hardware Accelerator and Neural Network Co-Optimization for Ultra-Low-Power Audio Processing Devices
    Gerum, Christoph
    Frischknecht, Adrian
    Hald, Tobias
    Palomero Bernardo, Paul
    Lubeck, Konstantin
    Bringmann, Oliver
    2022 25TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD), 2022, : 365 - 369
  • [8] ANNA: Accelerating Neural Network Accelerator through software-hardware co-design for vertical applications in edge systems
    Li, Chuanyou
    Zhang, Kun
    Li, Yifan
    Shang, Jiangwei
    Zhang, Xinyue
    Qian, Lei
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2023, 140 : 91 - 103
  • [9] Convolutional Neural Network Model Compression Method for Software-Hardware Co-Design
    Jang, Seojin
    Liu, Wei
    Cho, Yongbeom
    INFORMATION, 2022, 13 (10)