Software-Hardware Co-Optimization on Partial-Sum Problem for PIM-based Neural Network Accelerator

Authors
Wu, Qizhe [1]
Tao, Linfeng [1]
Liang, Huawen [1]
Yuan, Wei [1]
Tian, Teng [1]
Xue, Shuang [1]
Jin, Xi [1]
Affiliations
[1] Univ Sci & Technol China, Chinese Acad Sci, Dept Phys, State Key Lab Particle Detect & Elect, Inst Microe, Hefei 230026, Peoples R China
Keywords
processing-in-memory; partial sum; memristor; neural network accelerator
DOI
10.1109/HPEC49654.2021.9622798
Chinese Library Classification (CLC) number
TP3 [Computing Technology, Computer Technology]
Discipline classification code
0812
Abstract
The crossbar architecture, which is built from novel memristor devices, enables high-speed and energy-efficient processing-in-memory (PIM) for neural network computing. However, due to the limitations of the manufacturing process, it is difficult to fabricate large arrays. As a consequence, the neural network's vector-matrix multiplication (VMM) must split its operands across several arrays to obtain partial sums and then add up the partial results. The neural network (NN) training process does not perceive this partial-sum process, which in the PIM system is often affected by device variations and ADC quantization noise. As a result, when NN models are run for inference directly on the PIM platform without taking partial sums into account, accuracy suffers significantly. This makes it difficult to apply PIM computing to large-scale neural networks. In particular, our work makes the following contributions: (i) We studied the partial-sum issue for crossbar architectures when computing high-channel convolution (Conv) and drew three lessons from it. (ii) To address this issue, we offer techniques for avoiding or minimizing partial sums at the software and hardware levels, respectively. At the software level, we use group Conv rather than conventional Conv; at the hardware level, we present a new architecture adapted to depthwise separable Conv. Experiments were conducted using the CIFAR-10 dataset and the VGG8 network on an RRAM crossbar architecture. Results show improvements of 15.53% and 14.55% in accuracy, and 0.28x and 0.94x in energy efficiency, at the software and hardware levels, respectively, when compared to the conventional PIM scheme.
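The partial-sum effect described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' simulator. The signed uniform ADC model, the parameter names (`rows_per_array`, `adc_bits`, `full_scale`), and the one-weight-per-cell mapping used to count arrays are all assumptions made for illustration. The sketch splits a VMM across row-limited arrays, quantizes each partial sum separately, and counts how many arrays feed one output for conventional, group, and depthwise Conv.

```python
import math
import numpy as np


def adc_quantize(x, bits, full_scale):
    """Hypothetical signed uniform ADC: round to a step grid, then saturate."""
    step = full_scale / 2 ** (bits - 1)
    codes = np.clip(np.round(np.asarray(x) / step),
                    -2 ** (bits - 1), 2 ** (bits - 1) - 1)
    return codes * step


def split_vmm(x, W, rows_per_array, adc_bits, full_scale):
    """Compute x @ W with the rows of W split over several arrays.

    Each array yields one partial sum that passes through its own ADC
    before the digital accumulation, so quantization error adds up with
    the number of arrays.
    """
    total = np.zeros(W.shape[1])
    for start in range(0, W.shape[0], rows_per_array):
        stop = start + rows_per_array
        partial = x[start:stop] @ W[start:stop]  # analog VMM in one array
        total += adc_quantize(partial, adc_bits, full_scale)
    return total


def arrays_per_output(in_channels, kernel, groups, rows_per_array):
    """Arrays (partial sums) needed per output, assuming one weight per cell
    and (in_channels / groups) * kernel * kernel input rows per output."""
    rows = (in_channels // groups) * kernel * kernel
    return math.ceil(rows / rows_per_array)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, size=512)
    W = rng.uniform(-1, 1, size=(512, 4))
    exact = x @ W
    for rows in (512, 128, 32):
        approx = split_vmm(x, W, rows, adc_bits=6, full_scale=32.0)
        print(rows, np.abs(approx - exact).max())
    # Conventional, group (8 groups) and depthwise Conv, 512 channels, 3x3
    for g in (1, 8, 512):
        print(g, arrays_per_output(512, 3, g, rows_per_array=128))
```

Under these assumptions, increasing `groups` shrinks the number of input rows per output, so fewer arrays and fewer separately quantized partial sums are needed. This is the intuition behind replacing conventional Conv with group or depthwise separable Conv.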
Pages: 7