Software-Hardware Co-Optimization on Partial-Sum Problem for PIM-based Neural Network Accelerator

Cited by: 0
Authors
Wu, Qizhe [1]
Tao, Linfeng [1]
Liang, Huawen [1]
Yuan, Wei [1]
Tian, Teng [1]
Xue, Shuang [1]
Jin, Xi [1]
Affiliations
[1] Univ Sci & Technol China, Chinese Acad Sci, Dept Phys, State Key Lab Particle Detect & Elect, Inst Microe, Hefei 230026, Peoples R China
Keywords
processing-in-memory; partial sum; memristor; neural network accelerator;
DOI
10.1109/HPEC49654.2021.9622798
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology];
Discipline code
0812;
Abstract
The crossbar architecture, which is built from novel memristor devices, enables high-speed and energy-efficient processing-in-memory (PIM) for neural network computing. However, due to limitations of the manufacturing process, it is difficult to fabricate large arrays. As a consequence, the neural network's vector-matrix multiplication (VMM) must split its operands across several arrays to obtain partial sums and then accumulate the partial results. The neural network (NN) training process, which is often influenced by device variations and ADC quantization noise in the PIM system, does not perceive this partial-sum process. Consequently, when NN models are inferred directly on the PIM platform without taking partial sums into account, accuracy suffers significantly, which makes it difficult to apply PIM computing to large-scale neural networks. In particular, our work makes the following contributions: (i) We studied the partial-sum issue for the crossbar architecture when computing high-channel convolution (Conv) and distilled three lessons from it. (ii) To address this issue, we propose techniques for avoiding or minimizing partial sums at the software and hardware levels, respectively. At the software level, we use group Conv rather than conventional Conv; at the hardware level, we present a new architecture adapted to depthwise separable Conv. Experiments were conducted with the CIFAR-10 dataset and the VGG8 network on an RRAM crossbar architecture. Results show improvements of 15.53% and 14.55% in accuracy, and 0.28x and 0.94x in energy efficiency, at the software and hardware levels, respectively, compared with the conventional PIM scheme.
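To make the partial-sum issue concrete, the following minimal NumPy sketch (our illustration, not code from the paper) tiles a VMM across fixed-size crossbar arrays and passes each tile's partial sum through a per-tile ADC before accumulation. The 128-row tile size, the 6-bit ADC, and the helper names tiled_vmm and adc_quantize are assumptions made for illustration only.

# Minimal NumPy sketch of the partial-sum effect when a vector-matrix
# multiplication (VMM) is tiled across fixed-size crossbar arrays.
# Assumed, illustrative values (not taken from the paper): 128-row
# crossbar tiles and a uniform 6-bit ADC applied to each partial sum.
import numpy as np

CROSSBAR_ROWS = 128   # assumed number of wordlines per crossbar tile
ADC_BITS = 6          # assumed ADC resolution

def adc_quantize(x, bits=ADC_BITS):
    """Uniformly quantize a partial-sum vector, mimicking a per-tile ADC."""
    full_scale = np.max(np.abs(x)) + 1e-9
    step = 2.0 * full_scale / (2 ** bits - 1)
    return np.round(x / step) * step

def tiled_vmm(x, W):
    """VMM computed the PIM way: split the rows into crossbar-sized tiles,
    quantize every tile's partial sum with the ADC, then accumulate."""
    out = np.zeros(W.shape[1])
    for r in range(0, W.shape[0], CROSSBAR_ROWS):
        partial = x[r:r + CROSSBAR_ROWS] @ W[r:r + CROSSBAR_ROWS, :]
        out += adc_quantize(partial)   # quantization noise enters per tile
    return out

rng = np.random.default_rng(0)
# A high-channel Conv layer flattened to a VMM: 1152 = 3 x 3 x 128 input rows.
x = rng.standard_normal(1152)
W = rng.standard_normal((1152, 128))

exact = x @ W
pim = tiled_vmm(x, W)
print("partial sums per output:", int(np.ceil(1152 / CROSSBAR_ROWS)))
print("relative error from per-tile ADC quantization:",
      np.linalg.norm(pim - exact) / np.linalg.norm(exact))

# Group Conv with g groups shrinks each output's fan-in to 3*3*128/g rows,
# so fewer (or no) partial sums are needed; e.g. g = 16 gives 72 rows per
# group, which fits inside a single crossbar tile.

The sketch only illustrates why splitting a large fan-in across tiles injects extra quantization error that standard NN training never sees; the paper's actual software-level (group Conv) and hardware-level (depthwise separable Conv architecture) solutions are described in the abstract above.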
Pages: 7
Related papers
36 records in total
  • [1] Attar: RRAM-based in-memory attention accelerator with software-hardware co-optimization
    Li, Bing
    Qi, Ying
    Wang, Ying
    Han, Yinhe
    SCIENCE CHINA-INFORMATION SCIENCES, 2025, 68 (03) : 371 - 387
  • [2] Software-Hardware Co-Optimization for CNNs Based on Reconfigurable Devices
    Liu, Fang
    Fan, Zimeng
    He, Yanxiang
    Peng, Min
    19TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2021), 2021, : 1279 - 1286
  • [3] Software-Hardware Co-Optimization for Computational Chemistry on Superconducting Quantum Processors
    Li, Gushu
    Shi, Yunong
    Javadi-Abhari, Ali
    2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021), 2021, : 832 - 845
  • [4] Unified Hardware Software Co-Optimization for Robust Neural Network Acceleration
    Rashidi, Bahador
    Gao, Chao
    Lu, Shan
    Wang, Zhisheng
    Zhou, Chunhua
    Niu, Di
    Sun, Fengyu
    56TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, MICRO 2023, 2023, : 77 - 90
  • [5] FPGAN: An FPGA Accelerator for Graph Attention Networks With Software and Hardware Co-Optimization
    Yan, Weian
    Tong, Weiqin
    Zhi, Xiaoli
    IEEE ACCESS, 2020, 8 : 171608 - 171620
  • [6] Surrogate Model based Co-Optimization of Deep Neural Network Hardware Accelerators
    Woehrle, Hendrik
    Alvarez, Mariela De Lucas
    Schlenke, Fabian
    Walsemann, Alexander
    Karagounis, Michael
    Kirchner, Frank
    2021 IEEE INTERNATIONAL MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS (MWSCAS), 2021, : 40 - 45
  • [7] Hardware Accelerator and Neural Network Co-Optimization for Ultra-Low-Power Audio Processing Devices
    Gerum, Christoph
    Frischknecht, Adrian
    Hald, Tobias
    Palomero Bernardo, Paul
    Lubeck, Konstantin
    Bringmann, Oliver
    2022 25TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD), 2022, : 365 - 369
  • [8] ANNA: Accelerating Neural Network Accelerator through software-hardware co-design for vertical applications in edge systems
    Li, Chuanyou
    Zhang, Kun
    Li, Yifan
    Shang, Jiangwei
    Zhang, Xinyue
    Qian, Lei
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2023, 140 : 91 - 103
  • [9] Convolutional Neural Network Model Compression Method for Software-Hardware Co-Design
    Jang, Seojin
    Liu, Wei
    Cho, Yongbeom
    INFORMATION, 2022, 13 (10)