Partition Pruning: Parallelization-Aware Pruning for Dense Neural Networks

Cited by: 0
Authors
Shahhosseini, Sina [1 ]
Albaqsami, Ahmad [1 ]
Jasemi, Masoomeh [1 ,2 ]
Bagherzadeh, Nader [1 ]
Affiliations
[1] Univ Calif Irvine, Elect Engn & Comp Sci Dept, Irvine, CA 92697 USA
[2] Sharif Univ Technol, Dept Comp Engn, Tehran, Iran
Keywords
Parallelization; Deep Neural Network; Pruning; Partitioning; Hardware Accelerator;
DOI
10.1109/PDP50117.2020.00053
CLC Number
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
As recent neural networks are improved for higher accuracy, their model sizes grow exponentially. A huge number of parameters must therefore be loaded from and stored to the memory hierarchy, and computed in processors, during the training and inference phases of neural network processing. This growth poses a major challenge for real-time deployment, since memory bandwidth is not improving as fast as model complexity. Although some operations in neural network processing, such as convolutional layer computations, are compute-intensive, computing dense layers faces a memory bandwidth bottleneck. To address this issue, this paper proposes Partition Pruning for dense layers, which reduces the number of required parameters while taking parallelization into account. We evaluated the performance and energy consumption of parallel inference on partitioned models, observing a 7.72x speedup and a 2.73x reduction in energy when computing the pruned fully connected layers of the TinyVGG16 model, compared with running the unpruned model on a single accelerator. In addition, our method showed only a limited reduction in accuracy when partitioning fully connected layers.
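The core idea described above can be illustrated with a minimal sketch: split a dense layer's weight matrix column-wise into one partition per accelerator, then prune each partition independently so every accelerator holds a smaller, balanced workload. This is a hypothetical reconstruction using simple magnitude-based pruning; the paper's exact pruning criterion and partitioning strategy are not specified in the abstract, and the function and parameter names here are illustrative only.

```python
import numpy as np

def partition_prune(W, num_partitions, sparsity):
    """Split a dense layer's weight matrix column-wise into equal
    partitions (one per accelerator) and zero out the smallest-magnitude
    weights within each partition independently.

    Illustrative sketch only -- magnitude pruning is an assumption,
    not necessarily the paper's criterion."""
    parts = np.array_split(W, num_partitions, axis=1)
    pruned = []
    for P in parts:
        k = int(P.size * sparsity)  # number of weights to zero out
        if k > 0:
            # Threshold at the k-th smallest absolute value in this partition
            thresh = np.partition(np.abs(P).ravel(), k - 1)[k - 1]
            P = np.where(np.abs(P) <= thresh, 0.0, P)
        pruned.append(P)
    return pruned

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))          # toy dense layer: 8 inputs, 16 outputs
parts = partition_prune(W, num_partitions=4, sparsity=0.5)

# Each partition can run on its own accelerator with the same input;
# concatenating the partial results reconstructs the layer's output.
x = rng.standard_normal(8)
y = np.concatenate([x @ P for P in parts])
```

Because each partition is pruned to the same sparsity, the per-accelerator work stays balanced, which is what makes the partitioning parallelization-aware rather than pruning the matrix globally.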
Pages: 307-311
Number of pages: 5