Partition Pruning: Parallelization-Aware Pruning for Dense Neural Networks

Cited by: 0
Authors
Shahhosseini, Sina [1 ]
Albaqsami, Ahmad [1 ]
Jasemi, Masoomeh [1 ,2 ]
Bagherzadeh, Nader [1 ]
Affiliations
[1] Univ Calif Irvine, Elect Engn & Comp Sci Dept, Irvine, CA 92697 USA
[2] Sharif Univ Technol, Dept Comp Engn, Tehran, Iran
Keywords
Parallelization; Deep Neural Network; Pruning; Partitioning; Hardware Accelerator;
DOI
10.1109/PDP50117.2020.00053
CLC Number
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
As recent neural networks are improved for higher accuracy, their model sizes grow exponentially. A huge number of parameters must therefore be loaded from and stored to the memory hierarchy, and computed in processors, during the training and inference phases of neural network processing. This growth poses a major challenge for real-time deployment, since memory bandwidth is not improving as fast as model complexity. Although some operations in neural network processing, such as convolutional layer computations, are compute-intensive, computing dense layers faces a memory bandwidth bottleneck. To address this issue, this paper proposes Partition Pruning for dense layers, which reduces the number of required parameters while taking parallelization into account. We evaluated the performance and energy consumption of parallel inference on partitioned models, observing a 7.72x speedup and a 2.73x reduction in energy when computing the pruned fully connected layers of the TinyVGG16 model, compared with running the unpruned model on a single accelerator. In addition, our method showed only a limited reduction in accuracy when partitioning fully connected layers.
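The core idea described above can be illustrated with a minimal sketch: split a dense layer's weight matrix column-wise into one partition per accelerator, then prune each partition independently so every accelerator holds a smaller, balanced workload. This is a hypothetical reconstruction using simple magnitude-based pruning; the paper's exact pruning criterion and partitioning strategy are not specified in the abstract, and the function and parameter names here are illustrative only.

```python
import numpy as np

def partition_prune(W, num_partitions, sparsity):
    """Split a dense layer's weight matrix column-wise into equal
    partitions (one per accelerator) and zero out the smallest-magnitude
    weights within each partition independently.

    Illustrative sketch only -- magnitude pruning is an assumption,
    not necessarily the paper's criterion."""
    parts = np.array_split(W, num_partitions, axis=1)
    pruned = []
    for P in parts:
        k = int(P.size * sparsity)  # number of weights to zero out
        if k > 0:
            # Threshold at the k-th smallest absolute value in this partition
            thresh = np.partition(np.abs(P).ravel(), k - 1)[k - 1]
            P = np.where(np.abs(P) <= thresh, 0.0, P)
        pruned.append(P)
    return pruned

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))          # toy dense layer: 8 inputs, 16 outputs
parts = partition_prune(W, num_partitions=4, sparsity=0.5)

# Each partition can run on its own accelerator with the same input;
# concatenating the partial results reconstructs the layer's output.
x = rng.standard_normal(8)
y = np.concatenate([x @ P for P in parts])
```

Because each partition is pruned to the same sparsity, the per-accelerator work stays balanced, which is what makes the partitioning parallelization-aware rather than pruning the matrix globally.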
Pages: 307-311
Number of pages: 5