Non-Structured DNN Weight Pruning--Is It Beneficial in Any Platform?

Cited by: 42
Authors
Ma, Xiaolong [1 ]
Lin, Sheng [1 ]
Ye, Shaokai [2 ]
He, Zhezhi [4 ]
Zhang, Linfeng [3 ]
Yuan, Geng [1 ]
Tan, Sia Huat [3 ]
Li, Zhengang [1 ]
Fan, Deliang [5 ]
Qian, Xuehai [6 ]
Lin, Xue [1 ]
Ma, Kaisheng [3 ]
Wang, Yanzhi [1 ]
Affiliations
[1] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02115 USA
[2] Swiss Fed Inst Technol Lausanne, Ctr Neuroprosthet, CH-1015 Lausanne, Switzerland
[3] Tsinghua Univ, Inst Interdisciplinary Informat Sci IIIS, Beijing 100084, Peoples R China
[4] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[5] Arizona State Univ, Dept Elect Comp & Energy Engn, Tempe, AZ 85287 USA
[6] Univ Southern Calif, Ming Hsieh Dept Elect Engn, Los Angeles, CA 90007 USA
Funding
U.S. National Science Foundation;
Keywords
Quantization (signal); Redundancy; Computational modeling; Acceleration; Degradation; Random access memory; Indexes; Deep neural network (DNN); hardware acceleration; quantization; weight pruning;
DOI
10.1109/TNNLS.2021.3063265
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Large deep neural network (DNN) models pose a key challenge to energy efficiency because off-chip DRAM accesses consume significantly more energy than arithmetic or SRAM operations. This motivates intensive research on model compression, which takes two main approaches. Weight pruning leverages the redundancy in the number of weights and can be performed in a non-structured manner, which offers higher flexibility and pruning rate but incurs index accesses due to the irregular locations of the remaining weights, or in a structured manner, which preserves the full matrix structure at a lower pruning rate. Weight quantization leverages the redundancy in the number of bits per weight. Compared with pruning, quantization is much more hardware-friendly and has become a "must-do" step for FPGA and ASIC implementations. Thus, any evaluation of the effectiveness of pruning should be performed on top of quantization. The key open question is: with quantization, which kind of pruning (non-structured versus structured) is most beneficial? This question is fundamental because the answer determines the design aspects we should focus on to avoid the diminishing returns of certain optimizations. This article provides a definitive answer to this question for the first time. First, we build ADMM-NN-S by extending and enhancing ADMM-NN, a recently proposed joint weight pruning and quantization framework, with algorithmic support for structured pruning, dynamic ADMM regulation, and masked mapping and retraining. Second, we develop a methodology for a fair and fundamental comparison of non-structured and structured pruning in terms of both storage and computation efficiency. Our results show that ADMM-NN-S consistently outperforms the prior art: 1) it achieves 348x, 36x, and 8x overall weight pruning on LeNet-5, AlexNet, and ResNet-50, respectively, with (almost) zero accuracy loss and 2) we demonstrate, for the first time, that fully binarized (all-layer) DNNs can be lossless in accuracy in many cases. These results establish a strong baseline and lend credibility to our study. Based on the proposed comparison framework, with the same accuracy and quantization, non-structured pruning is not competitive in terms of either storage or computation efficiency. Thus, we conclude that structured pruning has greater potential than non-structured pruning, and we encourage the community to focus on DNN inference acceleration with structured sparsity.
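To make the abstract's two comparison points concrete, the sketch below illustrates (a) the Euclidean-projection step that ADMM-based pruning alternates with gradient descent, for both a non-structured (per-weight magnitude) and a structured (whole-row) sparsity constraint, and (b) an effective-compression calculation that charges non-structured sparsity the per-weight index bits it needs to locate irregular weights. This is a minimal NumPy sketch of the general ADMM pruning recipe, not the authors' ADMM-NN-S implementation; the function names, the fixed-step primal update, and the 4-bit relative-index encoding are illustrative assumptions.

```python
import numpy as np

def project_nonstructured(W, prune_rate):
    """Euclidean projection onto the non-structured constraint:
    keep the largest-magnitude weights anywhere, zero the rest."""
    k = max(1, int(round(W.size * (1.0 - prune_rate))))   # weights to keep
    thresh = np.sort(np.abs(W), axis=None)[-k]
    mask = np.abs(W) >= thresh
    return W * mask, mask

def project_structured(W, prune_rate):
    """Euclidean projection onto a row-structured constraint:
    keep the rows (e.g., filters) with the largest L2 norms, zero whole rows."""
    k = max(1, int(round(W.shape[0] * (1.0 - prune_rate))))
    keep = np.argsort(np.linalg.norm(W, axis=1))[-k:]
    mask = np.zeros_like(W, dtype=bool)
    mask[keep, :] = True
    return W * mask, mask

def admm_prune_step(W, Z, U, grad_loss, lr, rho, prune_rate, project):
    """One ADMM iteration: a gradient step on the loss plus the quadratic
    penalty (rho/2)||W - Z + U||^2, then projection of the auxiliary
    variable onto the sparsity constraint set, then the dual update.
    After convergence, the mask from the final projection is fixed and
    the surviving weights are retrained (masked retraining)."""
    W = W - lr * (grad_loss(W) + rho * (W - Z + U))   # primal (W) update
    Z, mask = project(W + U, prune_rate)              # projection (Z) update
    U = U + W - Z                                     # dual (U) update
    return W, Z, U, mask

def effective_compression(total, nnz, w_bits, idx_bits, dense_bits=32):
    """Compression vs. a dense 32-bit baseline. Non-structured sparsity
    pays idx_bits per surviving weight (e.g., a relative/CSR-style index);
    structured sparsity stores a dense smaller matrix, so idx_bits = 0."""
    return (total * dense_bits) / (nnz * (w_bits + idx_bits))

# Example: 90% pruning with 4-bit quantized weights (assumed 4-bit indices).
total, nnz = 1_000_000, 100_000
print(effective_compression(total, nnz, w_bits=4, idx_bits=4))  # non-structured: 40x
print(effective_compression(total, nnz, w_bits=4, idx_bits=0))  # structured: 80x
```

Under the assumed 4-bit weights and 4-bit indices, indexing doubles the per-weight storage, so non-structured pruning must keep roughly half as many weights as structured pruning just to match its storage efficiency; this is the crux of why quantization erodes the apparent advantage of non-structured sparsity.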
Pages: 4930-4944
Number of pages: 15
Related Papers
6 records in total
  • [1] Non-structured Pruning for Deep-learning based Steganalytic Frameworks
    Li, Qiushi
    Shao, Zilong
    Tan, Shunquan
    Zeng, Jishen
    Li, Bin
    [J]. 2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 1735 - 1739
  • [2] Hierarchical Non-Structured Pruning for Computing-In-Memory Accelerators with Reduced ADC Resolution Requirement
    Xue, Wenlu
    Bai, Jinyu
    Sun, Sifan
    Kang, Wang
    [J]. 2023 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2023,
  • [3] An Ultra-Efficient Memristor-Based DNN Framework with Structured Weight Pruning and Quantization Using ADMM
    Yuan, Geng
    Ma, Xiaolong
    Ding, Caiwen
    Lin, Sheng
    Zhang, Tianyun
    Jalali, Zeinab S.
    Zhao, Yilong
    Jiang, Li
    Soundarajan, Sucheta
    Wang, Yanzhi
    [J]. 2019 IEEE/ACM INTERNATIONAL SYMPOSIUM ON LOW POWER ELECTRONICS AND DESIGN (ISLPED), 2019,
  • [4] Dietary prescription adherence and non-structured physical activity following weight loss with and without aerobic exercise
    Serra, M. C.
    Treuth, M. S.
    Ryan, A. S.
[J]. JOURNAL OF NUTRITION HEALTH & AGING, 2014, 18 (10): 888 - 893
  • [5] A systematic review of structured compared with non-structured breastfeeding programmes to support the initiation and duration of exclusive and any breastfeeding in acute and primary health care settings
    Beake, Sarah
    Pellowe, Carol
    Dykes, Fiona
    Schmied, Virginia
    Bick, Debra
[J]. MATERNAL AND CHILD NUTRITION, 2012, 8 (02): 141 - 161
  • [6] SWPU: A 126.04 TFLOPS/W Edge-Device Sparse DNN Training Processor With Dynamic Sub-Structured Weight Pruning
    Wang, Yang
    Qin, Yubin
    Liu, Leibo
    Wei, Shaojun
    Yin, Shouyi
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2022, 69 (10) : 4014 - 4027