Automatic Sparse Connectivity Learning for Neural Networks

Cited by: 39
Authors
Tang, Zhimin [1 ,2 ]
Luo, Linkai [2 ]
Xie, Bike [3 ]
Zhu, Yiyu [3 ]
Zhao, Rujie [1 ]
Bi, Lvqing [4 ]
Lu, Chao [1 ]
Affiliations
[1] Southern Illinois Univ Carbondale, Dept Elect & Comp Engn, Carbondale, IL 62901 USA
[2] Xiamen Univ, Dept Automat, Xiamen 361102, Peoples R China
[3] Kneron Inc, San Diego, CA 92121 USA
[4] Yulin Normal Univ, Res Ctr Intelligent Informat & Commun Technol, Sch Phys & Telecommun Engn, Yulin 537000, Guangxi, Peoples R China
Keywords
Neural networks; Training; Hardware; Logic gates; Learning systems; Gaussian distribution; Computational modeling; Model compression; model pruning; neural networks; sparse connectivity learning (SCL); trainable binary mask;
DOI
10.1109/TNNLS.2022.3141665
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Since sparse neural networks usually contain many zero weights, these unnecessary network connections can potentially be eliminated without degrading network performance. Therefore, well-designed sparse neural networks have the potential to significantly reduce the number of floating-point operations (FLOPs) and computational resources. In this work, we propose a new automatic pruning method, sparse connectivity learning (SCL). Specifically, a weight is reparameterized as an elementwise multiplication of a trainable weight variable and a binary mask. Thus, network connectivity is fully described by the binary mask, which is modulated by a unit step function. We theoretically prove the fundamental principle of using a straight-through estimator (STE) for network pruning: the proxy gradients of the STE should be positive, ensuring that the mask variables converge at their minima. After finding that Leaky ReLU, Softplus, and identity STEs satisfy this principle, we adopt the identity STE in SCL for discrete mask relaxation. We find that the mask gradients of different features are highly unbalanced; hence, we propose to normalize the mask gradients of each feature to optimize mask variable training. To train sparse masks automatically, we include the total number of network connections as a regularization term in our objective function. Because SCL does not require designer-defined pruning criteria or hyperparameters for individual network layers, the network is explored in a larger hypothesis space to achieve optimized sparse connectivity for the best performance. SCL thereby overcomes the limitations of existing automatic pruning methods. Experimental results demonstrate that SCL can automatically learn and select important network connections for various baseline network structures. Deep learning models trained by SCL outperform state-of-the-art human-designed and automatic pruning methods in sparsity, accuracy, and FLOPs reduction.
Pages: 7350-7364
Number of pages: 15
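
Illustrative sketch. The following is a minimal Python/PyTorch sketch, not the authors' implementation, of the idea described in the abstract: a weight is the elementwise product of a trainable weight variable and a binary mask produced by a unit step function, with an identity straight-through estimator (STE) for the backward pass, and the total number of active connections added as a regularization term. The class names (UnitStepIdentitySTE, SCLLinear), initialization values, and the regularization strength lam are assumptions for illustration; the paper's per-feature mask-gradient normalization and exact formulation are omitted.

import torch
import torch.nn as nn


class UnitStepIdentitySTE(torch.autograd.Function):
    """Forward: unit step (1 if s >= 0 else 0). Backward: identity proxy gradient."""

    @staticmethod
    def forward(ctx, s):
        return (s >= 0).to(s.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        # Identity STE: pass the incoming gradient straight through to the mask variable.
        return grad_output


class SCLLinear(nn.Module):
    """Linear layer whose connectivity is learned through a trainable binary mask (sketch)."""

    def __init__(self, in_features, out_features, mask_init=0.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # One mask variable per weight; values >= 0 keep the connection.
        self.mask_score = nn.Parameter(torch.full((out_features, in_features), mask_init))

    def forward(self, x):
        mask = UnitStepIdentitySTE.apply(self.mask_score)
        return nn.functional.linear(x, self.weight * mask, self.bias)

    def num_connections(self):
        # Count of active connections; gradients flow to mask_score via the STE,
        # so this term can serve as the sparsity regularizer in the objective.
        return UnitStepIdentitySTE.apply(self.mask_score).sum()


# Usage sketch: task loss plus a regularizer on the total number of connections.
if __name__ == "__main__":
    layer = SCLLinear(16, 4)
    x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
    lam = 1e-4  # assumed regularization strength, not taken from the paper
    logits = layer(x)
    loss = nn.functional.cross_entropy(logits, y) + lam * layer.num_connections()
    loss.backward()  # mask_score receives gradients through the identity STE
    print(float(loss), int(layer.num_connections()))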