Automatic Sparse Connectivity Learning for Neural Networks

Cited by: 39
Authors
Tang, Zhimin [1 ,2 ]
Luo, Linkai [2 ]
Xie, Bike [3 ]
Zhu, Yiyu [3 ]
Zhao, Rujie [1 ]
Bi, Lvqing [4 ]
Lu, Chao [1 ]
Affiliations
[1] Southern Illinois Univ Carbondale, Dept Elect & Comp Engn, Carbondale, IL 62901 USA
[2] Xiamen Univ, Dept Automat, Xiamen 361102, Peoples R China
[3] Kneron Inc, San Diego, CA 92121 USA
[4] Yulin Normal Univ, Res Ctr Intelligent Informat & Commun Technol, Sch Phys & Telecommun Engn, Yulin 537000, Guangxi, Peoples R China
Keywords
Neural networks; Training; Hardware; Logic gates; Learning systems; Gaussian distribution; Computational modeling; Model compression; model pruning; neural networks; sparse connectivity learning (SCL); trainable binary mask;
DOI
10.1109/TNNLS.2022.3141665
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Since sparse neural networks usually contain many zero weights, these unnecessary network connections can potentially be eliminated without degrading network performance. Therefore, well-designed sparse neural networks have the potential to significantly reduce the number of floating-point operations (FLOPs) and computational resources. In this work, we propose a new automatic pruning method, sparse connectivity learning (SCL). Specifically, a weight is reparameterized as an elementwise multiplication of a trainable weight variable and a binary mask. Thus, network connectivity is fully described by the binary mask, which is modulated by a unit step function. We theoretically prove the fundamental principle of using a straight-through estimator (STE) for network pruning: the proxy gradients of the STE should be positive, ensuring that the mask variables converge at their minima. After finding that Leaky ReLU, Softplus, and identity STEs satisfy this principle, we adopt the identity STE in SCL for discrete mask relaxation. We find that the mask gradients of different features are highly unbalanced; hence, we propose to normalize the mask gradients of each feature to optimize mask variable training. To train sparse masks automatically, we include the total number of network connections as a regularization term in our objective function. Because SCL does not require designer-defined pruning criteria or hyperparameters for individual network layers, the network is explored in a larger hypothesis space to achieve optimized sparse connectivity for the best performance. SCL thereby overcomes the limitations of existing automatic pruning methods. Experimental results demonstrate that SCL can automatically learn and select important network connections for various baseline network structures. Deep learning models trained by SCL outperform state-of-the-art human-designed and automatic pruning methods in sparsity, accuracy, and FLOPs reduction.
Pages: 7350-7364
Number of pages: 15
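
Illustrative sketch. The following is a minimal Python/PyTorch sketch, not the authors' implementation, of the idea described in the abstract: a weight is the elementwise product of a trainable weight variable and a binary mask produced by a unit step function, with an identity straight-through estimator (STE) for the backward pass, and the total number of active connections added as a regularization term. The class names (UnitStepIdentitySTE, SCLLinear), initialization values, and the regularization strength lam are assumptions for illustration; the paper's per-feature mask-gradient normalization and exact formulation are omitted.

import torch
import torch.nn as nn


class UnitStepIdentitySTE(torch.autograd.Function):
    """Forward: unit step (1 if s >= 0 else 0). Backward: identity proxy gradient."""

    @staticmethod
    def forward(ctx, s):
        return (s >= 0).to(s.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        # Identity STE: pass the incoming gradient straight through to the mask variable.
        return grad_output


class SCLLinear(nn.Module):
    """Linear layer whose connectivity is learned through a trainable binary mask (sketch)."""

    def __init__(self, in_features, out_features, mask_init=0.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # One mask variable per weight; values >= 0 keep the connection.
        self.mask_score = nn.Parameter(torch.full((out_features, in_features), mask_init))

    def forward(self, x):
        mask = UnitStepIdentitySTE.apply(self.mask_score)
        return nn.functional.linear(x, self.weight * mask, self.bias)

    def num_connections(self):
        # Count of active connections; gradients flow to mask_score via the STE,
        # so this term can serve as the sparsity regularizer in the objective.
        return UnitStepIdentitySTE.apply(self.mask_score).sum()


# Usage sketch: task loss plus a regularizer on the total number of connections.
if __name__ == "__main__":
    layer = SCLLinear(16, 4)
    x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
    lam = 1e-4  # assumed regularization strength, not taken from the paper
    logits = layer(x)
    loss = nn.functional.cross_entropy(logits, y) + lam * layer.num_connections()
    loss.backward()  # mask_score receives gradients through the identity STE
    print(float(loss), int(layer.num_connections()))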