Training Compact DNNs with l1/2 Regularization

Cited by: 2
Authors
Tang, Anda [1 ]
Niu, Lingfeng [2 ,3 ]
Miao, Jianyu [4 ]
Zhang, Peng [5 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Math Sci, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Res Ctr Fictitious Econ & Data Sci, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci, Sch Econ & Management, Beijing 100190, Peoples R China
[4] Henan Univ Technol, Sch Artificial Intelligence & Big Data, Zhengzhou 450001, Peoples R China
[5] Guangzhou Univ, Cyberspace Inst Adv Technol, Guangzhou 511442, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Deep neural networks; Model compression; l1/2 quasi-norm; Non-Lipschitz regularization; Sparse optimization; L-1/2 REGULARIZATION; VARIABLE SELECTION; NEURAL-NETWORKS; REPRESENTATION; MINIMIZATION; DROPOUT; MODEL
DOI
10.1016/j.patcog.2022.109206
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Deep neural networks (DNNs) have achieved unprecedented success in many fields. However, their large numbers of parameters place a heavy burden on storage and computation, hindering the development and application of DNNs. It is therefore worthwhile to compress the model to reduce its complexity. Sparsity-inducing regularizers are among the most common tools for compression. In this paper, we propose using the l1/2 quasi-norm to zero out weights of neural networks and to compress the networks automatically during the learning process. To our knowledge, this is the first work applying a non-Lipschitz-continuous regularizer to the compression of DNNs. The resulting sparse optimization problem is solved by a stochastic proximal gradient algorithm. For further convenience of calculation, an approximation of the threshold-form solution to the proximal operator of the l1/2 quasi-norm is given at the same time. Extensive experiments with various datasets and baselines demonstrate the advantages of our new method. (c) 2022 Elsevier Ltd. All rights reserved.
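The following is a minimal NumPy sketch (not taken from the paper) of the kind of update the abstract describes: the closed-form half-thresholding operator for the l1/2 quasi-norm known from Xu et al. (see related papers [2] and [3] below) applied inside one stochastic proximal gradient step. The function names, the learning rate lr, and the regularization weight lam are illustrative assumptions; the authors' own approximation to the proximal operator is not reproduced here.

import numpy as np

def half_thresholding(z, lam):
    # Elementwise closed-form minimizer of (x - z)^2 + lam * |x|^(1/2),
    # i.e., the half-thresholding operator of Xu et al. (related paper [2]).
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    thresh = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)  # entries below this magnitude become exactly zero
    mask = np.abs(z) > thresh
    zm = z[mask]
    phi = np.arccos((lam / 8.0) * (np.abs(zm) / 3.0) ** (-1.5))
    out[mask] = (2.0 / 3.0) * zm * (1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))
    return out

def proximal_sgd_step(w, grad, lr, lam):
    # One stochastic proximal gradient step for loss(w) + lam * ||w||_{1/2}^{1/2}.
    # The operator above is defined with (x - z)^2 rather than 0.5 * (x - z)^2,
    # so the proximal step passes 2 * lr * lam as its threshold parameter.
    v = w - lr * grad  # ordinary stochastic gradient step on the loss
    return half_thresholding(v, 2.0 * lr * lam)

# Toy usage: small-magnitude weights are driven exactly to zero.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=10)
g = rng.normal(size=10)
print(proximal_sgd_step(w, g, lr=0.01, lam=1.0))

In practice such an update would be applied per mini-batch to each weight tensor of the network, which is how the regularizer prunes weights during training rather than in a separate post-processing stage.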
Pages: 12
Related Papers
(50 records in total)
  • [1] Compact Deep Neural Networks with l1,1 and l1,2 Regularization
    Ma, Rongrong
    Niu, Lingfeng
    2018 18TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW), 2018, : 1248 - 1254
  • [2] L1/2 regularization
    ZongBen Xu
    Hai Zhang
    Yao Wang
    XiangYu Chang
    Yong Liang
    Science China Information Sciences, 2010, 53 : 1159 - 1169
  • [3] L1/2 regularization
    Xu, ZongBen
    Science China (Information Sciences), 2010, 53 (06) : 1159 - 1169
  • [4] The Group-Lasso: l1,∞ Regularization versus l1,2 Regularization
    Vogt, Julia E.
    Roth, Volker
    PATTERN RECOGNITION, 2010, 6376 : 252 - 261
  • [5] Make l1 regularization effective in training sparse CNN
    He, Juncai
    Jia, Xiaodong
    Xu, Jinchao
    Zhang, Lian
    Zhao, Liang
    COMPUTATIONAL OPTIMIZATION AND APPLICATIONS, 2020, 77 (01) : 163 - 182
  • [6] ELM with L1/L2 regularization constraints
    Feng B.
    Qin K.
    Jiang Z.
    Hanjie Xuebao/Transactions of the China Welding Institution, 2018, 39 (09): : 31 - 35
  • [7] Stochastic PCA with l2 and l1 Regularization
    Mianjy, Poorya
    Arora, Raman
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [8] αl1 - βl2 regularization for sparse recovery
    Ding, Liang
    Han, Weimin
    INVERSE PROBLEMS, 2019, 35 (12)
  • [9] Oriented total variation l1/2 regularization
    Jiang, Wenfei
    Cui, Hengbin
    Zhang, Fan
    Rong, Yaocheng
    Chen, Zhibo
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2015, 29 : 125 - 137
  • [10] Batch gradient method with smoothing L1/2 regularization for training of feedforward neural networks
    Wu, Wei
    Fan, Qinwei
    Zurada, Jacek M.
    Wang, Jian
    Yang, Dakun
    Liu, Yan
    NEURAL NETWORKS, 2014, 50 : 72 - 78