Training Compact DNNs with l1/2 Regularization

Cited by: 2
Authors
Tang, Anda [1 ]
Niu, Lingfeng [2 ,3 ]
Miao, Jianyu [4 ]
Zhang, Peng [5 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Math Sci, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Res Ctr Fictitious Econ & Data Sci, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci, Sch Econ & Management, Beijing 100190, Peoples R China
[4] Henan Univ Technol, Sch Artificial Intelligence & Big Data, Zhengzhou 450001, Peoples R China
[5] Guangzhou Univ, Cyberspace Inst Adv Technol, Guangzhou 511442, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Deep neural networks; Model compression; l1/2 quasi-norm; Non-Lipschitz regularization; Sparse optimization; L-1/2 REGULARIZATION; VARIABLE SELECTION; NEURAL-NETWORKS; REPRESENTATION; MINIMIZATION; DROPOUT; MODEL
DOI
10.1016/j.patcog.2022.109206
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Deep neural networks (DNNs) have achieved unprecedented success in many fields. However, their large numbers of parameters place a heavy burden on storage and computation, hindering the development and application of DNNs. It is therefore worthwhile to compress the model to reduce its complexity. Sparsity-inducing regularizers are among the most common tools for compression. In this paper, we propose using the l1/2 quasi-norm to zero out weights of neural networks and to compress the networks automatically during the learning process. To our knowledge, this is the first work applying a non-Lipschitz-continuous regularizer to the compression of DNNs. The resulting sparse optimization problem is solved by a stochastic proximal gradient algorithm. For further convenience of calculation, an approximation of the threshold-form solution to the proximal operator of the l1/2 quasi-norm is given at the same time. Extensive experiments with various datasets and baselines demonstrate the advantages of our new method. (c) 2022 Elsevier Ltd. All rights reserved.
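The following is a minimal NumPy sketch (not taken from the paper) of the kind of update the abstract describes: the closed-form half-thresholding operator for the l1/2 quasi-norm known from Xu et al. (see related papers [2] and [3] below) applied inside one stochastic proximal gradient step. The function names, the learning rate lr, and the regularization weight lam are illustrative assumptions; the authors' own approximation to the proximal operator is not reproduced here.

import numpy as np

def half_thresholding(z, lam):
    # Elementwise closed-form minimizer of (x - z)^2 + lam * |x|^(1/2),
    # i.e., the half-thresholding operator of Xu et al. (related paper [2]).
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    thresh = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)  # entries below this magnitude become exactly zero
    mask = np.abs(z) > thresh
    zm = z[mask]
    phi = np.arccos((lam / 8.0) * (np.abs(zm) / 3.0) ** (-1.5))
    out[mask] = (2.0 / 3.0) * zm * (1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))
    return out

def proximal_sgd_step(w, grad, lr, lam):
    # One stochastic proximal gradient step for loss(w) + lam * ||w||_{1/2}^{1/2}.
    # The operator above is defined with (x - z)^2 rather than 0.5 * (x - z)^2,
    # so the proximal step passes 2 * lr * lam as its threshold parameter.
    v = w - lr * grad  # ordinary stochastic gradient step on the loss
    return half_thresholding(v, 2.0 * lr * lam)

# Toy usage: small-magnitude weights are driven exactly to zero.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=10)
g = rng.normal(size=10)
print(proximal_sgd_step(w, g, lr=0.01, lam=1.0))

In practice such an update would be applied per mini-batch to each weight tensor of the network, which is how the regularizer prunes weights during training rather than in a separate post-processing stage.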
Pages: 12
Related Papers
(50 records in total)
  • [1] Compact Deep Neural Networks with l1,1 and l1,2 Regularization
    Ma, Rongrong
    Niu, Lingfeng
    2018 18TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW), 2018, : 1248 - 1254
  • [2] L1/2 regularization
    ZongBen Xu
    Hai Zhang
    Yao Wang
    XiangYu Chang
    Yong Liang
    Science China Information Sciences, 2010, 53 : 1159 - 1169
  • [3] L1/2 regularization
    Xu, ZongBen
    Science China (Information Sciences), 2010, 53 (06) : 1159 - 1169
  • [4] The Group-Lasso: l1,∞ Regularization versus l1,2 Regularization
    Vogt, Julia E.
    Roth, Volker
    PATTERN RECOGNITION, 2010, 6376 : 252 - 261
  • [5] Make l1 regularization effective in training sparse CNN
    He, Juncai
    Jia, Xiaodong
    Xu, Jinchao
    Zhang, Lian
    Zhao, Liang
    COMPUTATIONAL OPTIMIZATION AND APPLICATIONS, 2020, 77 (01) : 163 - 182
  • [6] ELM with L1/L2 regularization constraints
    Feng B.
    Qin K.
    Jiang Z.
    Hanjie Xuebao/Transactions of the China Welding Institution, 2018, 39 (09): : 31 - 35
  • [7] Stochastic PCA with l2 and l1 Regularization
    Mianjy, Poorya
    Arora, Raman
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [8] αl1 - βl2 regularization for sparse recovery
    Ding, Liang
    Han, Weimin
    INVERSE PROBLEMS, 2019, 35 (12)
  • [9] Oriented total variation l1/2 regularization
    Jiang, Wenfei
    Cui, Hengbin
    Zhang, Fan
    Rong, Yaocheng
    Chen, Zhibo
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2015, 29 : 125 - 137
  • [10] Batch gradient method with smoothing L1/2 regularization for training of feedforward neural networks
    Wu, Wei
    Fan, Qinwei
    Zurada, Jacek M.
    Wang, Jian
    Yang, Dakun
    Liu, Yan
    NEURAL NETWORKS, 2014, 50 : 72 - 78