Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training

Cited by: 0
Authors
Zhou, Yefan [1 ]
Pang, Tianyu [2 ,3 ]
Liu, Keqin [2 ,3 ]
Martin, Charles H. [4 ]
Mahoney, Michael W. [5 ,6 ]
Yang, Yaoqing [1 ]
Affiliations
[1] Dartmouth Coll, Dept Comp Sci, Hanover, NH 03755 USA
[2] Nanjing Univ, Natl Ctr Appl Math, Nanjing, Peoples R China
[3] Nanjing Univ, Dept Math, Nanjing, Peoples R China
[4] Calculat Consulting, New York, NY USA
[5] Univ Calif Berkeley, LBNL, ICSI, Berkeley, CA USA
[6] Univ Calif Berkeley, Dept Stat, Berkeley, CA USA
Keywords
STATISTICAL-MECHANICS;
DOI
Not available
Chinese Library Classification (CLC) Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Regularization in modern machine learning is crucial, and it can take various forms in algorithmic design: the training set, the model family, the error function, regularization terms, and the optimization procedure. In particular, the learning rate, which can be interpreted as a temperature-like parameter within the statistical mechanics of learning, plays a crucial role in neural network training. Indeed, many widely adopted training strategies essentially just specify how the learning rate decays over time. This process can be interpreted as decreasing a temperature, using either a global learning rate (for the entire model) or a learning rate that varies for each parameter. This paper proposes TempBalance, a straightforward yet effective layer-wise learning rate method. TempBalance is based on Heavy-Tailed Self-Regularization (HT-SR) theory, an approach that characterizes the implicit self-regularization of different layers in trained models. We demonstrate the efficacy of using HT-SR-motivated metrics to guide the scheduling and balancing of temperature across all network layers during training, resulting in improved test performance. We implement TempBalance on the CIFAR10, CIFAR100, SVHN, and TinyImageNet datasets using ResNets, VGGs, and WideResNets with various depths and widths. Our results show that TempBalance significantly outperforms ordinary SGD and carefully-tuned spectral norm regularization. We also show that TempBalance outperforms a number of state-of-the-art optimizers and learning rate schedulers.
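
The abstract describes TempBalance only at a high level. As a rough illustration of the kind of layer-wise temperature balancing it refers to, the sketch below fits a heavy-tail exponent to each layer's weight-matrix eigenvalue spectrum (via a simple Hill estimator) and rescales each layer's learning rate around the global one. The hill_alpha and layerwise_lrs helpers, the Hill-estimator choice, and the linear rescaling rule are assumptions made for illustration, not the paper's exact algorithm.

# Hypothetical sketch of an HT-SR-style layer-wise learning-rate assignment.
# The Hill-estimator metric and the linear rescaling rule below are
# illustrative assumptions, not the exact procedure from the paper.
import torch
import torch.nn as nn

def hill_alpha(weight: torch.Tensor, k_frac: float = 0.1) -> float:
    """Estimate a heavy-tail exponent for one layer with a simple Hill
    estimator over the largest eigenvalues of W^T W (the layer's ESD)."""
    W = weight.detach().flatten(1).float()                   # conv kernel -> 2-D matrix
    eigs = torch.linalg.svdvals(W).pow(2).clamp_min(1e-12)   # eigenvalues of W^T W
    eigs, _ = torch.sort(eigs, descending=True)
    k = max(2, int(k_frac * eigs.numel()))
    k = min(k, eigs.numel() - 1)                             # keep one eigenvalue below the tail
    denom = torch.log(eigs[:k] / eigs[k]).sum().clamp_min(1e-12)
    return float(1.0 + k / denom)

def layerwise_lrs(model: nn.Module, base_lr: float, scale: float = 0.5):
    """Build optimizer parameter groups whose learning rates are rescaled
    around base_lr according to each layer's alpha: heavier-tailed layers
    (smaller alpha) get smaller rates, lighter-tailed layers larger ones.
    Only Linear and Conv2d layers are handled in this toy sketch."""
    layers = [(name, m) for name, m in model.named_modules()
              if isinstance(m, (nn.Linear, nn.Conv2d))]
    alphas = {name: hill_alpha(m.weight) for name, m in layers}
    mean_alpha = sum(alphas.values()) / len(alphas)
    groups = []
    for name, module in layers:
        ratio = alphas[name] / mean_alpha
        lr = base_lr * (1.0 + scale * (ratio - 1.0))         # linear rescaling around base_lr
        groups.append({"params": module.parameters(), "lr": lr})
    return groups

# Usage: per-group learning rates override the default lr passed to SGD.
# In practice they would be recomputed periodically, since the spectra
# evolve as the weights change during training.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.SGD(layerwise_lrs(model, base_lr=0.1), lr=0.1, momentum=0.9)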
Pages: 31