Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training

Cited by: 0
Authors
Zhou, Yefan [1 ]
Pang, Tianyu [2 ,3 ]
Liu, Keqin [2 ,3 ]
Martin, Charles H. [4 ]
Mahoney, Michael W. [5 ,6 ]
Yang, Yaoqing [1 ]
Affiliations
[1] Dartmouth Coll, Dept Comp Sci, Hanover, NH 03755 USA
[2] Nanjing Univ, Natl Ctr Appl Math, Nanjing, Peoples R China
[3] Nanjing Univ, Dept Math, Nanjing, Peoples R China
[4] Calculat Consulting, New York, NY USA
[5] Univ Calif Berkeley, LBNL, ICSI, Berkeley, CA USA
[6] Univ Calif Berkeley, Dept Stat, Berkeley, CA USA
Keywords
STATISTICAL-MECHANICS;
DOI
Not available
Chinese Library Classification (CLC) Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Regularization in modern machine learning is crucial, and it can take various forms in algorithmic design: the training set, the model family, the error function, regularization terms, and the optimization procedure. In particular, the learning rate, which can be interpreted as a temperature-like parameter within the statistical mechanics of learning, plays a crucial role in neural network training. Indeed, many widely adopted training strategies essentially just specify how the learning rate decays over time. This process can be interpreted as decreasing a temperature, using either a global learning rate (for the entire model) or a learning rate that varies for each parameter. This paper proposes TempBalance, a straightforward yet effective layer-wise learning rate method. TempBalance is based on Heavy-Tailed Self-Regularization (HT-SR) theory, an approach that characterizes the implicit self-regularization of different layers in trained models. We demonstrate the efficacy of using HT-SR-motivated metrics to guide the scheduling and balancing of temperature across all network layers during training, resulting in improved test performance. We implement TempBalance on the CIFAR10, CIFAR100, SVHN, and TinyImageNet datasets using ResNets, VGGs, and WideResNets with various depths and widths. Our results show that TempBalance significantly outperforms ordinary SGD and carefully-tuned spectral norm regularization. We also show that TempBalance outperforms a number of state-of-the-art optimizers and learning rate schedulers.
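
The abstract describes TempBalance only at a high level. As a rough illustration of the kind of layer-wise temperature balancing it refers to, the sketch below fits a heavy-tail exponent to each layer's weight-matrix eigenvalue spectrum (via a simple Hill estimator) and rescales each layer's learning rate around the global one. The hill_alpha and layerwise_lrs helpers, the Hill-estimator choice, and the linear rescaling rule are assumptions made for illustration, not the paper's exact algorithm.

# Hypothetical sketch of an HT-SR-style layer-wise learning-rate assignment.
# The Hill-estimator metric and the linear rescaling rule below are
# illustrative assumptions, not the exact procedure from the paper.
import torch
import torch.nn as nn

def hill_alpha(weight: torch.Tensor, k_frac: float = 0.1) -> float:
    """Estimate a heavy-tail exponent for one layer with a simple Hill
    estimator over the largest eigenvalues of W^T W (the layer's ESD)."""
    W = weight.detach().flatten(1).float()                   # conv kernel -> 2-D matrix
    eigs = torch.linalg.svdvals(W).pow(2).clamp_min(1e-12)   # eigenvalues of W^T W
    eigs, _ = torch.sort(eigs, descending=True)
    k = max(2, int(k_frac * eigs.numel()))
    k = min(k, eigs.numel() - 1)                             # keep one eigenvalue below the tail
    denom = torch.log(eigs[:k] / eigs[k]).sum().clamp_min(1e-12)
    return float(1.0 + k / denom)

def layerwise_lrs(model: nn.Module, base_lr: float, scale: float = 0.5):
    """Build optimizer parameter groups whose learning rates are rescaled
    around base_lr according to each layer's alpha: heavier-tailed layers
    (smaller alpha) get smaller rates, lighter-tailed layers larger ones.
    Only Linear and Conv2d layers are handled in this toy sketch."""
    layers = [(name, m) for name, m in model.named_modules()
              if isinstance(m, (nn.Linear, nn.Conv2d))]
    alphas = {name: hill_alpha(m.weight) for name, m in layers}
    mean_alpha = sum(alphas.values()) / len(alphas)
    groups = []
    for name, module in layers:
        ratio = alphas[name] / mean_alpha
        lr = base_lr * (1.0 + scale * (ratio - 1.0))         # linear rescaling around base_lr
        groups.append({"params": module.parameters(), "lr": lr})
    return groups

# Usage: per-group learning rates override the default lr passed to SGD.
# In practice they would be recomputed periodically, since the spectra
# evolve as the weights change during training.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.SGD(layerwise_lrs(model, base_lr=0.1), lr=0.1, momentum=0.9)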
Pages: 31