Is normalization indispensable for training deep neural networks?

Cited by: 0
Authors
Shao, Jie [1 ]
Hu, Kai [2 ]
Wang, Changhu [3 ]
Xue, Xiangyang [1 ]
Raj, Bhiksha [2 ]
Affiliations
[1] Fudan Univ, Shanghai, Peoples R China
[2] Carnegie Mellon Univ, Pittsburgh, PA USA
[3] Byte Dance Lab, Shanghai, Peoples R China
Keywords: not listed
DOI: not available
Chinese Library Classification: TP18 (Artificial Intelligence Theory)
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
Normalization operations are widely used to train deep neural networks, and they can improve both convergence and generalization in most tasks. Both the theory behind normalization's effectiveness and new forms of normalization remain active research topics. To better understand normalization, a natural question is whether normalization is indispensable for training deep neural networks. In this paper, we analyze what happens when normalization layers are removed from a network, and show how to train deep neural networks without normalization layers and without performance degradation. Our proposed method achieves the same or even slightly better performance on a variety of tasks: image classification on ImageNet, object detection and segmentation on MS-COCO, video classification on Kinetics, and machine translation on WMT English-German. Our study may help better understand the role of normalization layers and can serve as a competitive alternative to them. Code is available at https://github.com/hukkai/rescaling.
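The abstract does not detail the proposed method, but the repository name ("rescaling") suggests the general family of normalization-free techniques that keep activation variance under control by scaling residual branches. The toy sketch below is an assumption-laden illustration of that general idea, not the paper's actual method: each residual branch is modeled as unit-variance noise, and down-scaling the branch of block k by 1/sqrt(k) (a hypothetical choice) turns the linear variance growth of an unnormalized residual stack into slow harmonic growth.

```python
import numpy as np

def residual_stack(x, depth, rescale=True, seed=0):
    """Pass x through `depth` identity-residual blocks whose branch output
    is unit-variance Gaussian noise (a stand-in for an unnormalized
    conv/MLP branch at initialization)."""
    rng = np.random.default_rng(seed)
    for k in range(1, depth + 1):
        branch = rng.standard_normal(x.shape)   # unit-variance branch
        if rescale:
            branch = branch / np.sqrt(k)        # depth-dependent down-scaling
        x = x + branch                          # residual connection
    return x

x0 = np.zeros(10_000)
var_plain = residual_stack(x0, depth=100, rescale=False).var()
var_rescaled = residual_stack(x0, depth=100, rescale=True).var()

# Without rescaling, the output variance grows linearly with depth
# (about 100 here); with 1/sqrt(k) scaling it grows only like log(depth)
# (about 5 here), so activations stay in a trainable range without
# any normalization layer.
```

This variance-control view is one common motivation for normalization-free training; the exact scaling rule used by the paper should be taken from the linked repository.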
Pages: 11
Related Papers
(50 in total)
  • [1] Centered Weight Normalization in Accelerating Training of Deep Neural Networks
    Huang, Lei
    Liu, Xianglong
    Liu, Yang
    Lang, Bo
    Tao, Dacheng
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 2822 - 2830
  • [2] Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
    Salimans, Tim
    Kingma, Diederik P.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [3] Online Normalization for Training Neural Networks
    Chiley, Vitaliy
    Sharapov, Ilya
    Kosson, Atli
    Koster, Urs
    Reece, Ryan
    de la Fuente, Sofia Samaniego
    Subbiah, Vishal
    James, Michael
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [4] NORMALIZATION EFFECTS ON DEEP NEURAL NETWORKS
    Yu, Jiahui
    Spiliopoulos, Konstantinos
    [J]. FOUNDATIONS OF DATA SCIENCE, 2023, 5 (03): : 389 - 465
  • [5] Batch Normalization and Dropout Regularization in Training Deep Neural Networks with Label Noise
    Rusiecki, Andrzej
    [J]. INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, ISDA 2021, 2022, 418 : 57 - 66
  • [6] L1-Norm Batch Normalization for Efficient Training of Deep Neural Networks
    Wu, Shuang
    Li, Guoqi
    Deng, Lei
    Liu, Liu
    Wu, Dong
    Xie, Yuan
    Shi, Luping
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (07) : 2043 - 2051
  • [7] GENERALIZABLE MULTI-SITE TRAINING AND TESTING OF DEEP NEURAL NETWORKS USING IMAGE NORMALIZATION
    Onofrey, John A.
    Casetti-Dinescu, Dana I.
    Lauritzen, Andreas D.
    Sarkar, Saradwata
    Venkataraman, Rajesh
    Fan, Richard E.
    Sonn, Geoffrey A.
    Sprenkle, Preston C.
    Staib, Lawrence H.
    Papademetris, Xenophon
    [J]. 2019 IEEE 16TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI 2019), 2019, : 348 - 351
  • [8] Structure injected weight normalization for training deep networks
    Xu Yuan
    Xiangjun Shen
    Sumet Mehta
    Teng Li
    Shiming Ge
    Zhengjun Zha
    [J]. Multimedia Systems, 2022, 28 : 433 - 444
  • [9] Structure injected weight normalization for training deep networks
    Yuan, Xu
    Shen, Xiangjun
    Mehta, Sumet
    Li, Teng
    Ge, Shiming
    Zha, Zhengjun
    [J]. MULTIMEDIA SYSTEMS, 2022, 28 (02) : 433 - 444
  • [10] MULTILINGUAL TRAINING OF DEEP NEURAL NETWORKS
    Ghoshal, Arnab
    Swietojanski, Pawel
    Renals, Steve
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7319 - 7323