Is normalization indispensable for training deep neural networks?

Cited by: 0
Authors
Shao, Jie [1 ]
Hu, Kai [2 ]
Wang, Changhu [3 ]
Xue, Xiangyang [1 ]
Raj, Bhiksha [2 ]
Affiliations
[1] Fudan Univ, Shanghai, Peoples R China
[2] Carnegie Mellon Univ, Pittsburgh, PA USA
[3] Byte Dance Lab, Shanghai, Peoples R China
Keywords: not listed
DOI: not available
Chinese Library Classification: TP18 (Artificial Intelligence Theory)
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
Normalization operations are widely used to train deep neural networks, and they can improve both convergence and generalization in most tasks. Both the theory behind normalization's effectiveness and new forms of normalization remain active research topics. To better understand normalization, a natural question is whether normalization is indispensable for training deep neural networks. In this paper, we analyze what happens when normalization layers are removed from a network, and show how to train deep neural networks without normalization layers and without performance degradation. Our proposed method achieves the same or even slightly better performance on a variety of tasks: image classification on ImageNet, object detection and segmentation on MS-COCO, video classification on Kinetics, and machine translation on WMT English-German. Our study may help better understand the role of normalization layers and can serve as a competitive alternative to them. Code is available at https://github.com/hukkai/rescaling.
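The abstract does not detail the proposed method, but the repository name ("rescaling") suggests the general family of normalization-free techniques that keep activation variance under control by scaling residual branches. The toy sketch below is an assumption-laden illustration of that general idea, not the paper's actual method: each residual branch is modeled as unit-variance noise, and down-scaling the branch of block k by 1/sqrt(k) (a hypothetical choice) turns the linear variance growth of an unnormalized residual stack into slow harmonic growth.

```python
import numpy as np

def residual_stack(x, depth, rescale=True, seed=0):
    """Pass x through `depth` identity-residual blocks whose branch output
    is unit-variance Gaussian noise (a stand-in for an unnormalized
    conv/MLP branch at initialization)."""
    rng = np.random.default_rng(seed)
    for k in range(1, depth + 1):
        branch = rng.standard_normal(x.shape)   # unit-variance branch
        if rescale:
            branch = branch / np.sqrt(k)        # depth-dependent down-scaling
        x = x + branch                          # residual connection
    return x

x0 = np.zeros(10_000)
var_plain = residual_stack(x0, depth=100, rescale=False).var()
var_rescaled = residual_stack(x0, depth=100, rescale=True).var()

# Without rescaling, the output variance grows linearly with depth
# (about 100 here); with 1/sqrt(k) scaling it grows only like log(depth)
# (about 5 here), so activations stay in a trainable range without
# any normalization layer.
```

This variance-control view is one common motivation for normalization-free training; the exact scaling rule used by the paper should be taken from the linked repository.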
Pages: 11
Related Papers
(50 in total)
  • [1] Centered Weight Normalization in Accelerating Training of Deep Neural Networks
    Huang, Lei
    Liu, Xianglong
    Liu, Yang
    Lang, Bo
    Tao, Dacheng
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 2822 - 2830
  • [2] Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
    Salimans, Tim
    Kingma, Diederik P.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [3] Online Normalization for Training Neural Networks
    Chiley, Vitaliy
    Sharapov, Ilya
    Kosson, Atli
    Koster, Urs
    Reece, Ryan
    de la Fuente, Sofia Samaniego
    Subbiah, Vishal
    James, Michael
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [4] NORMALIZATION EFFECTS ON DEEP NEURAL NETWORKS
    Yu, Jiahui
    Spiliopoulos, Konstantinos
    [J]. FOUNDATIONS OF DATA SCIENCE, 2023, 5 (03): : 389 - 465
  • [5] Batch Normalization and Dropout Regularization in Training Deep Neural Networks with Label Noise
    Rusiecki, Andrzej
    [J]. INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, ISDA 2021, 2022, 418 : 57 - 66
  • [6] L1-Norm Batch Normalization for Efficient Training of Deep Neural Networks
    Wu, Shuang
    Li, Guoqi
    Deng, Lei
    Liu, Liu
    Wu, Dong
    Xie, Yuan
    Shi, Luping
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (07) : 2043 - 2051
  • [7] GENERALIZABLE MULTI-SITE TRAINING AND TESTING OF DEEP NEURAL NETWORKS USING IMAGE NORMALIZATION
    Onofrey, John A.
    Casetti-Dinescu, Dana I.
    Lauritzen, Andreas D.
    Sarkar, Saradwata
    Venkataraman, Rajesh
    Fan, Richard E.
    Sonn, Geoffrey A.
    Sprenkle, Preston C.
    Staib, Lawrence H.
    Papademetris, Xenophon
    [J]. 2019 IEEE 16TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI 2019), 2019, : 348 - 351
  • [8] Structure injected weight normalization for training deep networks
    Xu Yuan
    Xiangjun Shen
    Sumet Mehta
    Teng Li
    Shiming Ge
    Zhengjun Zha
    [J]. Multimedia Systems, 2022, 28 : 433 - 444
  • [9] Structure injected weight normalization for training deep networks
    Yuan, Xu
    Shen, Xiangjun
    Mehta, Sumet
    Li, Teng
    Ge, Shiming
    Zha, Zhengjun
    [J]. MULTIMEDIA SYSTEMS, 2022, 28 (02) : 433 - 444
  • [10] MULTILINGUAL TRAINING OF DEEP NEURAL NETWORKS
    Ghoshal, Arnab
    Swietojanski, Pawel
    Renals, Steve
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7319 - 7323