An empirical analysis of the shift and scale parameters in BatchNorm

Cited by: 4
Authors
Peerthum, Yashna [1]
Stamp, Mark [1]
Affiliations
[1] San Jose State Univ, Dept Comp Sci, San Jose, CA 95192 USA
DOI
10.1016/j.ins.2023.118951
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Batch Normalization (BatchNorm) is a technique that improves the training of deep neural networks, especially Convolutional Neural Networks (CNN). It has been empirically demonstrated that BatchNorm increases performance, stability, and accuracy, although the reasons for such improvements are unclear. BatchNorm includes a normalization step as well as trainable shift and scale parameters. In this paper, we empirically examine the relative contribution to the success of BatchNorm of the normalization step, as compared to the re-parameterization via shifting and scaling. To conduct our experiments, we implement two new optimizers in PyTorch, namely, a version of BatchNorm that we refer to as AffineLayer, which includes the re-parameterization step without normalization, and a version with just the normalization step, that we call BatchNorm-minus. We compare the performance of our AffineLayer and BatchNorm-minus implementations to standard BatchNorm, and we also compare these to the case where no batch normalization is used. We experiment with four ResNet architectures (ResNet18, ResNet34, ResNet50, and ResNet101) over a standard image dataset and multiple batch sizes. Among other findings, we provide empirical evidence that the success of BatchNorm may derive primarily from improved weight initialization.
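The split described in the abstract, normalization on one side and trainable shift and scale on the other, can be illustrated with a minimal PyTorch sketch. This is not the authors' code: the names AffineLayer and BatchNorm-minus come from the abstract, but the class body, the `batchnorm_minus` helper, the initial values, and the tensor shapes below are assumptions made for illustration only.

```python
import torch
from torch import nn


class AffineLayer(nn.Module):
    """Per-channel scale and shift only, with no normalization (illustrative)."""

    def __init__(self, num_features: int):
        super().__init__()
        # Assumed initialization, matching nn.BatchNorm2d: scale 1, shift 0.
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (N, C, H, W); broadcast the parameters over N, H, W.
        return x * self.weight.view(1, -1, 1, 1) + self.bias.view(1, -1, 1, 1)


def batchnorm_minus(num_features: int) -> nn.BatchNorm2d:
    """Hypothetical helper: normalization only.

    affine=False removes the trainable scale and shift from BatchNorm2d.
    """
    return nn.BatchNorm2d(num_features, affine=False)


torch.manual_seed(0)
x = torch.randn(8, 3, 4, 4) * 5.0 + 2.0  # toy batch: 8 images, 3 channels

affine = AffineLayer(3)          # re-parameterization without normalization
norm_only = batchnorm_minus(3)   # normalization without re-parameterization
full_bn = nn.BatchNorm2d(3)      # standard BatchNorm (both steps)

y_affine = affine(x)
y_norm = norm_only(x)
y_full = full_bn(x)
```

At initialization the affine-only layer is the identity, and standard BatchNorm produces the same output as the normalization-only variant; the variants diverge only once training updates the scale and shift parameters.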
Pages: 15