Sign Based Derivative Filtering for Stochastic Gradient Descent

Cited by: 1
Authors
Berestizshevsky, Konstantin [1 ]
Even, Guy [1 ]
Affiliations
[1] Tel Aviv Univ, Sch Elect Engn, Tel Aviv, Israel
Keywords
Optimization; Gradients; Deep learning; Neural networks
DOI
10.1007/978-3-030-30484-3_18
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We study the performance of stochastic gradient descent (SGD) in deep neural network (DNN) models. We show that, during a single training epoch, the signs of the partial derivatives of the loss with respect to a single parameter are distributed almost uniformly over the minibatches. We propose an optimization routine in which we maintain a moving-average history of the sign of each derivative. This history is used to classify new derivatives as "exploratory" if they disagree with the sign of the history; conversely, we classify new derivatives as "exploiting" if they agree with the sign of the history. Each derivative is weighted according to this classification, providing control over exploration and exploitation. The proposed approach leads to training a model with higher accuracy, as we demonstrate through a series of experiments.
Pages: 208-219 (12 pages)
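
The abstract describes the filtering rule only informally, so a minimal sketch of the idea follows, in NumPy. The moving-average update, the function name sdf_sgd_step, and the hyperparameters (beta, w_exploit, w_explore) are illustrative assumptions, not the authors' published formulation.

import numpy as np

def sdf_sgd_step(params, grads, sign_history, lr=0.01, beta=0.9,
                 w_exploit=1.0, w_explore=0.5):
    # Illustrative sketch only: the paper's exact weighting scheme may differ.
    sign = np.sign(grads)
    # A derivative is "exploiting" if its sign agrees with the sign of the
    # accumulated history, and "exploratory" if it disagrees.
    agree = sign * np.sign(sign_history) > 0
    weights = np.where(agree, w_exploit, w_explore)
    # Weighted SGD update: each derivative is scaled by its classification.
    new_params = params - lr * weights * grads
    # Update the moving average of the derivative signs.
    new_history = beta * sign_history + (1.0 - beta) * sign
    return new_params, new_history

# Usage: a zero-initialized history treats every derivative as exploratory
# until a consistent sign accumulates over the minibatches.
params = np.zeros(4)
history = np.zeros_like(params)
grads = np.array([0.2, -0.1, 0.3, -0.4])  # hypothetical minibatch gradient
params, history = sdf_sgd_step(params, grads, history)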