Sign Based Derivative Filtering for Stochastic Gradient Descent

Cited by: 1
Authors
Berestizshevsky, Konstantin [1 ]
Even, Guy [1 ]
Affiliations
[1] Tel Aviv Univ, Sch Elect Engn, Tel Aviv, Israel
Keywords
Optimization; Gradients; Deep learning; Neural networks
DOI
10.1007/978-3-030-30484-3_18
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We study the performance of stochastic gradient descent (SGD) in deep neural network (DNN) models. We show that, during a single training epoch, the signs of the partial derivatives of the loss with respect to a single parameter are distributed almost uniformly over the minibatches. We propose an optimization routine in which we maintain a moving-average history of the sign of each derivative. This history is used to classify new derivatives as "exploratory" if they disagree with the sign of the history; conversely, we classify new derivatives as "exploiting" if they agree with the sign of the history. Each derivative is weighted according to this classification, providing control over exploration and exploitation. The proposed approach leads to training a model with higher accuracy, as we demonstrate through a series of experiments.
Pages: 208-219 (12 pages)
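
The abstract describes the filtering rule only informally, so a minimal sketch of the idea follows, in NumPy. The moving-average update, the function name sdf_sgd_step, and the hyperparameters (beta, w_exploit, w_explore) are illustrative assumptions, not the authors' published formulation.

import numpy as np

def sdf_sgd_step(params, grads, sign_history, lr=0.01, beta=0.9,
                 w_exploit=1.0, w_explore=0.5):
    # Illustrative sketch only: the paper's exact weighting scheme may differ.
    sign = np.sign(grads)
    # A derivative is "exploiting" if its sign agrees with the sign of the
    # accumulated history, and "exploratory" if it disagrees.
    agree = sign * np.sign(sign_history) > 0
    weights = np.where(agree, w_exploit, w_explore)
    # Weighted SGD update: each derivative is scaled by its classification.
    new_params = params - lr * weights * grads
    # Update the moving average of the derivative signs.
    new_history = beta * sign_history + (1.0 - beta) * sign
    return new_params, new_history

# Usage: a zero-initialized history treats every derivative as exploratory
# until a consistent sign accumulates over the minibatches.
params = np.zeros(4)
history = np.zeros_like(params)
grads = np.array([0.2, -0.1, 0.3, -0.4])  # hypothetical minibatch gradient
params, history = sdf_sgd_step(params, grads, history)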