MindTheStep-AsyncPSGD: Adaptive Asynchronous Parallel Stochastic Gradient Descent

Cited by: 0
Authors
Backstrom, Karl [1 ]
Papatriantafilou, Marina [1 ]
Tsigas, Philippas [1 ]
Affiliations
[1] Chalmers Univ Technol, Dept Comp Sci & Engn, Gothenburg, Sweden
Keywords
DOI
10.1109/bigdata47090.2019.9006054
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Stochastic Gradient Descent (SGD) is very useful in optimization problems with high-dimensional non-convex target functions, and hence constitutes an important component of several Machine Learning and Data Analytics methods. Recently there has been significant work on understanding the parallelism inherent to SGD and its convergence properties. Asynchronous parallel SGD (AsyncPSGD) has received particular attention, due to observed performance benefits. On the other hand, asynchrony implies inherent challenges in understanding the execution of the algorithm and its convergence, stemming from the fact that the contribution of a thread might be based on an old (stale) view of the state. In this work we aim to deepen the understanding of AsyncPSGD in order to increase the statistical efficiency in the presence of stale gradients. We propose new models for capturing the nature of the staleness distribution in a practical setting. Using the proposed models, we derive a staleness-adaptive SGD framework, MindTheStep-AsyncPSGD, for adapting the step size in an online fashion, which provably reduces the negative impact of asynchrony. Moreover, we provide general convergence time bounds for a wide class of staleness-adaptive step size strategies for convex target functions. We also provide a detailed empirical study, showing how our approach implies faster convergence for deep learning applications.
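The abstract's core mechanism, adapting the step size online as a function of each gradient's measured staleness, can be illustrated with a short sketch. The paper defines the exact adaptation rule of MindTheStep-AsyncPSGD; the Python below is only a minimal illustration under assumed choices: a shared-memory least-squares problem, a global update counter used to measure the staleness tau of each applied gradient, and an assumed decay base_lr / (1 + c * tau) standing in for the paper's rule.

    import threading
    import numpy as np

    # Minimal sketch of staleness-adaptive asynchronous SGD (illustrative only;
    # the adaptation rule and all constants here are assumptions, not the
    # MindTheStep-AsyncPSGD rule from the paper).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((256, 8))
    b = rng.standard_normal(256)        # convex target: min_x ||Ax - b||^2

    x = np.zeros(8)                     # shared parameter vector
    version = 0                         # counts updates applied so far
    lock = threading.Lock()

    BASE_LR, C, STEPS = 0.01, 0.5, 2000

    def worker(seed):
        global version
        local_rng = np.random.default_rng(seed)
        for _ in range(STEPS):
            with lock:                  # snapshot parameters and their version
                x_read = x.copy()
                v_read = version
            # Mini-batch gradient at the snapshot (slow part, done unlocked).
            idx = local_rng.integers(0, 256, size=32)
            grad = 2.0 * A[idx].T @ (A[idx] @ x_read - b[idx]) / 32
            with lock:                  # apply update with adapted step size
                tau = version - v_read  # staleness: updates since our read
                lr = BASE_LR / (1.0 + C * tau)   # assumed adaptive rule
                x[:] -= lr * grad       # in-place so all threads see it
                version += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print("final loss:", float(np.sum((A @ x - b) ** 2)))

With only four threads the observed staleness is typically small, so the adapted step size stays close to BASE_LR; the paper's analysis covers more general staleness distributions and gives convergence-time bounds for a wide class of such staleness-adaptive strategies.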
Pages: 16 - 25
Number of pages: 10
Related Papers
50 records in total
  • [21] Optimization of stochastic parallel gradient descent algorithm for adaptive optics in atmospheric turbulence
    Chen B.
    Li X.
    Jiang W.
    Zhongguo Jiguang/Chinese Journal of Lasers, 2010, 37 (04): 959 - 964
  • [22] Adaptive optical confocal fluorescence microscope with stochastic parallel gradient descent algorithm
    He, Yi
    Wang, Zhibin
    Wei, Ling
    Li, Xiqi
    Yang, Jinsheng
    Zhang, Yudong
    2016 ASIA COMMUNICATIONS AND PHOTONICS CONFERENCE (ACP), 2016,
  • [23] Theoretical Analysis of Stochastic Parallel Gradient Descent Control Algorithm in Adaptive Optics
    Yang, Huizhen
    Li, Xinyang
    PROCEEDINGS OF THE 2009 WRI GLOBAL CONGRESS ON INTELLIGENT SYSTEMS, VOL II, 2009: 338+
  • [24] An adaptive enhancement method based on stochastic parallel gradient descent of glioma image
    Wang, Hongfei
    Peng, Xinhao
    Ma, ShiQing
    Wang, Shuai
    Xu, Chuan
    Yang, Ping
    IET IMAGE PROCESSING, 2023, 17 (14) : 3976 - 3985
  • [25] Asynchronous Parallel Fuzzy Stochastic Gradient Descent for High-Dimensional Incomplete Data Representation
    Qin, Wen
    Luo, Xin
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2024, 32 (02) : 445 - 459
  • [26] Fast Asynchronous Parallel Stochastic Gradient Descent: A Lock-Free Approach with Convergence Guarantee
    Zhao, Shen-Yi
    Li, Wu-Jun
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 2379 - 2385
  • [27] A(DP)2SGD: Asynchronous Decentralized Parallel Stochastic Gradient Descent With Differential Privacy
    Xu, Jie
    Zhang, Wei
    Wang, Fei
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (11) : 8036 - 8047
  • [28] Parallel Asynchronous Stochastic Coordinate Descent with Auxiliary Variables
    Yu, Hsiang-Fu
    Hsieh, Cho-Jui
    Dhillon, Inderjit S.
    22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
  • [29] The Impact of Synchronization in Parallel Stochastic Gradient Descent
    Backstrom, Karl
    Papatriantafilou, Marina
    Tsigas, Philippas
    DISTRIBUTED COMPUTING AND INTELLIGENT TECHNOLOGY, ICDCIT 2022, 2022, 13145 : 60 - 75
  • [30] Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization
    Lian, Xiangru
    Huang, Yijun
    Li, Yuncheng
    Liu, Ji
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28