MindTheStep-AsyncPSGD: Adaptive Asynchronous Parallel Stochastic Gradient Descent

Cited by: 0
Authors
Backstrom, Karl [1 ]
Papatriantafilou, Marina [1 ]
Tsigas, Philippas [1 ]
Affiliations
[1] Chalmers Univ Technol, Dept Comp Sci & Engn, Gothenburg, Sweden
Keywords
DOI
10.1109/bigdata47090.2019.9006054
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Stochastic Gradient Descent (SGD) is very useful in optimization problems with high-dimensional non-convex target functions, and hence constitutes an important component of several Machine Learning and Data Analytics methods. Recently there has been significant work on understanding the parallelism inherent to SGD and its convergence properties. Asynchronous parallel SGD (AsyncPSGD) has received particular attention, due to observed performance benefits. On the other hand, asynchrony implies inherent challenges in understanding the execution of the algorithm and its convergence, stemming from the fact that the contribution of a thread might be based on an old (stale) view of the state. In this work we aim to deepen the understanding of AsyncPSGD in order to increase the statistical efficiency in the presence of stale gradients. We propose new models for capturing the nature of the staleness distribution in a practical setting. Using the proposed models, we derive a staleness-adaptive SGD framework, MindTheStep-AsyncPSGD, for adapting the step size in an online fashion, which provably reduces the negative impact of asynchrony. Moreover, we provide general convergence time bounds for a wide class of staleness-adaptive step size strategies for convex target functions. We also provide a detailed empirical study, showing how our approach implies faster convergence for deep learning applications.
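The abstract's core mechanism, adapting the step size online as a function of each gradient's measured staleness, can be illustrated with a short sketch. The paper defines the exact adaptation rule of MindTheStep-AsyncPSGD; the Python below is only a minimal illustration under assumed choices: a shared-memory least-squares problem, a global update counter used to measure the staleness tau of each applied gradient, and an assumed decay base_lr / (1 + c * tau) standing in for the paper's rule.

    import threading
    import numpy as np

    # Minimal sketch of staleness-adaptive asynchronous SGD (illustrative only;
    # the adaptation rule and all constants here are assumptions, not the
    # MindTheStep-AsyncPSGD rule from the paper).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((256, 8))
    b = rng.standard_normal(256)        # convex target: min_x ||Ax - b||^2

    x = np.zeros(8)                     # shared parameter vector
    version = 0                         # counts updates applied so far
    lock = threading.Lock()

    BASE_LR, C, STEPS = 0.01, 0.5, 2000

    def worker(seed):
        global version
        local_rng = np.random.default_rng(seed)
        for _ in range(STEPS):
            with lock:                  # snapshot parameters and their version
                x_read = x.copy()
                v_read = version
            # Mini-batch gradient at the snapshot (slow part, done unlocked).
            idx = local_rng.integers(0, 256, size=32)
            grad = 2.0 * A[idx].T @ (A[idx] @ x_read - b[idx]) / 32
            with lock:                  # apply update with adapted step size
                tau = version - v_read  # staleness: updates since our read
                lr = BASE_LR / (1.0 + C * tau)   # assumed adaptive rule
                x[:] -= lr * grad       # in-place so all threads see it
                version += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print("final loss:", float(np.sum((A @ x - b) ** 2)))

With only four threads the observed staleness is typically small, so the adapted step size stays close to BASE_LR; the paper's analysis covers more general staleness distributions and gives convergence-time bounds for a wide class of such staleness-adaptive strategies.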
Pages: 16 - 25
Number of pages: 10
Related Papers
50 records in total
  • [21] Optimization of stochastic parallel gradient descent algorithm for adaptive optics in atmospheric turbulence
    Chen B.
    Li X.
    Jiang W.
    Zhongguo Jiguang/Chinese Journal of Lasers, 2010, 37 (04): 959 - 964
  • [22] Adaptive optical confocal fluorescence microscope with stochastic parallel gradient descent algorithm
    He, Yi
    Wang, Zhibin
    Wei, Ling
    Li, Xiqi
    Yang, Jinsheng
    Zhang, Yudong
    2016 ASIA COMMUNICATIONS AND PHOTONICS CONFERENCE (ACP), 2016,
  • [23] Theoretical Analysis of Stochastic Parallel Gradient Descent Control Algorithm in Adaptive Optics
    Yang, Huizhen
    Li, Xinyang
    PROCEEDINGS OF THE 2009 WRI GLOBAL CONGRESS ON INTELLIGENT SYSTEMS, VOL II, 2009: 338+
  • [24] An adaptive enhancement method based on stochastic parallel gradient descent of glioma image
    Wang, Hongfei
    Peng, Xinhao
    Ma, ShiQing
    Wang, Shuai
    Xu, Chuan
    Yang, Ping
    IET IMAGE PROCESSING, 2023, 17 (14) : 3976 - 3985
  • [25] Asynchronous Parallel Fuzzy Stochastic Gradient Descent for High-Dimensional Incomplete Data Representation
    Qin, Wen
    Luo, Xin
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2024, 32 (02) : 445 - 459
  • [26] Fast Asynchronous Parallel Stochastic Gradient Descent: A Lock-Free Approach with Convergence Guarantee
    Zhao, Shen-Yi
    Li, Wu-Jun
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 2379 - 2385
  • [27] A(DP)2SGD: Asynchronous Decentralized Parallel Stochastic Gradient Descent With Differential Privacy
    Xu, Jie
    Zhang, Wei
    Wang, Fei
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (11) : 8036 - 8047
  • [28] Parallel Asynchronous Stochastic Coordinate Descent with Auxiliary Variables
    Yu, Hsiang-Fu
    Hsieh, Cho-Jui
    Dhillon, Inderjit S.
    22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
  • [29] The Impact of Synchronization in Parallel Stochastic Gradient Descent
    Backstrom, Karl
    Papatriantafilou, Marina
    Tsigas, Philippas
    DISTRIBUTED COMPUTING AND INTELLIGENT TECHNOLOGY, ICDCIT 2022, 2022, 13145 : 60 - 75
  • [30] Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization
    Lian, Xiangru
    Huang, Yijun
    Li, Yuncheng
    Liu, Ji
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28