SIGNSGD: Compressed Optimisation for Non-Convex Problems

Cited by: 0
Authors
Bernstein, Jeremy [1,2]
Wang, Yu-Xiang [2,3]
Azizzadenesheli, Kamyar [4]
Anandkumar, Anima [1,2]
Affiliations
[1] CALTECH, Pasadena, CA 91125 USA
[2] Amazon AI, Seattle, WA 98109 USA
[3] UC Santa Barbara, Santa Barbara, CA 93106 USA
[4] UC Irvine, Irvine, CA 92717 USA
Keywords
DESCENT;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Training large neural networks requires distributing learning across multiple workers, where the cost of communicating gradients can be a significant bottleneck. SIGNSGD alleviates this problem by transmitting just the sign of each minibatch stochastic gradient. We prove that it can get the best of both worlds: compressed gradients and an SGD-level convergence rate. The relative ℓ1/ℓ2 geometry of gradients, noise and curvature informs whether SIGNSGD or SGD is theoretically better suited to a particular problem. On the practical side, we find that the momentum counterpart of SIGNSGD is able to match the accuracy and convergence speed of ADAM on deep ImageNet models. We extend our theory to the distributed setting, where the parameter server uses majority vote to aggregate gradient signs from each worker, enabling 1-bit compression of worker-server communication in both directions. Using a theorem by Gauss (1823), we prove that majority vote can achieve the same reduction in variance as full-precision distributed SGD. Thus, there is great promise for sign-based optimisation schemes to achieve fast communication and fast convergence. Code to reproduce experiments is available at https://github.com/jxbz/signSGD.
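The update rule summarised in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' code (see the repository linked above for that). The function names, the toy quadratic objective, the Gaussian noise model, the learning rate and the worker count are all assumptions chosen for illustration. Each worker sends only the sign of its stochastic gradient, the server takes a per-coordinate majority vote, and every parameter moves by a fixed step in the direction opposite to the voted sign.

```python
import random


def sign(x):
    """Return +1, -1 or 0 according to the sign of a scalar."""
    return (x > 0) - (x < 0)


def majority_vote(worker_signs):
    """Per-coordinate majority vote: the sign of the sum of the workers' signs."""
    return [sign(sum(column)) for column in zip(*worker_signs)]


def signsgd_step(params, worker_grads, lr):
    """One distributed step: workers send signs, the server sends back the vote."""
    worker_signs = [[sign(g) for g in grad] for grad in worker_grads]
    vote = majority_vote(worker_signs)
    return [p - lr * v for p, v in zip(params, vote)]


def noisy_grad(params, rng, noise=1.0):
    """Hypothetical toy objective f(x) = 0.5 * ||x||^2 (gradient x) plus Gaussian noise."""
    return [p + rng.gauss(0.0, noise) for p in params]


def run(steps=200, workers=9, lr=0.05, seed=0):
    """Run the sketch on the toy objective and return the final parameters."""
    rng = random.Random(seed)
    params = [3.0, -2.0, 1.5]
    for _ in range(steps):
        grads = [noisy_grad(params, rng) for _ in range(workers)]
        params = signsgd_step(params, grads, lr)
    return params


if __name__ == "__main__":
    print(run())
```

An odd number of workers is used here so that the vote cannot tie when no gradient entry is exactly zero. In this sketch each worker sends one sign per coordinate and the server returns one voted sign per coordinate, which is the two-way 1-bit communication pattern the abstract describes.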
Pages: 10
Related papers
50 in total
  • [1] NON-CONVEX MINIMIZATION PROBLEMS
    EKELAND, I
    BULLETIN OF THE AMERICAN MATHEMATICAL SOCIETY, 1979, 1 (03) : 443 - 474
  • [2] Non-convex approach to binary compressed sensing
    Fosson, Sophie M.
    2018 CONFERENCE RECORD OF 52ND ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS, 2018, : 1959 - 1963
  • [3] A comparison of convex and non-convex compressed sensing applied to multidimensional NMR
    Kazimierczuk, Krzysztof
    Orekhov, Vladislav Yu
    JOURNAL OF MAGNETIC RESONANCE, 2012, 223 : 1 - 10
  • [4] DUALITY FOR A CLASS OF NON-CONVEX PROBLEMS
    GONCALVES, AS
    OPERATIONS RESEARCH, 1975, 23 : B286 - B286
  • [5] Duality for non-convex variational problems
    Bouchitte, Guy
    Fragala, Ilaria
    COMPTES RENDUS MATHEMATIQUE, 2015, 353 (04) : 375 - 379
  • [6] CLASS OF NON-CONVEX OPTIMIZATION PROBLEMS
    HIRCHE, J
    TAN, HK
    ZEITSCHRIFT FUR ANGEWANDTE MATHEMATIK UND MECHANIK, 1977, 57 (04) : 247 - 253
  • [7] On the perturbation of measurement matrix in non-convex compressed sensing
    Ince, Taner
    Nacaroglu, Arif
    SIGNAL PROCESSING, 2014, 98 : 143 - 149
  • [8] Reconstruction of compressed video via non-convex minimization
    Ji, Chao
    Tian, Jinshou
    Sheng, Liang
    He, Kai
    Xin, Liwei
    Yan, Xin
    Xue, Yanhua
    Zhang, Minrui
    Chen, Ping
    Wang, Xing
    AIP ADVANCES, 2020, 10 (11)
  • [9] Non-convex power plant modelling in energy optimisation
    Makkonen, S
    Lahdelma, R
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2006, 171 (03) : 1113 - 1126
  • [10] An adaptive Extremum Seeking scheme for non-convex optimisation
    Mimmo, N.
    Marconi, L.
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 6755 - 6760