SIGNSGD: Compressed Optimisation for Non-Convex Problems

Cited by: 0
Authors
Bernstein, Jeremy [1,2]
Wang, Yu-Xiang [2,3]
Azizzadenesheli, Kamyar [4]
Anandkumar, Anima [1,2]
Affiliations
[1] CALTECH, Pasadena, CA 91125 USA
[2] Amazon AI, Seattle, WA 98109 USA
[3] UC Santa Barbara, Santa Barbara, CA 93106 USA
[4] UC Irvine, Irvine, CA 92717 USA
Keywords
DESCENT;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Training large neural networks requires distributing learning across multiple workers, where the cost of communicating gradients can be a significant bottleneck. SIGNSGD alleviates this problem by transmitting just the sign of each minibatch stochastic gradient. We prove that it can get the best of both worlds: compressed gradients and SGD-level convergence rate. The relative ℓ1/ℓ2 geometry of gradients, noise and curvature informs whether SIGNSGD or SGD is theoretically better suited to a particular problem. On the practical side we find that the momentum counterpart of SIGNSGD is able to match the accuracy and convergence speed of ADAM on deep ImageNet models. We extend our theory to the distributed setting, where the parameter server uses majority vote to aggregate gradient signs from each worker, enabling 1-bit compression of worker-server communication in both directions. Using a theorem by Gauss (1823), we prove that majority vote can achieve the same reduction in variance as full-precision distributed SGD. Thus, there is great promise for sign-based optimisation schemes to achieve fast communication and fast convergence. Code to reproduce experiments can be found at https://github.com/jxbz/signSGD.
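To make the updates described in the abstract concrete, below is a minimal NumPy sketch of the two mechanisms it mentions: the single-worker signSGD step (step along the elementwise sign of the stochastic gradient) and the distributed variant where a parameter server aggregates worker gradient signs by majority vote. The toy quadratic objective, learning rate, worker count, and tie-handling (sign of zero treated as zero) are illustrative assumptions, not values or details taken from the paper; the authors' reference implementation is at the GitHub URL above.

```python
import numpy as np

def sign_sgd_step(params, grad, lr=0.01):
    """Single-worker signSGD: move along the sign of the stochastic gradient."""
    return params - lr * np.sign(grad)

def majority_vote_step(params, worker_grads, lr=0.01):
    """Distributed signSGD with majority vote.

    Each worker sends sign(gradient) (1 bit per coordinate) to the server;
    the server applies the sign of the summed votes and could broadcast that
    sign back, so communication is 1-bit in both directions.
    """
    votes = sum(np.sign(g) for g in worker_grads)  # elementwise vote tally
    return params - lr * np.sign(votes)

if __name__ == "__main__":
    # Toy example (not from the paper): f(x) = ||x||^2 with noisy gradients
    # and 3 workers. The iterate should settle near the origin.
    rng = np.random.default_rng(0)
    x = rng.normal(size=5)
    for _ in range(200):
        worker_grads = [2 * x + rng.normal(scale=0.5, size=x.shape) for _ in range(3)]
        x = majority_vote_step(x, worker_grads, lr=0.01)
    print("final parameters:", x)
```

With more workers the vote tally averages out per-coordinate gradient noise, which is the variance-reduction effect the abstract attributes to majority vote.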
Pages: 10
Related papers
50 items in total
  • [41] Nesting of non-convex figures in non-convex contours
    Vinade, C.
    Dias, A.
     Informacion Tecnologica, 2000, 11(01): 149-156
  • [42] What's Best for My Mesh? Convex or Non-Convex Regularisation for Mesh Optimisation
    Pilbrough, Jason
    Amayo, Paul
     2021 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2021: 6617-6624
  • [43] Exploring the convex transformations for solving non-convex bilinear integer problems
    Harjunkoski, I
    Pörn, R
    Westerlund, T
     COMPUTERS & CHEMICAL ENGINEERING, 1999, 23: S471-S474
  • [44] Signal recovery adapted to a dictionary from non-convex compressed sensing
    Huang, Jianwen
    Zhang, Feng
    Liu, Xinling
    Jia, Jinping
    Wang, Runke
     INTERNATIONAL JOURNAL OF COMPUTING SCIENCE AND MATHEMATICS, 2023, 18(03): 224-234
  • [45] Non-convex block-sparse compressed sensing with redundant dictionaries
    Liu, Chunyan
    Wang, Jianjun
    Wang, Wendong
    Wang, Zhi
     IET SIGNAL PROCESSING, 2017, 11(02): 171-180
  • [46] ITERATIVE l1 MINIMIZATION FOR NON-CONVEX COMPRESSED SENSING
    Yin, Penghang
    Xin, Jack
     JOURNAL OF COMPUTATIONAL MATHEMATICS, 2017, 35(04): 439-451
  • [47] A Duality Theory for Non-convex Problems in the Calculus of Variations
     Bouchitté, Guy
     Fragalà, Ilaria
     Archive for Rational Mechanics and Analysis, 2018, 229: 361-415
  • [48] HONOR: Hybrid Optimization for NOn-convex Regularized problems
    Gong, Pinghua
    Ye, Jieping
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [49] Comparative study of non-convex penalties and related algorithms in compressed sensing
    Xu, Fanding
    Duan, Junbo
    Liu, Wenyu
    DIGITAL SIGNAL PROCESSING, 2023, 135
  • [50] STOCHASTIC PROBLEMS OF OPTIMAL CONTROL WITH NON-CONVEX LIMITATION
    ABASHEV, FK
    KATS, IY
     PRIKLADNAYA MATEMATIKA I MEKHANIKA, 1974, 38(03): 409-416