Accelerate Distributed Stochastic Descent for Nonconvex Optimization with Momentum

Cited by: 1
Authors
Cong, Guojing [1 ]
Liu, Tianyi [2 ]
Affiliations
[1] IBM TJ Watson Res Ctr, Ossining, NY 10562 USA
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
Keywords
DOI
10.1109/MLHPCAI4S51975.2020.00011
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Momentum methods have been used extensively in optimizers for deep learning. Recent studies show that distributed training through K-step averaging has many desirable properties. We propose a momentum method for such model-averaging approaches. At the individual learner level, traditional stochastic gradient descent is applied. At the meta level (the global learner level), a single momentum term is applied, which we call block momentum. We analyze the convergence and scaling properties of such momentum methods. Our experimental results show that block momentum not only accelerates training but also achieves better results.
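The abstract does not give the exact update rule, so the following is a minimal sketch, assuming block momentum is applied to the averaged K-step update at the global level while each learner runs plain SGD locally. Names such as grad_fn, block_momentum, lr, and local_sgd_steps are illustrative assumptions, not the authors' implementation.

    # Illustrative sketch of K-step averaging with a global ("block") momentum term.
    # Assumptions: each learner runs K plain SGD steps, models are averaged,
    # and momentum is applied to the averaged round-level update.
    import numpy as np

    def local_sgd_steps(w, data_batches, lr, grad_fn):
        """Run K plain SGD steps from the current global model on one learner."""
        w = w.copy()
        for batch in data_batches:          # len(data_batches) == K
            w -= lr * grad_fn(w, batch)
        return w

    def block_momentum_round(w_global, velocity, learner_batches, lr, grad_fn,
                             block_momentum=0.9):
        """One global round: local SGD on every learner, average the models,
        then apply momentum to the aggregated 'block' update."""
        local_models = [local_sgd_steps(w_global, batches, lr, grad_fn)
                        for batches in learner_batches]
        w_avg = np.mean(local_models, axis=0)     # K-step model averaging
        block_update = w_avg - w_global           # progress made this round
        velocity = block_momentum * velocity + block_update
        w_global = w_global + velocity            # momentum-accelerated global step
        return w_global, velocity

In this sketch the local optimizer stays unmodified; acceleration comes only from reusing the velocity accumulated over successive averaging rounds at the global level.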
Pages: 29-39
Number of pages: 11