FTSGD: An Adaptive Stochastic Gradient Descent Algorithm for Spark MLlib

Cited by: 2
Authors
Zhang, Hong [1 ]
Liu, Zixia [1 ]
Huang, Hai [2 ]
Wang, Liqiang [1 ]
Affiliations
[1] Univ Cent Florida, Dept Comp Sci, Orlando, FL 32816 USA
[2] IBM TJ Watson Res Ctr, Yorktown Hts, NY USA
Keywords
Spark; MLlib; Asynchronous Stochastic Gradient Descent; Adaptive Iterative Learning;
DOI
10.1109/DASC/PiCom/DataCom/CyberSciTec.2018.00-22
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
The proliferation of massive datasets and the surge of interest in big data analytics have popularized a number of novel distributed data processing platforms such as Hadoop and Spark. Their large and growing ecosystems of libraries enable even novices to take advantage of the latest data analytics and machine learning algorithms. However, time-consuming data synchronization and communication in iterative algorithms on large-scale distributed platforms can lead to significant performance inefficiency. MLlib is Spark's scalable library of common machine learning algorithms, many of which employ Stochastic Gradient Descent (SGD) to find minima or maxima by iteration. However, convergence can be very slow if gradient data are synchronized on every iteration. In this work, we optimize the current implementation of SGD in Spark's MLlib by reusing each data partition multiple times within a single iteration to find better candidate weights more efficiently. Whether to use multiple local iterations within each partition is decided dynamically by the 68-95-99.7 rule. We also design a variant of the momentum algorithm to optimize the step size in every iteration. This method uses a new adaptive rule that decreases the step size whenever neighboring gradients show significantly differing directions. Experiments show that our adaptive algorithm is more efficient and can be up to 7 times faster than the original MLlib SGD.
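The two ideas in the abstract, extra local gradient steps gated by a 68-95-99.7-style statistical test, and a step size that shrinks when consecutive gradients point in conflicting directions, can be sketched on a toy full-batch least-squares problem. This is an illustrative NumPy sketch only: the function name `adaptive_sgd`, the least-squares objective, and the 2-sigma window over recent gradient norms are assumptions for demonstration; the paper's actual implementation operates asynchronously on Spark RDD partitions inside MLlib.

```python
import numpy as np

def adaptive_sgd(X, y, lr=0.2, epochs=100, max_local_iters=5, shrink=0.5):
    """Sketch of two ideas from FTSGD on full-batch least squares:
    (1) extra local iterations reusing the same data, continued only while
        the new gradient norm stays within mean +/- 2*std of recent norms
        (a 68-95-99.7-style test), and
    (2) a step size that shrinks whenever consecutive gradients point in
        significantly differing directions (negative dot product)."""
    w = np.zeros(X.shape[1])
    prev_grad = None
    norms = []  # history of gradient norms for the statistical test
    for _ in range(epochs):
        for local in range(max_local_iters):
            grad = X.T @ (X @ w - y) / len(y)
            g_norm = np.linalg.norm(grad)
            # Extra local passes are gated by a 2-sigma band over recent
            # norms: an atypical gradient suggests we should resynchronize.
            if local > 0 and len(norms) >= 3:
                recent = norms[-20:]
                mu, sigma = np.mean(recent), np.std(recent)
                if abs(g_norm - mu) > 2 * sigma:
                    break
            # Adaptive step size: shrink when neighboring gradients disagree.
            if prev_grad is not None and np.dot(grad, prev_grad) < 0:
                lr *= shrink
            w -= lr * grad
            prev_grad = grad
            norms.append(g_norm)
    return w
```

In the distributed setting described by the paper, the inner loop corresponds to reusing one partition's data without cross-node synchronization, which is where the communication savings come from.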
Pages: 828-835
Page count: 8