Survey on Parallel and Distributed Optimization Algorithms for Scalable Machine Learning

Cited by: 0
Authors
Kang L.-Y. [1,2]
Wang J.-F. [1,2]
Liu J. [1,3]
Ye D. [1]
Affiliations
[1] Technology Center of Software Engineering, Institute of Software, The Chinese Academy of Sciences, Beijing
[2] University of Chinese Academy of Sciences, Beijing
[3] State Key Laboratory of Computer Science, Institute of Software, The Chinese Academy of Sciences, Beijing
Source
Ruan Jian Xue Bao/Journal of Software | 2018, Vol. 29, No. 1
Funding
National Natural Science Foundation of China
Keywords
Distributed algorithm; Machine learning; Optimization algorithm; Parallel algorithm;
DOI
10.13328/j.cnki.jos.005376
Abstract
Machine learning problems can be cast as optimization problems, and optimization algorithms are the principal tools for minimizing their objective functions. In the era of big data, speeding up the training process requires parallel and distributed optimization algorithms built on multi-core and distributed computing technologies. In recent years, this field has seen a large body of research, and several of these algorithms have been widely deployed on machine learning platforms. This paper studies five common classes of optimization algorithms: gradient descent, second-order methods, proximal gradient methods, coordinate descent, and the alternating direction method of multipliers (ADMM). Each class is analyzed from both the parallel and the distributed perspectives, and algorithms within a class are compared by model type, input data characteristics, algorithm evaluation, and parallel communication mode. In addition, the implementations and applications of these optimization algorithms on representative scalable machine learning platforms are analyzed. All the optimization algorithms introduced in this paper are organized in a hierarchical classification diagram, which serves as a tool for selecting an appropriate algorithm for a given objective function type and for exploring how existing algorithms can be applied to new objective function types. Finally, open problems of existing optimization algorithms are discussed, and possible solutions and future research directions are proposed. © Copyright 2018, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
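To make the data-parallel idea in the abstract concrete, the following is a minimal, hypothetical Python sketch (not taken from the surveyed paper) of bulk-synchronous data-parallel gradient descent on a least-squares objective: each simulated worker holds a data shard and computes a local gradient, and the averaged gradient drives one synchronous parameter update. The names local_gradient and parallel_gd, the learning rate, and the shard count are illustrative assumptions only.

# Minimal sketch of synchronous data-parallel gradient descent (BSP style).
# Assumption: a least-squares objective (1/2n)||Xw - y||^2 split over K shards.
import numpy as np

def local_gradient(w, X, y):
    # Gradient of the least-squares loss on one worker's data shard.
    return X.T @ (X @ w - y) / len(y)

def parallel_gd(shards, dim, lr=0.1, steps=200):
    w = np.zeros(dim)
    for _ in range(steps):
        # In a real system the per-shard gradients are computed concurrently;
        # here they are evaluated in a loop purely for illustration.
        grads = [local_gradient(w, X, y) for X, y in shards]
        w -= lr * np.mean(grads, axis=0)  # synchronous averaged update
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_true = rng.normal(size=5)
    X = rng.normal(size=(1000, 5))
    y = X @ w_true + 0.01 * rng.normal(size=1000)
    # Split the data across 4 hypothetical workers.
    shards = list(zip(np.array_split(X, 4), np.array_split(y, 4)))
    w_hat = parallel_gd(shards, dim=5)
    print(np.allclose(w_hat, w_true, atol=0.05))  # expected: True

Asynchronous variants, second-order methods, proximal and coordinate updates, and ADMM differ mainly in what each worker computes locally and in how updates are synchronized.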
Pages: 109-130
Number of pages: 21