Strategies and Principles of Distributed Machine Learning on Big Data

Cited by: 91
Authors
Xing, Eric P. [1]
Ho, Qirong [1]
Xie, Pengtao [1]
Dai, Wei [1]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
Keywords
Machine learning; Artificial intelligence; Big data; Big model; Distributed systems; Principles; Theory; Data-parallelism; Model-parallelism; REGRESSION; MODEL; SELECTION
DOI
10.1016/j.eng.2016.02.008
Chinese Library Classification
T [Industrial Technology]
Discipline code
08
Abstract
The rise of big data has led to new demands for machine learning (ML) systems to learn complex models, with millions to billions of parameters, that promise adequate capacity to digest massive datasets and offer powerful predictive analytics (such as high-dimensional latent features, intermediate representations, and decision functions) thereupon. In order to run ML algorithms at such scales, on a distributed cluster with tens to thousands of machines, it is often the case that significant engineering efforts are required, and one might fairly ask whether such engineering truly falls within the domain of ML research. Taking the view that "big" ML systems can benefit greatly from ML-rooted statistical and algorithmic insights, and that ML researchers should therefore not shy away from such systems design, we discuss a series of principles and strategies distilled from our recent efforts on industrial-scale ML solutions. These principles and strategies span a continuum from application, to engineering, and to theoretical research and development of big ML systems and architectures, with the goal of understanding how to make them efficient, generally applicable, and supported with convergence and scaling guarantees. They concern four key questions that traditionally receive little attention in ML research: How can an ML program be distributed over a cluster? How can ML computation be bridged with inter-machine communication? How can such communication be performed? What should be communicated between machines?
By exposing underlying statistical and algorithmic characteristics unique to ML programs but not typically seen in traditional computer programs, and by dissecting successful cases to reveal how we have harnessed these principles to design and develop both high-performance distributed ML software and general-purpose ML frameworks, we present opportunities for ML researchers and practitioners to further shape and enlarge the area that lies between ML and systems. (C) 2016 THE AUTHORS. Published by Elsevier Ltd on behalf of Chinese Academy of Engineering and Higher Education Press Limited Company. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
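The "data-parallelism" strategy named in the keywords partitions the training data across workers, which compute gradients on their local shards and synchronize through an aggregation step. A minimal synchronous sketch of this idea is given below; the function names, the least-squares objective, and all parameter values are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def local_gradient(w, X, y):
    """Least-squares gradient computed on one worker's data shard."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def data_parallel_sgd(X, y, num_workers=4, lr=0.1, steps=100):
    """Synchronous data-parallel gradient descent.

    Each worker holds one shard of (X, y); in every step, all workers
    compute a local gradient and a central aggregator averages them
    before applying a single update to the shared parameters.
    """
    shards = list(zip(np.array_split(X, num_workers),
                      np.array_split(y, num_workers)))
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grads = [local_gradient(w, Xi, yi) for Xi, yi in shards]
        w = w - lr * np.mean(grads, axis=0)  # synchronous aggregation
    return w

# Toy problem: recover known weights from noiseless linear data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w
w = data_parallel_sgd(X, y)
```

With equal-sized shards, the averaged shard gradients equal the full-batch gradient, so this synchronous scheme matches single-machine gradient descent step for step; the paper's broader discussion concerns relaxing exactly this kind of strict synchronization.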
Pages: 179-195
Page count: 17
Related Papers
50 records
  • [1] Distributed Weighted Extreme Learning Machine for Big Imbalanced Data Learning
    Wang, Zhiqiong
    Xin, Junchang
    Tian, Shuo
    Yu, Ge
    PROCEEDINGS OF ELM-2015, VOL 1: THEORY, ALGORITHMS AND APPLICATIONS (I), 2016, 6 : 319 - 332
  • [3] Distributed and Weighted Extreme Learning Machine for Imbalanced Big Data Learning
    Wang, Zhiqiong
    Xin, Junchang
    Yang, Hongxu
    Tian, Shuo
    Yu, Ge
    Xu, Chenren
    Yao, Yudong
    TSINGHUA SCIENCE AND TECHNOLOGY, 2017, 22 (02) : 160 - 173
  • [4] Protecting Machine Learning Integrity in Distributed Big Data Networking
    Wei, Yunkai
    Chen, Yijin
    Xiao, Mingyue
    Maharjan, Sabita
    Zhang, Yan
    IEEE NETWORK, 2020, 34 (04) : 84 - 90
  • [5] Petuum: A New Platform for Distributed Machine Learning on Big Data
    Xing, Eric P.
    Ho, Qirong
    Dai, Wei
    Kim, Jin Kyu
    Wei, Jinliang
    Lee, Seunghak
    Zheng, Xun
    Xie, Pengtao
    Kumar, Abhimanu
    Yu, Yaoliang
    KDD'15: PROCEEDINGS OF THE 21ST ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2015, : 1335 - 1344
  • [6] A Survey of Distributed and Parallel Extreme Learning Machine for Big Data
    Wang, Zhiqiong
    Sui, Ling
    Xin, Junchang
    Qu, Luxuan
    Yao, Yudong
    IEEE ACCESS, 2020, 8 : 201247 - 201258
  • [7] On Scalability of Distributed Machine Learning with Big Data on Apache Spark
    Hai, Ameen Abdel
    Forouraghi, Babak
    BIG DATA - BIGDATA 2018, 2018, 10968 : 209 - 219
  • [8] Parallel and Distributed Machine Learning Algorithms for Scalable Big Data Analytics
    Bal, Henri
    Pal, Arindam
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2020, 108 : 1159 - 1161
  • [9] Distributed Machine Learning based Mitigating Straggler in Big Data Environment
    Lu, Haodong
    Wang, Kun
    IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2021), 2021,
  • [10] Comparative Evaluation of Machine Learning Strategies for Analyzing Big Data in Psychiatry
    Cao, Han
    Meyer-Lindenberg, Andreas
    Schwarz, Emanuel
    INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2018, 19 (11)