Petuum: A New Platform for Distributed Machine Learning on Big Data

Cited by: 55
Authors
Xing, Eric P. [1]
Ho, Qirong [2]
Dai, Wei [1]
Kim, Jin Kyu [1]
Wei, Jinliang [1]
Lee, Seunghak [1]
Zheng, Xun [1]
Xie, Pengtao [1]
Kumar, Abhimanu [1]
Yu, Yaoliang [1]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[2] ASTAR, Inst Infocomm Res, Singapore, Singapore
Funding
National Science Foundation (USA);
Keywords
Machine Learning; Big Data; Big Model; Distributed Systems; Theory; Data-Parallelism; Model-Parallelism;
DOI
10.1145/2783258.2783323
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
How can one build a distributed framework that allows efficient deployment of a wide spectrum of modern advanced machine learning (ML) programs for industrial-scale problems using Big Models (100s of billions of parameters) on Big Data (terabytes or petabytes)? Contemporary parallelization strategies employ fine-grained operations and scheduling beyond the classic bulk-synchronous processing paradigm popularized by MapReduce, or even specialized operators relying on graphical representations of ML programs. The variety of approaches tends to pull systems and algorithms design in different directions, and it remains difficult to find a universal platform applicable to a wide range of different ML programs at scale. We propose a general-purpose framework that systematically addresses data- and model-parallel challenges in large-scale ML, by leveraging several fundamental properties underlying ML programs that make them different from conventional operation-centric programs: error tolerance, dynamic structure, and nonuniform convergence; all stem from the optimization-centric nature shared in ML programs' mathematical definitions, and the iterative convergent behavior of their algorithmic solutions. These properties present unique opportunities for an integrative system design, built on bounded-latency network synchronization and dynamic load-balancing scheduling, which is efficient, programmable, and enjoys provable correctness guarantees. We demonstrate how such a design in light of ML first principles leads to significant performance improvements versus well-known implementations of several ML programs, allowing them to run in much less time and at considerably larger model sizes, on modestly-sized computer clusters.
Pages: 1335-1344
Number of pages: 10