SparCML: High-Performance Sparse Communication for Machine Learning

Cited by: 61
Authors
Renggli, Cedric [1]
Ashkboos, Saleh [2]
Aghagolzadeh, Mehdi [3]
Alistarh, Dan [2]
Hoefler, Torsten [1]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
[2] IST Austria, Vienna, Austria
[3] Microsoft, Redmond, WA USA
Funding
European Research Council
Keywords
Sparse AllReduce; Sparse Input Vectors; Sparse AllGather; OPERATIONS; DESCENT;
DOI
10.1145/3295500.3356222
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Applying machine learning techniques to the quickly growing data in science and industry requires highly scalable algorithms. Large datasets are most commonly processed in a "data-parallel" fashion, distributed across many nodes. Each node's contribution to the overall gradient is summed using a global allreduce. This allreduce is the single communication bottleneck, and thus the scalability bottleneck, for most machine learning workloads. We observe that frequently, many gradient values are (close to) zero, leading to sparse or sparsifiable communications. To exploit this insight, we analyze, design, and implement a set of communication-efficient protocols for sparse input data, in conjunction with efficient machine learning algorithms which can leverage these primitives. Our communication protocols generalize standard collective operations by allowing processes to contribute arbitrary sparse input data vectors. Our generic communication library, SparCML, extends MPI to support additional features, such as non-blocking (asynchronous) operations and low-precision data representations. As such, SparCML and its techniques will form the basis of future highly scalable machine learning frameworks.
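The idea of summing sparse gradient contributions can be illustrated with a minimal single-process sketch. It is not SparCML's API or protocol: the function names `sparsify_top_k` and `sparse_allreduce_sum` are hypothetical, top-k magnitude selection is only one assumed way to make a gradient sparse, and the "nodes" are simulated as plain Python lists rather than MPI processes.

```python
from typing import Dict, List

# Hypothetical sparse representation: index -> nonzero value.
SparseVec = Dict[int, float]


def sparsify_top_k(dense: List[float], k: int) -> SparseVec:
    """Keep the k largest-magnitude entries of a dense gradient (assumed scheme)."""
    order = sorted(range(len(dense)), key=lambda i: abs(dense[i]), reverse=True)
    return {i: dense[i] for i in order[:k] if dense[i] != 0.0}


def sparse_allreduce_sum(contributions: List[SparseVec]) -> SparseVec:
    """Sum sparse vectors by merging their index sets.

    A real allreduce would exchange these pairs between processes;
    here every contribution is already in local memory.
    """
    total: SparseVec = {}
    for vec in contributions:
        for idx, val in vec.items():
            total[idx] = total.get(idx, 0.0) + val
    return total


# Three simulated nodes, each holding a mostly-zero gradient of length 8.
grads = [
    [0.0, 0.5, 0.0, 0.0, -2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, -1.0, 0.0, 0.0, 0.0],
    [3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25],
]
sparse = [sparsify_top_k(g, k=2) for g in grads]
result = sparse_allreduce_sum(sparse)

# Each node sends 2 index-value pairs instead of 8 dense values.
pairs_sent = sum(len(s) for s in sparse)
dense_values = sum(len(g) for g in grads)
```

In this toy setting the nodes would transmit 6 index-value pairs in place of 24 dense values; the saving shrinks as the index sets of different nodes overlap less and the summed result becomes denser.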
Pages: 15