SparCML: High-Performance Sparse Communication for Machine Learning

Cited by: 61
Authors
Renggli, Cedric [1 ]
Ashkboos, Saleh [2 ]
Aghagolzadeh, Mehdi [3 ]
Alistarh, Dan [2 ]
Hoefler, Torsten [1 ]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
[2] IST Austria, Vienna, Austria
[3] Microsoft, Redmond, WA USA
Funding
European Research Council
Keywords
Sparse AllReduce; Sparse Input Vectors; Sparse AllGather; OPERATIONS; DESCENT;
D O I
10.1145/3295500.3356222
CLC Number
TP301 [Theory and Methods]
Subject Classification Code
081202
Abstract
Applying machine learning techniques to the quickly growing data in science and industry requires highly-scalable algorithms. Large datasets are most commonly processed "data parallel," distributed across many nodes. Each node's contribution to the overall gradient is summed using a global allreduce. This allreduce is the single communication and thus scalability bottleneck for most machine learning workloads. We observe that, frequently, many gradient values are (close to) zero, leading to sparse or sparsifiable communications. To exploit this insight, we analyze, design, and implement a set of communication-efficient protocols for sparse input data, in conjunction with efficient machine learning algorithms which can leverage these primitives. Our communication protocols generalize standard collective operations by allowing processes to contribute arbitrary sparse input data vectors. Our generic communication library, SPARCML, extends MPI to support additional features such as non-blocking (asynchronous) operations and low-precision data representations. As such, SPARCML and its techniques will form the basis of future highly-scalable machine learning frameworks.
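The core idea the abstract describes — an allreduce whose inputs are sparse gradient vectors — can be illustrated with a minimal sketch. This is not SparCML's actual implementation or API; it simulates a tree-based reduction over per-process contributions represented as index-to-value maps, where the pairwise merge is the sparse analogue of the dense element-wise sum in a standard allreduce. All function names here are illustrative.

```python
def sparse_sum(a, b):
    """Merge two sparse vectors (dict: index -> value) by summing
    the values at overlapping indices; the union of indices survives."""
    out = dict(a)
    for idx, val in b.items():
        out[idx] = out.get(idx, 0.0) + val
    return out


def sparse_allreduce(contributions):
    """Simulate a tree-structured allreduce over sparse inputs:
    pairwise-merge the contributions in ~log2(p) rounds, so every
    (simulated) process ends up with the same global sum."""
    vecs = list(contributions)
    while len(vecs) > 1:
        merged = []
        for i in range(0, len(vecs) - 1, 2):
            merged.append(sparse_sum(vecs[i], vecs[i + 1]))
        if len(vecs) % 2:  # odd process left over this round
            merged.append(vecs[-1])
        vecs = merged
    return vecs[0]


# Three processes each contribute a sparse gradient; only the
# union of nonzero indices is ever communicated.
grads = [{0: 1.0, 5: 2.0}, {5: -1.0, 7: 3.0}, {2: 4.0}]
result = sparse_allreduce(grads)
print(result)  # {0: 1.0, 5: 1.0, 7: 3.0, 2: 4.0}
```

The communication saving comes from exchanging only (index, value) pairs rather than the full dense vector; as the paper notes, when merged results densify, a real implementation would switch back to a dense representation.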
Pages: 15