In-network Allreduce with Multiple Spanning Trees on PolarFly

Cited by: 1
Authors
Lakhotia, Kartik [1]
Isham, Kelly [2]
Monroe, Laura [3]
Besta, Maciej [4]
Hoefler, Torsten [4]
Petrini, Fabrizio [1]
Affiliations
[1] Intel Labs, Santa Clara, CA 95054 USA
[2] Colgate University, Hamilton, NY 13346 USA
[3] Los Alamos National Laboratory, Los Alamos, NM USA
[4] Swiss Federal Institute of Technology (ETH Zurich), Zurich, Switzerland
Keywords
In-network Collectives; Allreduce; PolarFly; Erdős–Rényi Graphs; Spanning Trees; Hamiltonian Paths
DOI
10.1145/3558481.3591073
CLC Classification
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
Allreduce is a fundamental collective used in parallel computing and in the distributed training of machine learning models, and it can become a performance bottleneck on large systems. In-network computing improves Allreduce performance by reducing packets on the fly inside network routers; however, the throughput of current in-network solutions is limited to a single link's bandwidth. We develop, compare, and contrast two sets of Allreduce spanning trees embedded into PolarFly, a high-performance diameter-2 network topology. Both of our solutions offer theoretically guaranteed near-optimal performance, boosting Allreduce bandwidth by a factor equal to half the network radix of the nodes. While our first set offers low latency with trees of depth 3, the second set offers a congestion-free implementation that reduces the complexity and resource requirements of in-network compute units. In doing so, we also distinguish PolarFly as a highly suitable network for distributed deep learning and other applications that perform large, throughput-bound Allreduce operations.
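To make the headline bandwidth claim concrete, here is a minimal back-of-the-envelope cost model. It is not taken from the paper: it only assumes a PolarFly of order q with N = q^2 + q + 1 routers and radix roughly q + 1, splits the payload evenly across k edge-disjoint spanning trees, and shows how the streaming term of an in-network Allreduce shrinks with the tree count. The function names, the choice q = 31, and the 25 GB/s link bandwidth are illustrative assumptions.

```python
# Back-of-the-envelope model (illustrative, not from the paper) of how
# multi-tree in-network Allreduce scales throughput on a PolarFly-like network.
# Assumptions: PolarFly of order q has N = q*q + q + 1 routers and network
# radix roughly q + 1; the payload is split evenly across k edge-disjoint
# spanning trees, each streaming its shard at one link's bandwidth.

def polarfly_size(q: int) -> int:
    """Number of routers in a PolarFly network of order q."""
    return q * q + q + 1

def allreduce_stream_time(message_bytes: float,
                          link_bandwidth: float,
                          num_trees: int) -> float:
    """Bandwidth term only: each tree reduces message_bytes / num_trees
    concurrently, so the streaming time divides by the tree count."""
    shard = message_bytes / num_trees
    return shard / link_bandwidth

if __name__ == "__main__":
    q = 31                      # example order; radix ~ q + 1 = 32
    radix = q + 1
    trees = radix // 2          # abstract: speedup ~ half the radix
    msg = 1 << 30               # 1 GiB gradient buffer (illustrative)
    bw = 25e9                   # 25 GB/s per link (illustrative)

    single = allreduce_stream_time(msg, bw, 1)
    multi = allreduce_stream_time(msg, bw, trees)
    print(f"PolarFly(q={q}): {polarfly_size(q)} routers, radix {radix}")
    print(f"single-tree streaming time : {single * 1e3:.1f} ms")
    print(f"{trees}-tree streaming time : {multi * 1e3:.1f} ms "
          f"(~{single / multi:.0f}x bandwidth)")
```

With k set to half the radix, this toy model reproduces the roughly (q+1)/2-fold bandwidth gain stated in the abstract; it ignores latency, reduction overheads, and tree depth, which the paper's two tree constructions trade off against each other.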
Pages: 165-176
Number of pages: 12
Related Papers
Total: 50 records
  • [1] Song, Haoyu. In-Network AllReduce Optimization with Virtual Aggregation Trees. Proceedings of the 2024 SIGCOMM Workshop on Networks for AI Computing (NAIC 2024), 2024: 54-60.
  • [2] De Sensi, Daniele; Di Girolamo, Salvatore; Ashkboos, Saleh; Li, Shigang; Hoefler, Torsten. Flare: Flexible In-Network Allreduce. SC21: International Conference for High Performance Computing, Networking, Storage and Analysis, 2021.
  • [3] De Sensi, Daniele; Molero, Edgar Costa; Di Girolamo, Salvatore; Vanbever, Laurent; Hoefler, Torsten. Canary: Congestion-aware in-network allreduce using dynamic trees. Future Generation Computer Systems, 2024, 152: 70-82.
  • [4] Wang, Ruiqi; Dong, Dezun; Lei, Fei; Wu, Ke; Ma, Junchao; Lu, Kai. Roar: A Router Microarchitecture for In-network Allreduce. Proceedings of the 37th International Conference on Supercomputing (ACM ICS 2023), 2023: 423-436.
  • [5] Lakhotia, Kartik; Petrini, Fabrizio; Kannan, Rajgopal; Prasanna, Viktor. Accelerating Allreduce With In-Network Reduction on Intel PIUMA. IEEE Micro, 2022, 42(2): 44-52.
  • [6] Luo, Shouxi; Wang, Renyi; Xing, Huanlai. Efficient Inter-Datacenter AllReduce With Multiple Trees. IEEE Transactions on Network Science and Engineering, 2024, 11(5): 4793-4806.
  • [7] Liu, Yao; Zhang, Junyi; Liu, Shuo; Wang, Qiaoling; Dai, Wangchen; Cheung, Ray Chak Chung. Scalable Fully Pipelined Hardware Architecture for In-Network Aggregated AllReduce Communication. IEEE Transactions on Circuits and Systems I: Regular Papers, 2021, 68(10): 4194-4206.
  • [8] Kaneko, Takeshi; Shudo, Kazuyuki. Broadcast with Tree Selection from Multiple Spanning Trees on an Overlay Network. IEICE Transactions on Communications, 2023, E106B(2): 145-155.
  • [9] Lee, Byoungyong; Park, Kyungseo; Elmasri, Ramez. Energy balanced in-network aggregation using multiple trees in wireless sensor networks. 2007 4th IEEE Consumer Communications and Networking Conference, Vols. 1-3, 2007: 530-534.
  • [10] Fernandes, Leonardo L.; Murphy, Amy L. MVSink: Incrementally Building In-Network Aggregation Trees. Wireless Sensor Networks, Proceedings, 2009, 5432: 216+.