Releasing the Power of In-Network Aggregation With Aggregator-Aware Routing Optimization

Cited by: 1
Authors
Luo, Shouxi [1 ,2 ]
Yu, Xiaoyu [1 ,2 ]
Li, Ke [1 ,2 ]
Xing, Huanlai [1 ,2 ]
Affiliations
[1] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu 611756, Peoples R China
[2] Minist Educ, Engn Res Ctr Sustainable Urban Intelligent Transpo, Chengdu 611756, Peoples R China
Keywords
Distributed machine learning; in-network aggregation; routing optimization; programmable switches;
DOI
10.1109/TNET.2024.3423380
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
By offloading part of the aggregation computation from the logically centralized parameter servers to network devices such as programmable switches, In-Network Aggregation (INA) is a general, effective, and widely used approach to reducing network load, thereby alleviating the communication bottlenecks suffered by large-scale distributed training. Because INA takes effect if and only if the associated traffic passes through the same in-network aggregator, the key to exploiting INA lies in routing control. However, existing proposals fall short here and are thus far from optimal, since they select routes for INA-supported traffic without comprehensively considering the characteristics, limitations, and requirements of the network environment, aggregator hardware, and distributed training jobs. To fill this gap, in this paper we systematically establish a mathematical model that formulates i) the up-down routing constraints of Clos datacenter networks, ii) the limitations imposed by the pipeline hardware structure of modern programmable switches, and iii) the various aggregator-aware routing optimization goals required by distributed training tasks under different parallelism strategies. Based on the model, we develop an Aggregator-aware Routing Optimization solution for INA-accelerated distributed training applications. To be efficient, the solution involves a suite of search-space pruning designs that exploit the model's characteristics, speeding up solving by tens of times with trivial performance loss. Extensive experiments show that it is able to find near-optimal results for large-scale routing optimization in tens of seconds, achieving 1.8× to 4.0× higher throughput than the state-of-the-art solution.
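The abstract's observation that INA only helps when associated traffic crosses the same aggregator can be illustrated with a toy calculation. The sketch below is not the paper's model or solver: the two-spine topology, the node names, and the helper `link_loads` are all hypothetical, and it assumes each worker sends one unit of gradient data and that an aggregator merges every flow passing through it into a single unit.

```python
from collections import defaultdict

def link_loads(paths, aggregators):
    """Count gradient units per directed link (toy model, hypothetical helper).

    paths: dict mapping each worker to its node path toward the parameter server.
    aggregators: set of switches assumed to sum all gradients passing through them.
    """
    tokens_on_link = defaultdict(set)
    for worker, path in paths.items():
        token = worker  # an unaggregated flow is identified by its sender
        for u, v in zip(path, path[1:]):
            if u in aggregators:
                token = ("agg", u)  # flows merged at u leave it as one flow
            tokens_on_link[(u, v)].add(token)
    return {link: len(tokens) for link, tokens in tokens_on_link.items()}

# Assumed topology: three leaves, two spines, aggregation enabled only on s1.
aggregators = {"s1"}

# Aggregator-aware routing: every worker goes up through spine s1.
aware = link_loads({
    "w1": ["w1", "leaf1", "s1", "leaf3", "ps"],
    "w2": ["w2", "leaf1", "s1", "leaf3", "ps"],
    "w3": ["w3", "leaf2", "s1", "leaf3", "ps"],
}, aggregators)

# Aggregator-oblivious routing: two workers are hashed onto spine s2.
oblivious = link_loads({
    "w1": ["w1", "leaf1", "s1", "leaf3", "ps"],
    "w2": ["w2", "leaf1", "s2", "leaf3", "ps"],
    "w3": ["w3", "leaf2", "s2", "leaf3", "ps"],
}, aggregators)

print(aware[("leaf3", "ps")], oblivious[("leaf3", "ps")])
```

Under these assumptions the last hop toward the parameter server carries one unit when all flows share the aggregator and three units when they do not, which is the effect that routing control has to secure.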
Pages: 4488-4502
Number of pages: 15
Related papers
50 in total
  • [1] A2TP: Aggregator-aware In-network Aggregation for Multi-tenant Learning
    Li, Zhaoyi
    Huang, Jiawei
    Li, Yijun
    Xu, Aikun
    Zhou, Shengwen
    Liu, Jingling
    Wang, Jianxin
    [J]. PROCEEDINGS OF THE EIGHTEENTH EUROPEAN CONFERENCE ON COMPUTER SYSTEMS, EUROSYS 2023, 2023, : 639 - 653
  • [2] Determining the routing path for in-network aggregation
    Zhao, Xiwei
    Makki, S. Kami
    Pissinou, Niki
    [J]. 2006 INTERNATIONAL CONFERENCE ON HYBRID INFORMATION TECHNOLOGY, VOL 2, PROCEEDINGS, 2006, : 318 - +
  • [3] Fuzzy routing for in-network aggregation in wireless sensor networks
    Maivizhi, Radhakrishnan
    Yogesh, Palanichamy
    [J]. PEER-TO-PEER NETWORKING AND APPLICATIONS, 2022, 15 (01) : 592 - 611
  • [4] In-network event routing approach based on aggregation ring
    School of Software, Central South University, Changsha
    410083, China
    Unknown
    410083, China
    [J]. Zhongnan Daxue Xuebao (Ziran Kexue Ban), 11 (4100-4107):
  • [5] GRID: Gradient Routing With In-Network Aggregation for Distributed Training
    Fang, Jin
    Zhao, Gongming
    Xu, Hongli
    Wu, Changbo
    Yu, Zhuolong
    [J]. IEEE-ACM TRANSACTIONS ON NETWORKING, 2023, 31 (05) : 2267 - 2280
  • [6] Fuzzy routing for in-network aggregation in wireless sensor networks
    Radhakrishnan Maivizhi
    Palanichamy Yogesh
    [J]. Peer-to-Peer Networking and Applications, 2022, 15 : 592 - 611
  • [7] In-Network AllReduce Optimization with Virtual Aggregation Trees
    Song, Haoyu
    [J]. PROCEEDINGS OF THE 2024 SIGCOMM WORKSHOP ON NETWORKS FOR AI COMPUTING, NAIC 2024, 2024, : 54 - 60
  • [8] ACU: Aggregator-based Congestion Control and Link Utilization Optimization Strategy for Multi-tenant In-network Aggregation
    Yuan, Zhu
    Yuan, Guoyuan
    Dong, Dezun
    [J]. PROCEEDINGS OF THE 8TH ASIA-PACIFIC WORKSHOP ON NETWORKING, APNET 2024, 2024, : 194 - 195
  • [9] Trust-Aware In-Network Aggregation for Wireless Sensor Networks
    Deng, Hongmei
    Jin, Guang
    Sun, Kun
    Xu, Roger
    Lyell, Margaret
    Luke, Jahn A.
    [J]. GLOBECOM 2009 - 2009 IEEE GLOBAL TELECOMMUNICATIONS CONFERENCE, VOLS 1-8, 2009, : 5888 - 5895
  • [10] A Study On Routing Approach For In-Network Aggregation In Wireless Sensor Networks
    Sudha, S.
    Manimegalai, B.
    Thirumoorthy, P.
    [J]. 2014 INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND INFORMATICS (ICCCI), 2014,