Releasing the Power of In-Network Aggregation With Aggregator-Aware Routing Optimization

Cited by: 1
Authors
Luo, Shouxi [1, 2]
Yu, Xiaoyu [1, 2]
Li, Ke [1, 2]
Xing, Huanlai [1, 2]
Affiliations
[1] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu 611756, Peoples R China
[2] Minist Educ, Engn Res Ctr Sustainable Urban Intelligent Transpo, Chengdu 611756, Peoples R China
Keywords
Distributed machine learning; in-network aggregation; routing optimization; programmable switches
DOI
10.1109/TNET.2024.3423380
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
By offloading part of the aggregation computation from the logically centralized parameter servers to network devices such as programmable switches, In-Network Aggregation (INA) is a general, effective, and widely used approach to reducing network load, thereby alleviating the communication bottlenecks suffered by large-scale distributed training. Because INA takes effect if and only if the associated traffic passes through the same in-network aggregator, the key to exploiting INA lies in routing control. However, existing proposals fall short here and are thus far from optimal: they select routes for INA-supported traffic without comprehensively considering the characteristics, limitations, and requirements of the network environment, the aggregator hardware, and the distributed training jobs. To fill this gap, in this paper we systematically establish a mathematical model that formulates i) the up-down routing constraints of Clos datacenter networks, ii) the limitations imposed by the pipeline hardware structure of modern programmable switches, and iii) the various aggregator-aware routing optimization goals required by distributed training tasks under different parallelism strategies. Based on the model, we develop an Aggregator-aware Routing Optimization solution for INA-accelerated distributed training applications. To be efficient, the solution employs a suite of search-space pruning designs that exploit the model's characteristics, reducing the solving time by a factor of tens with trivial performance loss. Extensive experiments show that it finds near-optimal results for large-scale routing optimization in tens of seconds, achieving 1.8× to 4.0× higher throughput than the state-of-the-art solution.
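The abstract's central observation, that INA helps only when a job's flows meet at the same aggregator, can be illustrated with a toy calculation. The sketch below is not the paper's model or algorithm; it assumes a hypothetical two-spine Clos fragment (nodes `w1`, `w2`, `l1` to `l3`, `s1`, `s2`, `ps`), one unit of gradient traffic per worker, and an idealized aggregator that merges every stream of the job passing through it into a single outgoing stream. The helper name `link_loads` is likewise an assumption made for illustration.

```python
from collections import defaultdict

def link_loads(routes, aggregators):
    """Return {(u, v): load} for one job's worker-to-PS routes.

    Toy model (an assumption of this sketch): each worker sends one unit of
    traffic, and an aggregator merges all streams of the job that pass
    through it into one outgoing stream.
    """
    streams = defaultdict(set)  # link -> distinct stream ids seen on it
    for worker, path in routes.items():
        stream_id = worker  # before any aggregation, the stream is the worker's own
        for u, v in zip(path, path[1:]):
            if u in aggregators:
                # Leaving an aggregator: the flow is now part of the merged stream.
                stream_id = ("agg", u)
            streams[(u, v)].add(stream_id)
    return {link: len(ids) for link, ids in streams.items()}

# Hypothetical topology: leaves l1 and l2 host the workers, leaf l3 hosts the PS.
same_spine = {
    "w1": ["w1", "l1", "s1", "l3", "ps"],
    "w2": ["w2", "l2", "s1", "l3", "ps"],
}
split_spines = {
    "w1": ["w1", "l1", "s1", "l3", "ps"],
    "w2": ["w2", "l2", "s2", "l3", "ps"],
}
AGGREGATORS = {"s1"}  # assume only spine s1 is INA-capable

merged = link_loads(same_spine, AGGREGATORS)
unmerged = link_loads(split_spines, AGGREGATORS)
print(merged[("l3", "ps")], unmerged[("l3", "ps")])  # prints "1 2"
```

With both flows routed through `s1`, the link into the parameter server carries one merged stream; when the second flow takes `s2`, the same link carries two. Both routings are valid up-down paths, so the difference comes purely from the routing choice.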
Pages: 4488-4502
Number of pages: 15