Releasing the Power of In-Network Aggregation With Aggregator-Aware Routing Optimization

Cited by: 1
Authors
Luo, Shouxi [1, 2]
Yu, Xiaoyu [1, 2]
Li, Ke [1, 2]
Xing, Huanlai [1, 2]
Affiliations
[1] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu 611756, Peoples R China
[2] Minist Educ, Engn Res Ctr Sustainable Urban Intelligent Transpo, Chengdu 611756, Peoples R China
Keywords
Distributed machine learning; in-network aggregation; routing optimization; programmable switches
DOI
10.1109/TNET.2024.3423380
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
By offloading part of the aggregation computation from the logically centralized parameter servers to network devices such as programmable switches, In-Network Aggregation (INA) is a general, effective, and widely used approach to reducing network load, thereby alleviating the communication bottlenecks of large-scale distributed training. Because INA takes effect if and only if the associated traffic goes through the same in-network aggregator, the key to exploiting INA lies in routing control. However, existing proposals fall short here and are far from optimal, since they select routes for INA-supported traffic without comprehensively considering the characteristics, limitations, and requirements of the network environment, the aggregator hardware, and the distributed training jobs. To fill this gap, the paper establishes a mathematical model that formulates i) the up-down routing constraints of Clos datacenter networks, ii) the limitations imposed by the pipeline hardware structure of modern programmable switches, and iii) the various aggregator-aware routing optimization goals required by distributed training tasks under different parallelism strategies. Based on the model, the authors develop an Aggregator-aware Routing Optimization solution for INA-accelerated distributed training applications. For efficiency, the solution includes a suite of search-space pruning designs that exploit the model's characteristics, improving solving time by tens of times with trivial performance loss. Extensive experiments show that the solution finds near-optimal results for large-scale routing optimization in tens of seconds, achieving 1.8× to 4.0× higher throughput than the state-of-the-art solution.
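The abstract's central observation, that INA only helps when the flows to be aggregated meet at the same aggregator, can be illustrated with a small sketch. This is not the paper's model or solver; it is a toy calculation under assumed names. The topology (leaves `l1` to `l3`, spines `s1` and `s2`, workers `w1` and `w2`, parameter server `ps`) and the helper `link_loads` are hypothetical, and each worker is assumed to send one unit of gradient traffic.

```python
from collections import defaultdict

def link_loads(routes, aggregators):
    """Count distinct traffic streams per link (hypothetical helper).

    routes: worker name -> list of nodes from the worker to the server.
    aggregators: set of switches assumed to perform in-network aggregation.
    Flows that pass the same aggregator are merged into a single stream.
    """
    streams = defaultdict(set)
    for worker, path in routes.items():
        tag = worker                      # stream starts as the worker's own flow
        for u, v in zip(path, path[1:]):
            if u in aggregators:
                tag = u                   # merged with other flows at this switch
            streams[(u, v)].add(tag)
    return {link: len(tags) for link, tags in streams.items()}

aggregators = {"s1"}                      # assumption: only spine s1 aggregates

# Both workers routed through the aggregator s1.
shared = {
    "w1": ["w1", "l1", "s1", "l3", "ps"],
    "w2": ["w2", "l2", "s1", "l3", "ps"],
}
# Worker w2 routed through s2 instead, bypassing the aggregator.
split = {
    "w1": ["w1", "l1", "s1", "l3", "ps"],
    "w2": ["w2", "l2", "s2", "l3", "ps"],
}

shared_load = link_loads(shared, aggregators)[("l3", "ps")]
split_load = link_loads(split, aggregators)[("l3", "ps")]
print(shared_load, split_load)
```

In this toy case the last-hop link to the parameter server carries one stream when both flows share the aggregator and two streams when they do not, which is why route selection determines how much of INA's benefit is realized.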
Pages: 4488-4502
Number of pages: 15