Switches for HIRE: Resource Scheduling for Data Center In-Network Computing

Cited by: 17
Authors
Bloecher, Marcel [1]
Wang, Lin [1,2]
Eugster, Patrick [3,4]
Schmidt, Max [1]
Affiliations
[1] Tech Univ Darmstadt, Darmstadt, Germany
[2] Vrije Univ Amsterdam, Amsterdam, Netherlands
[3] USI Lugano, Lugano, Switzerland
[4] Purdue Univ, W Lafayette, IN 47907 USA
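Funding
Swiss National Science Foundation; European Research Council; US National Science Foundation
Keywords
data center; scheduling; in-network computing; heterogeneity; nonlinear resource usage
DOI
10.1145/3445814.3446760
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract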
The recent trend towards more programmable switching hardware in data centers opens up new possibilities for distributed applications to leverage in-network computing (INC). Literature so far has largely focused on individual application scenarios of INC, leaving aside the problem of coordinating usage of potentially scarce and heterogeneous switch resources among multiple INC scenarios, applications, and users. The traditional model of resource pools of isolated compute containers does not fit an INC-enabled data center. This paper describes HIRE, a Holistic INC-aware Resource managEr which allows for server-local and INC resources to be coordinated in a unified manner. HIRE introduces a novel flexible resource (meta-)model to address heterogeneity, resource interchangeability, and non-linear resource requirements, and integrates dependencies between resources and locations in a unified cost model, cast as a min-cost max-flow problem. In the absence of prior work, we compare HIRE against variants of state-of-the-art schedulers retrofitted to handle INC requests. Experiments with a workload trace of a 4000-machine cluster show that HIRE makes better use of INC resources by serving 8-30% more INC requests, while at the same time reducing network detours by 20% and tail placement latency by 50%.
Pages: 268-285
Number of pages: 18