Receiver-Driven Congestion Control for InfiniBand

Times cited: 2
Authors
Zhang, Yiran [1]
Qian, Kun [2]
Ren, Fengyuan [1]
Affiliations
[1] Tsinghua Univ, Beijing Natl Res Ctr Informat Sci & Technol BNRis, Beijing, Peoples R China
[2] Alibaba Inc, Hangzhou, Peoples R China
Keywords
InfiniBand; congestion control; management
DOI
10.1145/3472456.3472466
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
InfiniBand (IB) has become one of the most popular high-speed interconnects in High Performance Computing (HPC). The backpressure effect of IB's credit-based link-layer flow control causes congestion spreading, which increases queueing delay and hurts application completion time. IB congestion control (IB CC) is defined in the IB specification to address the congestion spreading problem. Today, HPC clusters are increasingly used to run diverse workloads over a shared network infrastructure, and the coexistence of message transfers from different applications poses great challenges for IB CC. In this paper, we re-examine IB CC through fine-grained experimental observations and reveal several fundamental problems. Building on these insights, we present a new receiver-driven congestion control for InfiniBand (RR CC). RR CC includes two key mechanisms, receiver-driven congestion identification and receiver-driven rate regulation, which together eliminate both in-network congestion and endpoint congestion within a single control loop. RR CC has far fewer parameters and requires no modifications to InfiniBand switches. Evaluations show that RR CC achieves better average and tail message latency and better link utilization than IB CC across various scenarios.
Pages: 10