Reinforcement Learning for Datacenter Congestion Control

Cited by: 10

Authors:

Tessler C.

Shpigelman Y.

Dalal G.

Mandelbaum A.

Haritan Kazakov D.

Fuhrer B.

Chechik G.

Mannor S.

Source:

Performance Evaluation Review | 2021 / Vol. 49 / No. 2

DOI:

10.1145/3512798.3512815

Abstract:

We approach network congestion control in datacenters using reinforcement learning (RL). Successful congestion control algorithms can dramatically improve latency and overall network throughput. Until now, no learning-based algorithm has shown practical potential in this domain. Instead, the most popular recent deployments rely on rule-based heuristics that are tested on a predetermined set of benchmarks. Consequently, these heuristics do not generalize well to previously unseen scenarios. In contrast, we devise an RL-based algorithm that aims to generalize across different configurations of real-world datacenter networks. We overcome challenges such as partial observability, nonstationarity, and multiple objectives. We further propose a policy gradient algorithm that leverages the analytical structure of the reward function to approximate its derivative and improve stability. We show that this scheme outperforms alternative popular RL approaches and generalizes to scenarios that were not seen during training. Our experiments, conducted on a realistic simulator that emulates the behavior of communication networks, show improved performance on the multiple considered metrics simultaneously, compared with the popular algorithms deployed today in real datacenters. Our algorithm is being productized to replace heuristics in some of the largest datacenters in the world. © 2022 Copyright is held by the owner/author(s).

Pages: 43-46

Page count: 3
