I2Q: A Fully Decentralized Q-Learning Algorithm

Cited by: 0
Authors
Jiang, Jiechuan [1]
Lu, Zongqing [1]
Affiliations
[1] Peking Univ, Sch Comp Sci, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Fully decentralized multi-agent reinforcement learning has shown great potential for many real-world cooperative tasks where global information, e.g., the actions of other agents, is not accessible. Although independent Q-learning is widely used for decentralized training, the transition probabilities it experiences are non-stationary because other agents are updating their policies simultaneously, so the convergence of independent Q-learning is not guaranteed. To deal with this non-stationarity, we first introduce stationary ideal transition probabilities, on which independent Q-learning can converge to the global optimum. We then propose a fully decentralized method, I2Q, which performs independent Q-learning on the modeled ideal transition function to reach the global optimum. The modeling of the ideal transition function in I2Q is fully decentralized and independent of the learned policies of other agents, which keeps I2Q free from non-stationarity and allows it to learn the optimal policy. Empirically, we show that I2Q achieves remarkable improvement in a variety of cooperative multi-agent tasks.
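To make the setting concrete, below is a minimal sketch of independent tabular Q-learning in a toy two-agent cooperative Markov game; it only illustrates the non-stationarity the abstract describes and is not the paper's I2Q implementation. The environment (N_STATES, N_ACTIONS, the tables P and R) and the hyperparameters are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Sketch only: two agents run independent tabular Q-learning in a tiny
# cooperative game. From agent i's point of view, the transition it
# experiences depends on the other agent's current policy, so the
# per-agent transition probabilities drift as the other agent updates --
# the non-stationarity discussed in the abstract. All environment
# details below are illustrative assumptions.

N_STATES, N_ACTIONS = 3, 2
rng = np.random.default_rng(0)

# Joint dynamics: next state P[s, a1, a2] and shared reward R[s, a1, a2].
P = rng.integers(0, N_STATES, size=(N_STATES, N_ACTIONS, N_ACTIONS))
R = rng.random(size=(N_STATES, N_ACTIONS, N_ACTIONS))

Q = [np.zeros((N_STATES, N_ACTIONS)) for _ in range(2)]  # one table per agent
alpha, gamma, eps = 0.1, 0.95, 0.1

def act(q_row):
    """Epsilon-greedy action selection on one agent's own Q-values."""
    if rng.random() < eps:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(q_row))

s = 0
for step in range(10_000):
    a = [act(Q[i][s]) for i in range(2)]       # decentralized: no action sharing
    s_next = int(P[s, a[0], a[1]])
    r = float(R[s, a[0], a[1]])
    for i in range(2):
        # Each agent updates as if it were alone in a stationary MDP; the
        # effective transition it sees changes whenever the other agent's
        # policy changes, so convergence is not guaranteed in general.
        td_target = r + gamma * Q[i][s_next].max()
        Q[i][s, a[i]] += alpha * (td_target - Q[i][s, a[i]])
    s = s_next

print(np.round(Q[0], 2))
```

As described in the abstract, I2Q would instead have each agent perform Q-learning against a modeled ideal transition function rather than the drifting transitions above; that modeling step is specific to the paper and is not reproduced in this sketch.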
Pages: 13
Related papers (50 in total)
  • [21] Autonomous Decentralized Traffic Control Using Q-Learning in LPWAN
    Kaburaki, Aoto
    Adachi, Koichi
    Takyu, Osamu
    Ohta, Mai
    Fujii, Takeo
    IEEE ACCESS, 2021, 9 : 93651 - 93661
  • [22] An ARM-based Q-learning algorithm
    Hsu, Yuan-Pao
    Hwang, Kao-Shing
    Lin, Hsin-Yi
    ADVANCED INTELLIGENT COMPUTING THEORIES AND APPLICATIONS: WITH ASPECTS OF CONTEMPORARY INTELLIGENT COMPUTING TECHNIQUES, 2007, 2 : 11 - +
  • [23] Implications of Decentralized Q-learning Resource Allocation in Wireless Networks
    Wilhelmi, Francesc
    Bellalta, Boris
    Cano, Cristina
    Jonsson, Anders
    2017 IEEE 28TH ANNUAL INTERNATIONAL SYMPOSIUM ON PERSONAL, INDOOR, AND MOBILE RADIO COMMUNICATIONS (PIMRC), 2017,
  • [24] Decentralized Q-Learning in Zero-sum Markov Games
    Sayin, Muhammed O.
    Zhang, Kaiqing
    Leslie, David S.
Basar, Tamer
    Ozdaglar, Asuman
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [25] Sample Complexity of Decentralized Tabular Q-Learning for Stochastic Games
    Gao, Zuguang
    Ma, Qianqian
    Basar, Tamer
    Birge, John R.
    2023 AMERICAN CONTROL CONFERENCE, ACC, 2023, : 1098 - 1103
  • [26] Decentralized Q-Learning for Weakly Acyclic Stochastic Dynamic Games
    Arslan, Gurdal
    Yuksel, Serdar
    2015 54TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2015, : 6743 - 6748
  • [27] Expected Lenient Q-learning: a fast variant of the Lenient Q-learning algorithm for cooperative stochastic Markov games
    Amhraoui, Elmehdi
    Masrour, Tawfik
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (07) : 2781 - 2797
  • [28] Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning
    Ohnishi, Shota
    Uchibe, Eiji
    Yamaguchi, Yotaro
    Nakanishi, Kosuke
    Yasui, Yuji
    Ishii, Shin
    FRONTIERS IN NEUROROBOTICS, 2019, 13
  • [29] Reinforcement Learning-Based Multihop Relaying: A Decentralized Q-Learning Approach
    Wang, Xiaowei
    Wang, Xin
    ENTROPY, 2021, 23 (10)
  • [30] An Efficient Hardware Implementation of Reinforcement Learning: The Q-Learning Algorithm
    Spano, Sergio
    Cardarilli, Gian Carlo
    Di Nunzio, Luca
    Fazzolari, Rocco
    Giardino, Daniele
    Matta, Marco
    Nannarelli, Alberto
    Re, Marco
    IEEE ACCESS, 2019, 7 : 186340 - 186351