I2Q: A Fully Decentralized Q-Learning Algorithm

Cited by: 0
Authors
Jiang, Jiechuan [1]
Lu, Zongqing [1]
Affiliations
[1] Peking Univ, Sch Comp Sci, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Fully decentralized multi-agent reinforcement learning has shown great potential for many real-world cooperative tasks where global information, e.g., the actions of other agents, is not accessible. Although independent Q-learning is widely used for decentralized training, the transition probabilities it experiences are non-stationary because other agents are updating their policies simultaneously, so the convergence of independent Q-learning is not guaranteed. To deal with this non-stationarity, we first introduce stationary ideal transition probabilities, on which independent Q-learning can converge to the global optimum. We then propose a fully decentralized method, I2Q, which performs independent Q-learning on the modeled ideal transition function to reach the global optimum. The modeling of the ideal transition function in I2Q is fully decentralized and independent of the learned policies of other agents, which keeps I2Q free from non-stationarity and allows it to learn the optimal policy. Empirically, we show that I2Q achieves remarkable improvement in a variety of cooperative multi-agent tasks.
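To make the setting concrete, below is a minimal sketch of independent tabular Q-learning in a toy two-agent cooperative Markov game; it only illustrates the non-stationarity the abstract describes and is not the paper's I2Q implementation. The environment (N_STATES, N_ACTIONS, the tables P and R) and the hyperparameters are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Sketch only: two agents run independent tabular Q-learning in a tiny
# cooperative game. From agent i's point of view, the transition it
# experiences depends on the other agent's current policy, so the
# per-agent transition probabilities drift as the other agent updates --
# the non-stationarity discussed in the abstract. All environment
# details below are illustrative assumptions.

N_STATES, N_ACTIONS = 3, 2
rng = np.random.default_rng(0)

# Joint dynamics: next state P[s, a1, a2] and shared reward R[s, a1, a2].
P = rng.integers(0, N_STATES, size=(N_STATES, N_ACTIONS, N_ACTIONS))
R = rng.random(size=(N_STATES, N_ACTIONS, N_ACTIONS))

Q = [np.zeros((N_STATES, N_ACTIONS)) for _ in range(2)]  # one table per agent
alpha, gamma, eps = 0.1, 0.95, 0.1

def act(q_row):
    """Epsilon-greedy action selection on one agent's own Q-values."""
    if rng.random() < eps:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(q_row))

s = 0
for step in range(10_000):
    a = [act(Q[i][s]) for i in range(2)]       # decentralized: no action sharing
    s_next = int(P[s, a[0], a[1]])
    r = float(R[s, a[0], a[1]])
    for i in range(2):
        # Each agent updates as if it were alone in a stationary MDP; the
        # effective transition it sees changes whenever the other agent's
        # policy changes, so convergence is not guaranteed in general.
        td_target = r + gamma * Q[i][s_next].max()
        Q[i][s, a[i]] += alpha * (td_target - Q[i][s, a[i]])
    s = s_next

print(np.round(Q[0], 2))
```

As described in the abstract, I2Q would instead have each agent perform Q-learning against a modeled ideal transition function rather than the drifting transitions above; that modeling step is specific to the paper and is not reproduced in this sketch.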
Pages: 13
Related papers (50 in total)
  • [21] Autonomous Decentralized Traffic Control Using Q-Learning in LPWAN
    Kaburaki, Aoto
    Adachi, Koichi
    Takyu, Osamu
    Ohta, Mai
    Fujii, Takeo
    IEEE ACCESS, 2021, 9 : 93651 - 93661
  • [22] An ARM-based Q-learning algorithm
    Hsu, Yuan-Pao
    Hwang, Kao-Shing
    Lin, Hsin-Yi
    ADVANCED INTELLIGENT COMPUTING THEORIES AND APPLICATIONS: WITH ASPECTS OF CONTEMPORARY INTELLIGENT COMPUTING TECHNIQUES, 2007, 2 : 11 - +
  • [23] Implications of Decentralized Q-learning Resource Allocation in Wireless Networks
    Wilhelmi, Francesc
    Bellalta, Boris
    Cano, Cristina
    Jonsson, Anders
    2017 IEEE 28TH ANNUAL INTERNATIONAL SYMPOSIUM ON PERSONAL, INDOOR, AND MOBILE RADIO COMMUNICATIONS (PIMRC), 2017,
  • [24] Decentralized Q-Learning in Zero-sum Markov Games
    Sayin, Muhammed O.
    Zhang, Kaiqing
    Leslie, David S.
Basar, Tamer
    Ozdaglar, Asuman
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [25] Sample Complexity of Decentralized Tabular Q-Learning for Stochastic Games
    Gao, Zuguang
    Ma, Qianqian
    Basar, Tamer
    Birge, John R.
    2023 AMERICAN CONTROL CONFERENCE, ACC, 2023, : 1098 - 1103
  • [26] Decentralized Q-Learning for Weakly Acyclic Stochastic Dynamic Games
    Arslan, Gurdal
    Yuksel, Serdar
    2015 54TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2015, : 6743 - 6748
  • [27] Expected Lenient Q-learning: a fast variant of the Lenient Q-learning algorithm for cooperative stochastic Markov games
    Amhraoui, Elmehdi
    Masrour, Tawfik
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (07) : 2781 - 2797
  • [28] Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning
    Ohnishi, Shota
    Uchibe, Eiji
    Yamaguchi, Yotaro
    Nakanishi, Kosuke
    Yasui, Yuji
    Ishii, Shin
    FRONTIERS IN NEUROROBOTICS, 2019, 13
  • [29] Reinforcement Learning-Based Multihop Relaying: A Decentralized Q-Learning Approach
    Wang, Xiaowei
    Wang, Xin
    ENTROPY, 2021, 23 (10)
  • [30] An Efficient Hardware Implementation of Reinforcement Learning: The Q-Learning Algorithm
    Spano, Sergio
    Cardarilli, Gian Carlo
    Di Nunzio, Luca
    Fazzolari, Rocco
    Giardino, Daniele
    Matta, Marco
    Nannarelli, Alberto
    Re, Marco
    IEEE ACCESS, 2019, 7 : 186340 - 186351