I2Q: A Fully Decentralized Q-Learning Algorithm

Cited by: 0
Authors
Jiang, Jiechuan [1 ]
Lu, Zongqing [1 ]
Affiliations
[1] Peking Univ, Sch Comp Sci, Beijing, Peoples R China
Keywords: (none listed)
DOI: not available
CLC Classification: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
Fully decentralized multi-agent reinforcement learning has shown great potential for many real-world cooperative tasks, where global information, e.g., the actions of other agents, is not accessible. Although independent Q-learning is widely used for decentralized training, the transition probabilities are non-stationary because the other agents update their policies simultaneously, so the convergence of independent Q-learning is not guaranteed. To deal with this non-stationarity, we first introduce stationary ideal transition probabilities, under which independent Q-learning can converge to the global optimum. We then propose a fully decentralized method, I2Q, which performs independent Q-learning on the modeled ideal transition function to reach the global optimum. The modeling of the ideal transition function in I2Q is fully decentralized and independent of the learned policies of other agents, which frees I2Q from non-stationarity and enables it to learn the optimal policy. Empirically, we show that I2Q achieves remarkable improvement in a variety of cooperative multi-agent tasks.
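The independent Q-learning setup the abstract builds on can be sketched in a toy one-step cooperative matrix game. This is only an illustration of the decentralized setting and its non-stationarity, not the authors' I2Q implementation; the reward matrix and hyperparameters below are invented for the example. Each agent maintains a Q-table over its own actions only and treats the other agent as part of the environment, which is exactly why its effective transition/reward distribution shifts as the other agent's policy changes:

```python
import random

# Hedged sketch (not the authors' I2Q code): two independent Q-learners
# in a one-step cooperative matrix game with a shared reward R[a1][a2].
# Each agent sees only its own action, so the other agent is folded into
# the "environment" -- the source of non-stationarity the paper targets.

R = [[1.0, 0.2],   # illustrative shared reward; action 0 dominates here,
     [0.2, 0.0]]   # so independent learners happen to coordinate

def eps_greedy(q, eps, rng):
    """Pick a random action with prob. eps, else the greedy one."""
    if rng.random() < eps:
        return rng.randrange(len(q))
    return max(range(len(q)), key=lambda a: q[a])

def train(episodes=5000, alpha=0.1, eps=0.1, seed=0):
    rng = random.Random(seed)
    q1, q2 = [0.0, 0.0], [0.0, 0.0]   # one Q-table per agent, own actions only
    for _ in range(episodes):
        a1 = eps_greedy(q1, eps, rng)
        a2 = eps_greedy(q2, eps, rng)
        r = R[a1][a2]
        # One-step task, so there is no bootstrapped next-state value;
        # each agent's target depends on the OTHER agent's current policy,
        # which is simultaneously changing -- the non-stationarity problem.
        q1[a1] += alpha * (r - q1[a1])
        q2[a2] += alpha * (r - q2[a2])
    return q1, q2
```

In games with multiple coordinated optima (rather than a dominant action as above), such independent learners can converge to a suboptimal joint policy or fail to converge; the paper's contribution is to replace the non-stationary transitions each agent experiences with modeled stationary "ideal" transitions so that this same independent update provably reaches the global optimum.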
Pages: 13
Related Papers (50 total; entries [41]-[50] shown)
  • [41] Mutual Q-learning. Reid, Cameron; Mukhopadhyay, Snehasis. 2020 3rd International Conference on Control and Robots (ICCR 2020), 2020: 128-133.
  • [42] Robust Q-Learning. Ertefaie, Ashkan; McKay, James R.; Oslin, David; Strawderman, Robert L. Journal of the American Statistical Association, 2021, 116(533): 368-381.
  • [43] Neural Q-learning. ten Hagen, Stephan; Kröse, Ben. Neural Computing & Applications, 2003, 12: 81-88.
  • [44] Neural Q-learning. ten Hagen, S.; Kröse, B. Neural Computing & Applications, 2003, 12(02): 81-88.
  • [45] Logistic Q-Learning. Bas-Serrano, Joan; Curi, Sebastian; Krause, Andreas; Neu, Gergely. 24th International Conference on Artificial Intelligence and Statistics (AISTATS), 2021, 130.
  • [46] Learning rates for Q-learning. Even-Dar, E.; Mansour, Y. Computational Learning Theory, Proceedings, 2001, 2111: 589-604.
  • [47] Learning rates for Q-learning. Even-Dar, E.; Mansour, Y. Journal of Machine Learning Research, 2003, 5: 1-25.
  • [48] Decentralized Cognitive MAC Protocol Design Based on POMDP and Q-Learning. Lan, Zhongli; Jiang, Hong; Wu, Xiaoli. 2012 7th International ICST Conference on Communications and Networking in China (CHINACOM), 2012: 548-551.
  • [49] Application of Q-Learning Algorithm for Traveling Salesman Problem. Hasegawa, N.; Li, L. Proceedings of the Second International Conference on Information and Management Sciences, 2002, 2: 134-138.
  • [50] A Task Scheduling Algorithm Based on Q-Learning for WSNs. Zhang, Benhong; Wu, Wensheng; Bi, Xiang; Wang, Yiming. Communications and Networking, ChinaCom 2018, 2019, 262: 521-530.