Multiagent Reinforcement Learning With Unshared Value Functions

Cited by: 43
Authors
Hu, Yujing [1 ]
Gao, Yang [1 ]
An, Bo [2 ]
Affiliations
[1] Nanjing Univ, Dept Comp Sci, State Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
Funding
U.S. National Science Foundation;
Keywords
Game theory; multiagent reinforcement learning; Nash equilibrium; negotiation; AUTOMATA;
DOI
10.1109/TCYB.2014.2332042
Chinese Library Classification (CLC)
TP [automation technology; computer technology];
Discipline code
0812 ;
Abstract
One important approach to multiagent reinforcement learning (MARL) is equilibrium-based MARL, which combines reinforcement learning and game theory. Most existing algorithms involve the computationally expensive calculation of mixed strategy equilibria and require each agent to replicate the other agents' value functions for equilibrium computation in each state. This is unrealistic since agents may be unwilling to share such information due to privacy or safety concerns. This paper aims to develop novel and efficient MARL algorithms that do not require agents to share value functions. First, we adopt pure strategy equilibrium solution concepts instead of mixed strategy equilibria, since mixed strategy equilibria are often computationally expensive to find. In this paper, three types of pure strategy profiles are utilized as equilibrium solution concepts: pure strategy Nash equilibrium, equilibrium-dominating strategy profile, and nonstrict equilibrium-dominating strategy profile. The latter two solution concepts are strategy profiles from which agents can gain higher payoffs than from one or more pure strategy Nash equilibria. Theoretical analysis shows that these strategy profiles are symmetric meta equilibria. Second, we propose a multi-step negotiation process for finding pure strategy equilibria, since value functions are not shared among agents. Putting these together, we propose a novel MARL algorithm called negotiation-based Q-learning (NegoQ). Experiments are first conducted in grid-world games, which are widely used to evaluate MARL algorithms. In these games, NegoQ learns equilibrium policies and runs significantly faster than existing MARL algorithms (correlated Q-learning and Nash Q-learning). Surprisingly, we find that NegoQ also performs well in team Markov games such as pursuit games, compared with team-task-oriented MARL algorithms (such as friend Q-learning and distributed Q-learning).
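The central solution concept in the abstract, the pure strategy Nash equilibrium, can be illustrated with a brute-force check over a two-player normal-form game: a joint action is an equilibrium if neither player can gain by deviating unilaterally. This is a hypothetical sketch of the concept only, not the paper's NegoQ algorithm, which instead locates such equilibria through multi-step negotiation without shared value functions.

```python
import itertools

def pure_nash_equilibria(payoff_a, payoff_b):
    """Return all pure strategy Nash equilibria of a two-player
    normal-form game, given as row/column payoff matrices."""
    n_rows, n_cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for i, j in itertools.product(range(n_rows), range(n_cols)):
        # Row player cannot improve by switching away from row i.
        row_best = all(payoff_a[i][j] >= payoff_a[k][j] for k in range(n_rows))
        # Column player cannot improve by switching away from column j.
        col_best = all(payoff_b[i][j] >= payoff_b[i][k] for k in range(n_cols))
        if row_best and col_best:
            equilibria.append((i, j))
    return equilibria

# A simple coordination game with two pure strategy Nash equilibria,
# (0, 0) and (1, 1); the former Pareto-dominates the latter.
A = [[2, 0],
     [0, 1]]
B = [[2, 0],
     [0, 1]]
print(pure_nash_equilibria(A, B))  # [(0, 0), (1, 1)]
```

In a game like this, the equilibrium-dominating strategy profiles mentioned in the abstract would single out joint actions whose payoffs exceed those of some pure strategy Nash equilibrium, here favoring (0, 0) over (1, 1).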
Pages: 647 - 662
Page count: 16
Related Papers
50 items in total
  • [31] A comprehensive survey of multiagent reinforcement learning
    Busoniu, Lucian
    Babuska, Robert
    De Schutter, Bart
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2008, 38 (02): : 156 - 172
  • [32] Multiagent Reinforcement Learning in Traffic and Transportation
    Bazzan, Ana
    [J]. 2014 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN VEHICLES AND TRANSPORTATION SYSTEMS (CIVTS), 2014, : VII - VII
  • [33] Distributed Neural Learning Algorithms for Multiagent Reinforcement Learning
    Dai, Pengcheng
    Liu, Hongzhe
    Yu, Wenwu
    Wang, He
    [J]. IEEE INTERNET OF THINGS JOURNAL, 2023, 10 (23) : 21039 - 21060
  • [34] WToE: Learning When to Explore in Multiagent Reinforcement Learning
    Dong, Shaokang
    Mao, Hangyu
    Yang, Shangdong
    Zhu, Shengyu
    Li, Wenbin
    Hao, Jianye
    Gao, Yang
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2024, 54 (08) : 4789 - 4801
  • [35] A survey on transfer learning for multiagent reinforcement learning systems
    Da Silva, Felipe Leno
    Reali Costa, Anna Helena
    [J]. Journal of Artificial Intelligence Research, 2019, 64 : 645 - 703
  • [36] Accelerating Multiagent Reinforcement Learning through Transfer Learning
    da Silva, Felipe Leno
    Reali Costa, Anna Helena
    [J]. THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 5034 - 5035
  • [37] An Efficient Transfer Learning Framework for Multiagent Reinforcement Learning
    Yang, Tianpei
    Wang, Weixun
    Tang, Hongyao
    Hao, Jianye
    Meng, Zhaopeng
    Mao, Hangyu
    Li, Dong
    Liu, Wulong
    Zhang, Chengwei
    Hu, Yujing
    Chen, Yingfeng
    Fan, Changjie
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [38] A Survey on Transfer Learning for Multiagent Reinforcement Learning Systems
    Da Silva, Felipe Leno
    Reali Costa, Anna Helena
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2019, 64 : 645 - 703
  • [39] Coordination in multiagent reinforcement learning systems by virtual reinforcement signals
    Kamal, M.
    Murata, Junichi
    [J]. INTERNATIONAL JOURNAL OF KNOWLEDGE-BASED AND INTELLIGENT ENGINEERING SYSTEMS, 2007, 11 (03) : 181 - 191
  • [40] Multiagent Reinforcement Social Learning toward Coordination in Cooperative Multiagent Systems
    Hao, Jianye
    Leung, Ho-Fung
    Ming, Zhong
    [J]. ACM TRANSACTIONS ON AUTONOMOUS AND ADAPTIVE SYSTEMS, 2015, 9 (04)