Multiagent Reinforcement Learning With Unshared Value Functions

Cited: 44
Authors
Hu, Yujing [1 ]
Gao, Yang [1 ]
An, Bo [2 ]
Affiliations
[1] Nanjing Univ, Dept Comp Sci, State Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
Funding
U.S. National Science Foundation;
Keywords
Game theory; multiagent reinforcement learning; Nash equilibrium; negotiation; AUTOMATA;
DOI
10.1109/TCYB.2014.2332042
Chinese Library Classification (CLC)
TP [automation technology; computer technology];
Subject Classification Code
0812;
Abstract
One important approach to multiagent reinforcement learning (MARL) is equilibrium-based MARL, which combines reinforcement learning and game theory. Most existing algorithms involve computationally expensive calculation of mixed strategy equilibria and require agents to replicate the other agents' value functions for equilibrium computation in each state. This is unrealistic since agents may not be willing to share such information due to privacy or safety concerns. This paper aims to develop novel and efficient MARL algorithms that do not require agents to share value functions. First, we adopt pure strategy equilibrium solution concepts instead of mixed strategy equilibria, since computing a mixed strategy equilibrium is often expensive. In this paper, three types of pure strategy profiles are utilized as equilibrium solution concepts: pure strategy Nash equilibrium, equilibrium-dominating strategy profile, and nonstrict equilibrium-dominating strategy profile. The latter two solution concepts are strategy profiles from which agents can gain higher payoffs than from one or more pure strategy Nash equilibria. Theoretical analysis shows that these strategy profiles are symmetric meta equilibria. Second, we propose a multistep negotiation process for finding pure strategy equilibria, since value functions are not shared among agents. Putting these together, we propose a novel MARL algorithm called negotiation-based Q-learning (NegoQ). Experiments are first conducted in grid-world games, which are widely used to evaluate MARL algorithms. In these games, NegoQ learns equilibrium policies and runs significantly faster than existing MARL algorithms (correlated Q-learning and Nash Q-learning). Surprisingly, we find that NegoQ also performs well in team Markov games such as pursuit games, as compared with team-task-oriented MARL algorithms (such as friend Q-learning and distributed Q-learning).
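The solution concepts above are defined on the stage game induced by the agents' Q-values in each state. As a point of reference only (not the paper's NegoQ negotiation protocol, which is designed to avoid exactly this requirement), the minimal sketch below assumes both agents' payoff matrices are visible and enumerates the pure strategy Nash equilibria of a two-agent stage game; the function name `pure_nash_equilibria` and the toy payoff matrices are illustrative and not taken from the paper.

```python
import numpy as np

def pure_nash_equilibria(A, B):
    """Enumerate pure strategy Nash equilibria of a two-agent stage game.

    A[i, j]: payoff (e.g., Q-value) of agent 1 for joint action (i, j).
    B[i, j]: payoff (e.g., Q-value) of agent 2 for joint action (i, j).
    """
    equilibria = []
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            # (i, j) is a pure strategy Nash equilibrium if neither agent
            # can gain by unilaterally changing its own action.
            if A[i, j] >= A[:, j].max() and B[i, j] >= B[i, :].max():
                equilibria.append((i, j))
    return equilibria

# Toy coordination stage game with two pure equilibria, (0, 0) and (1, 1).
A = np.array([[3.0, 0.0],
              [0.0, 2.0]])
B = np.array([[3.0, 0.0],
              [0.0, 2.0]])
print(pure_nash_equilibria(A, B))  # -> [(0, 0), (1, 1)]
```

NegoQ's multistep negotiation aims to reach such profiles without either agent revealing its full value function; the sketch only shows what the agents are jointly searching for when both matrices happen to be known.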
Pages: 647 - 662
Page count: 16
Related Papers
50 records in total
  • [1] Multiagent reinforcement learning through merging individually learned value functions
    Zhang, Huaxiang
    Huang, Shangteng
    [J]. Journal of Harbin Institute of Technology (New Series), 2005, (03) : 346 - 350
  • [2] SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multiagent Reinforcement Learning
    Yao, Xinghu
    Wen, Chao
    Wang, Yuhui
    Tan, Xiaoyang
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (01) : 52 - 63
  • [3] Distributed Multiagent Reinforcement Learning Based on Graph-Induced Local Value Functions
    Jing, Gangshan
    Bai, He
    George, Jemin
    Chakrabortty, Aranya
    Sharma, Piyush K.
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2024, 69 (10) : 6636 - 6651
  • [4] Policy Distillation and Value Matching in Multiagent Reinforcement Learning
    Wadhwania, Samir
    Kim, Dong-Ki
    Omidshafiei, Shayegan
    How, Jonathan P.
    [J]. 2019 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2019, : 8193 - 8200
  • [5] Distributed Value Function Approximation for Collaborative Multiagent Reinforcement Learning
    Stankovic, Milos S.
    Beko, Marko
    Stankovic, Srdjan S.
    [J]. IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2021, 8 (03): : 1270 - 1280
  • [6] Multiagent value iteration algorithms in dynamic programming and reinforcement learning
    Bertsekas, Dimitri
    [J]. RESULTS IN CONTROL AND OPTIMIZATION, 2020, 1
  • [7] Composing Value Functions in Reinforcement Learning
    van Niekerk, Benjamin
    James, Steven
    Earle, Adam
    Rosman, Benjamin
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [8] TVDO: Tchebycheff Value-Decomposition Optimization for Multiagent Reinforcement Learning
    Hu, Xiaoliang
    Guo, Pengcheng
    Li, Yadong
    Li, Guangyu
    Cui, Zhen
    Yang, Jian
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [9] VGN: Value Decomposition With Graph Attention Networks for Multiagent Reinforcement Learning
    Wei, Qinglai
    Li, Yugu
    Zhang, Jie
    Wang, Fei-Yue
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (01) : 182 - 195
  • [10] Sparse Approximations to Value Functions in Reinforcement Learning
    Jakab, Hunor S.
    Csato, Lehel
    [J]. ARTIFICIAL NEURAL NETWORKS, 2015, : 295 - 314