Learning in Bi-level Markov Games

Cited by: 0
Authors
Meng, Linghui [1, 2]
Ruan, Jingqing [1, 2]
Xing, Dengpeng [1]
Xu, Bo [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
Keywords
Reinforcement Learning; Multi-Agent System; Leader-Follower;
DOI
10.1109/IJCNN55064.2022.9892747
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Although multi-agent reinforcement learning (MARL) has demonstrated remarkable progress in tackling sophisticated cooperative tasks, the assumption that agents take simultaneous actions still limits the applicability of MARL to many real-world problems. In this work, we relax the assumption by proposing the framework of the bi-level Markov game (BMG). BMG breaks the simultaneity by assigning two players a leader-follower relationship, in which the leader considers the policy of the follower, who takes the best response to the leader's actions. We propose two provably convergent algorithms to solve BMG: BMG-1 and BMG-2. The former uses standard Q-learning, while the latter relieves the need to solve the local Stackelberg equilibrium in BMG-1 by using a further two-step transition to estimate the state value. For both methods, we consider temporal difference learning techniques with both tabular and neural network representations. To verify the effectiveness of our BMG framework, we test on a series of games that existing MARL solvers find challenging to solve: Seeker, Cooperative Navigation, and Football. Experimental results show that our BMG methods achieve competitive advantages in terms of better performance and lower variance.
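The leader-follower idea in the abstract can be illustrated with a minimal tabular sketch. This is not the paper's BMG-1 or BMG-2 algorithm; it is a generic Stackelberg-style Q-learning step written under assumptions: each player keeps its own Q-table indexed by (state, leader action, follower action), the follower best-responds to the leader's action, and the leader picks the action whose induced response it values most. The function names `stackelberg_actions` and `td_update` and all parameters are hypothetical.

```python
import numpy as np

def stackelberg_actions(q_leader, q_follower):
    """Pick (leader, follower) actions for one state.

    q_leader, q_follower: arrays of shape (n_leader_actions, n_follower_actions).
    Assumption: the follower best-responds to each leader action, and the
    leader anticipates that response.
    """
    best_response = np.argmax(q_follower, axis=1)  # follower's reply to each leader action
    rows = np.arange(q_leader.shape[0])
    leader_values = q_leader[rows, best_response]  # leader's value under each reply
    a = int(np.argmax(leader_values))
    return a, int(best_response[a])

def td_update(q_leader, q_follower, s, a, b, r_leader, r_follower, s_next,
              alpha=0.1, gamma=0.9):
    """One temporal-difference step that bootstraps from the Stackelberg
    action pair in the next state (illustrative, not the paper's update)."""
    a_next, b_next = stackelberg_actions(q_leader[s_next], q_follower[s_next])
    target_l = r_leader + gamma * q_leader[s_next, a_next, b_next]
    target_f = r_follower + gamma * q_follower[s_next, a_next, b_next]
    q_leader[s, a, b] += alpha * (target_l - q_leader[s, a, b])
    q_follower[s, a, b] += alpha * (target_f - q_follower[s, a, b])

# Toy tables: 2 states, 2 leader actions, 2 follower actions.
q_l = np.zeros((2, 2, 2))
q_f = np.zeros((2, 2, 2))
q_l[1] = [[5.0, 0.0], [3.0, 2.0]]
q_f[1] = [[0.0, 1.0], [1.0, 0.0]]

pair = stackelberg_actions(q_l[1], q_f[1])
td_update(q_l, q_f, s=0, a=0, b=0, r_leader=1.0, r_follower=1.0,
          s_next=1, alpha=0.5, gamma=0.9)
```

In the toy tables, the leader's largest entry sits at joint action (0, 0), but the follower would answer leader action 0 with action 1, so the leader instead commits to action 1. That is the sense in which the sequential ordering differs from a simultaneous-move choice.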
Pages: 8