Learning in Bi-level Markov Games

Cited by: 0

Authors
Meng, Linghui [1 ,2 ]
Ruan, Jingqing [1 ,2 ]
Xing, Dengpeng [1 ]
Xu, Bo [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
Keywords
Reinforcement Learning; Multi-Agent System; Leader-Follower
DOI
10.1109/IJCNN55064.2022.9892747
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Although multi-agent reinforcement learning (MARL) has demonstrated remarkable progress in tackling sophisticated cooperative tasks, the assumption that agents take simultaneous actions still limits the applicability of MARL to many real-world problems. In this work, we relax this assumption by proposing the framework of the bi-level Markov game (BMG). BMG breaks the simultaneity by assigning the two players a leader-follower relationship, in which the leader accounts for the policy of the follower, who takes the best response to the leader's actions. We propose two provably convergent algorithms to solve BMG: BMG-1 and BMG-2. The former uses standard Q-learning, while the latter avoids solving the local Stackelberg equilibrium required in BMG-1 by instead using a further two-step transition to estimate the state value. For both methods, we consider temporal difference learning with both tabular and neural network representations. To verify the effectiveness of our BMG framework, we test it on a series of games that existing MARL solvers find challenging: Seeker, Cooperative Navigation, and Football. Experimental results show that our BMG methods achieve better performance with lower variance.
Pages: 8
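The leader-follower structure described in the abstract can be illustrated with a minimal tabular sketch. This is a hypothetical example of a Stackelberg-style backup with joint-action Q-tables, not the authors' BMG-1/BMG-2 implementation; the table shapes, function names, and update rule are assumptions for illustration only:

```python
import numpy as np

# Q_L[s, aL, aF] and Q_F[s, aL, aF]: joint-action value tables for the
# leader and the follower over states s and action pairs (aL, aF).

def stackelberg_actions(Q_L, Q_F, s):
    """At state s: the follower best-responds to each leader action, and the
    leader picks the action whose induced best response is most valuable."""
    n_aL = Q_L.shape[1]
    best_aL, best_val, best_aF = 0, -np.inf, 0
    for aL in range(n_aL):
        aF = int(np.argmax(Q_F[s, aL]))   # follower's best response to aL
        if Q_L[s, aL, aF] > best_val:
            best_aL, best_val, best_aF = aL, Q_L[s, aL, aF], aF
    return best_aL, best_aF

def td_update(Q, s, aL, aF, r, v_next, alpha=0.1, gamma=0.99):
    """One temporal-difference backup toward the target r + gamma * v_next."""
    Q[s, aL, aF] += alpha * (r + gamma * v_next - Q[s, aL, aF])
```

In this sketch, `v_next` would be the leader's (or follower's) value of the Stackelberg action pair at the next state; the paper's BMG-2 variant instead estimates that value through a further two-step transition.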