Learning in Bi-level Markov Games

Citations: 0

Authors:

Meng, Linghui [1,2]

Ruan, Jingqing [1,2]

Xing, Dengpeng [1]

Xu, Bo [1,2]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China

Source:

2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2022

Keywords:

Reinforcement Learning; Multi-Agent System; Leader-Follower;

DOI:

10.1109/IJCNN55064.2022.9892747

Chinese Library Classification:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Although multi-agent reinforcement learning (MARL) has demonstrated remarkable progress in tackling sophisticated cooperative tasks, the assumption that agents take simultaneous actions still limits the applicability of MARL to many real-world problems. In this work, we relax the assumption by proposing the framework of the bi-level Markov game (BMG). BMG breaks the simultaneity by assigning two players a leader-follower relationship in which the leader considers the policy of the follower, who takes the best response based on the leader's actions. We propose two provably convergent algorithms to solve BMG: BMG-1 and BMG-2. The former uses standard Q-learning, while the latter relieves solving the local Stackelberg equilibrium in BMG-1 with a further two-step transition to estimate the state value. For both methods, we consider temporal difference learning techniques with both tabular and neural network representations. To verify the effectiveness of our BMG framework, we test on a series of games that existing MARL solvers find challenging to solve: Seeker, Cooperative Navigation, and Football. Experimental results show that our BMG methods achieve competitive advantages in terms of better performance and lower variance.
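The leader-follower idea in the abstract can be illustrated with a minimal sketch of a Stackelberg action choice in a single state. This is not the authors' BMG-1 or BMG-2 algorithm. The function name `stackelberg_action`, the table layout `q[a][b]`, and the numeric values are assumptions made for illustration only. Given one Q-table per player, the follower best-responds to each leader action, and the leader picks the action whose induced response gives it the highest value.

```python
from typing import List, Tuple

def stackelberg_action(
    q_leader: List[List[float]],
    q_follower: List[List[float]],
) -> Tuple[int, int]:
    """Pick a (leader action, follower response) pair for one state.

    q_leader[a][b] and q_follower[a][b] are hypothetical value estimates
    when the leader plays a and the follower plays b.
    """
    best = None  # (leader value, leader action, follower action)
    for a, (row_l, row_f) in enumerate(zip(q_leader, q_follower)):
        # Follower observes a and best-responds (ties go to the lowest index).
        b = max(range(len(row_f)), key=lambda j: row_f[j])
        # Leader evaluates a by the value it gets under that response.
        if best is None or row_l[b] > best[0]:
            best = (row_l[b], a, b)
    return best[1], best[2]

# Hypothetical 2x2 stage game; the values are made up for illustration.
q_leader = [[3.0, 1.0], [4.0, 0.0]]
q_follower = [[1.0, 2.0], [0.0, 3.0]]
pair = stackelberg_action(q_leader, q_follower)
print(pair)  # (0, 1)
```

In this toy game the leader does not pick the action holding its largest entry (4.0), because the follower would never respond in a way that yields it. Anticipating the follower's response is what separates the sequential choice from a simultaneous one.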


Pages: 8
