Decentralized Cooperative Reinforcement Learning with Hierarchical Information Structure

Cited by: 0
Authors
Kao, Hsu [1]
Wei, Chen-Yu [2]
Subramanian, Vijay [1]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Univ Southern Calif, Los Angeles, CA 90007 USA
Keywords
hierarchical information structure; multi-agent online learning; multi-armed bandit; Markov decision process; regret
DOI: Not available
Chinese Library Classification: TP18 (Artificial Intelligence Theory)
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
Multi-agent reinforcement learning (MARL) problems are challenging due to information asymmetry. To overcome this challenge, existing methods often require a high level of coordination or communication between the agents. We consider two-agent multi-armed bandits (MABs) and Markov decision processes (MDPs) with a hierarchical information structure arising in applications, which we exploit to propose simpler and more efficient algorithms that require no coordination or communication. In this structure, in each step the "leader" chooses her action first, and then the "follower" decides his action after observing the leader's action. The two agents observe the same reward (and the same state transition in the MDP setting), which depends on their joint action. For the bandit setting, we propose a hierarchical bandit algorithm that achieves a near-optimal gap-independent regret of Õ(√(ABT)) and a near-optimal gap-dependent regret of O(log T), where A and B are the numbers of actions of the leader and the follower, respectively, and T is the number of steps. We further extend our results to the case of multiple followers and the case of a deep hierarchy, obtaining near-optimal regret bounds in both. For the MDP setting, we obtain Õ(√(H^7 S^2 ABT)) regret, where H is the number of steps per episode, S is the number of states, and T is the number of episodes. This matches the existing lower bound in terms of A, B, and T.
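The leader-follower protocol described in the abstract can be sketched with a simple per-context UCB construction. This is a hypothetical illustration under the paper's information structure, not the authors' actual algorithm: the follower keeps one UCB learner per observed leader action, the leader runs UCB over her own actions, and both update on the shared reward. The `UCB`, `run`, and `mean_reward` names are assumptions of this sketch.

```python
import math
import random

class UCB:
    """Standard UCB1 over a finite set of arms."""
    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.t = 0

    def select(self):
        self.t += 1
        # Play each arm once before using confidence bounds.
        for a in range(len(self.counts)):
            if self.counts[a] == 0:
                return a
        return max(
            range(len(self.counts)),
            key=lambda a: self.sums[a] / self.counts[a]
            + math.sqrt(2 * math.log(self.t) / self.counts[a]),
        )

    def update(self, a, r):
        self.counts[a] += 1
        self.sums[a] += r

def run(mean_reward, A, B, T, seed=0):
    """Two-agent hierarchical bandit: leader acts first, follower reacts.

    mean_reward[a][b] is the Bernoulli mean of the shared reward for
    joint action (a, b). Returns the average reward over T steps.
    """
    random.seed(seed)
    leader = UCB(A)
    followers = [UCB(B) for _ in range(A)]  # one learner per leader action
    total = 0.0
    for _ in range(T):
        a = leader.select()            # leader commits first
        b = followers[a].select()      # follower observes a, then acts
        r = 1.0 if random.random() < mean_reward[a][b] else 0.0
        leader.update(a, r)            # both learn from the same reward
        followers[a].update(b, r)
        total += r
    return total / T
```

Note that the leader's reward for a fixed action drifts while the follower is still learning; handling that non-stationarity is precisely what the paper's analysis addresses, and this naive sketch ignores it.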
Pages: 33