Decentralized Cooperative Reinforcement Learning with Hierarchical Information Structure

Cited by: 0
Authors
Kao, Hsu [1]
Wei, Chen-Yu [2]
Subramanian, Vijay [1]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Univ Southern Calif, Los Angeles, CA 90007 USA
Keywords
hierarchical information structure; multi-agent online learning; multi-armed bandit; Markov decision process; regret
DOI: Not available
Chinese Library Classification: TP18 (Artificial Intelligence Theory)
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
Multi-agent reinforcement learning (MARL) problems are challenging due to information asymmetry. To overcome this challenge, existing methods often require a high level of coordination or communication between the agents. We consider two-agent multi-armed bandits (MABs) and Markov decision processes (MDPs) with a hierarchical information structure arising in applications, which we exploit to propose simpler and more efficient algorithms that require no coordination or communication. In this structure, in each step the "leader" chooses her action first, and then the "follower" decides his action after observing the leader's action. The two agents observe the same reward (and the same state transition in the MDP setting), which depends on their joint action. For the bandit setting, we propose a hierarchical bandit algorithm that achieves a near-optimal gap-independent regret of Õ(√(ABT)) and a near-optimal gap-dependent regret of O(log T), where A and B are the numbers of actions of the leader and the follower, respectively, and T is the number of steps. We further extend our results to the case of multiple followers and the case of a deep hierarchy, obtaining near-optimal regret bounds in both. For the MDP setting, we obtain Õ(√(H^7 S^2 ABT)) regret, where H is the number of steps per episode, S is the number of states, and T is the number of episodes. This matches the existing lower bound in terms of A, B, and T.
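The leader-follower protocol described in the abstract can be sketched with a simple per-context UCB construction. This is a hypothetical illustration under the paper's information structure, not the authors' actual algorithm: the follower keeps one UCB learner per observed leader action, the leader runs UCB over her own actions, and both update on the shared reward. The `UCB`, `run`, and `mean_reward` names are assumptions of this sketch.

```python
import math
import random

class UCB:
    """Standard UCB1 over a finite set of arms."""
    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.t = 0

    def select(self):
        self.t += 1
        # Play each arm once before using confidence bounds.
        for a in range(len(self.counts)):
            if self.counts[a] == 0:
                return a
        return max(
            range(len(self.counts)),
            key=lambda a: self.sums[a] / self.counts[a]
            + math.sqrt(2 * math.log(self.t) / self.counts[a]),
        )

    def update(self, a, r):
        self.counts[a] += 1
        self.sums[a] += r

def run(mean_reward, A, B, T, seed=0):
    """Two-agent hierarchical bandit: leader acts first, follower reacts.

    mean_reward[a][b] is the Bernoulli mean of the shared reward for
    joint action (a, b). Returns the average reward over T steps.
    """
    random.seed(seed)
    leader = UCB(A)
    followers = [UCB(B) for _ in range(A)]  # one learner per leader action
    total = 0.0
    for _ in range(T):
        a = leader.select()            # leader commits first
        b = followers[a].select()      # follower observes a, then acts
        r = 1.0 if random.random() < mean_reward[a][b] else 0.0
        leader.update(a, r)            # both learn from the same reward
        followers[a].update(b, r)
        total += r
    return total / T
```

Note that the leader's reward for a fixed action drifts while the follower is still learning; handling that non-stationarity is precisely what the paper's analysis addresses, and this naive sketch ignores it.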
Pages: 33