MuDE: Multi-agent decomposed reward-based exploration

Cited by: 0
Authors
Yoo, Byunghyun [1 ]
Yi, Sungwon [1 ]
Kim, Hyunwoo [1 ]
Shin, Younghwan [1 ]
Han, Ran [1 ]
Seo, Seungwoo [1 ]
Song, Hwa Jeon [1 ]
Chung, Euisok [1 ]
Yang, Jeongmin [1 ]
Affiliation
[1] Electronics and Telecommunications Research Institute (ETRI), 218 Gajeong-ro, Daejeon 34129, South Korea
Keywords
Multi-agent reinforcement learning; Exploration; Reward decomposition
DOI
10.1016/j.neunet.2024.106565
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In cooperative multi-agent reinforcement learning, agents jointly optimize a centralized value function based on the rewards shared by all agents and learn decentralized policies through value function decomposition. Although such a learning framework is considered effective, estimating individual contributions from the shared rewards, which is essential for learning highly cooperative behaviors, is difficult. The problem becomes more challenging when reinforcement and punishment, which respectively increase and decrease specific behaviors of agents, coexist, because maximizing reinforcement and minimizing punishment can often conflict in practice. This study proposes a novel exploration scheme called multi-agent decomposed reward-based exploration (MuDE), which preferentially explores the action spaces associated with positive sub-rewards based on a modified reward decomposition scheme, thus effectively exploring action spaces not reachable by existing exploration schemes. We evaluate MuDE on a challenging set of StarCraft II micromanagement tasks and modified predator-prey tasks extended to include reinforcement and punishment. The results show that MuDE accurately estimates sub-rewards and outperforms state-of-the-art approaches in both convergence speed and win rate.
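The general idea described in the abstract, decomposing a scalar reward into positive (reinforcement) and negative (punishment) parts and biasing exploration toward actions with large estimated positive sub-rewards, can be sketched in a toy single-agent bandit form. This is an illustrative sketch only, not the paper's actual MuDE algorithm; the class and parameter names below (`SubRewardExplorer`, `temperature`, `lr`) are assumptions, not identifiers from the paper.

```python
import math
import random

class SubRewardExplorer:
    """Toy sketch: keep separate estimates of positive and negative
    sub-rewards per action, and explore via a softmax biased toward
    actions with larger estimated positive sub-reward."""

    def __init__(self, n_actions, lr=0.1, temperature=0.5, seed=0):
        self.pos = [0.0] * n_actions  # estimated positive sub-reward per action
        self.neg = [0.0] * n_actions  # estimated negative sub-reward per action
        self.lr = lr
        self.temperature = temperature
        self.rng = random.Random(seed)

    def select_action(self):
        # Softmax over positive sub-reward estimates: actions believed
        # to yield reinforcement are explored preferentially.
        prefs = [p / self.temperature for p in self.pos]
        m = max(prefs)
        weights = [math.exp(p - m) for p in prefs]
        total = sum(weights)
        r, acc = self.rng.random() * total, 0.0
        for action, w in enumerate(weights):
            acc += w
            if r <= acc:
                return action
        return len(weights) - 1

    def update(self, action, reward):
        # Decompose the scalar reward into its positive and negative
        # parts and update the corresponding running estimates.
        self.pos[action] += self.lr * (max(reward, 0.0) - self.pos[action])
        self.neg[action] += self.lr * (min(reward, 0.0) - self.neg[action])

# Usage: action 1 yields reinforcement (+1), action 0 punishment (-1);
# exploration concentrates on action 1 as its positive estimate grows.
agent = SubRewardExplorer(n_actions=2)
for _ in range(500):
    a = agent.select_action()
    agent.update(a, 1.0 if a == 1 else -1.0)
print(agent.pos[1] > agent.pos[0])  # → True
```

The key design point mirrored from the abstract is that the positive and negative components are tracked separately, so a conflicting punishment signal does not erase the estimate of where reinforcement is available.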
Pages: 13