STAS: Spatial-Temporal Return Decomposition for Solving Sparse Rewards Problems in Multi-agent Reinforcement Learning

Cited by: 0
Authors
Chen, Sirui [1 ]
Zhang, Zhaowei [2 ,4 ]
Yang, Yaodong [2 ]
Du, Yali [3 ]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
[2] Peking Univ, Inst Artificial Intelligence, Beijing, Peoples R China
[3] Kings Coll London, London, England
[4] BIGAI, Natl Key Lab Gen Artificial Intelligence, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
N/A
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Centralized Training with Decentralized Execution (CTDE) has proven to be an effective paradigm in cooperative multi-agent reinforcement learning (MARL). One of the major challenges is credit assignment, which aims to credit agents for their individual contributions. While prior studies have shown great success, their methods typically fail in episodic reinforcement learning scenarios where the global reward is revealed only at the end of an episode: they cannot model the complex dependence of the delayed global reward on earlier time steps, and they suffer from inefficiency. To tackle this, we introduce Spatial-Temporal Attention with Shapley (STAS), a novel method that learns credit assignment in both the temporal and spatial dimensions. It first decomposes the global return back to each time step, then utilizes the Shapley value to redistribute individual payoffs from the decomposed per-step rewards. To mitigate the computational complexity of the Shapley value, we introduce an approximation of the marginal contribution and estimate it with Monte Carlo sampling. We evaluate our method on an Alice & Bob example and on MPE environments across different scenarios. Our results demonstrate that our method effectively assigns spatial-temporal credit, outperforming all state-of-the-art baselines.
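The abstract mentions using Monte Carlo sampling to approximate the Shapley value's marginal contributions. STAS itself applies this to learned value networks; purely as an illustration of the sampling idea, the sketch below (all names hypothetical, not from the paper) estimates per-agent Shapley values for a generic coalition value function by averaging marginal contributions over random agent orderings.

```python
import random


def shapley_monte_carlo(agents, value_fn, n_samples=2000, seed=0):
    """Monte Carlo estimate of Shapley values.

    For each sampled permutation of agents, an agent's marginal
    contribution is the change in coalition value when it joins the
    agents that precede it in the ordering. Averaging these marginals
    over permutations converges to the exact Shapley value.
    """
    rng = random.Random(seed)
    estimates = {a: 0.0 for a in agents}
    for _ in range(n_samples):
        order = list(agents)
        rng.shuffle(order)  # sample a uniformly random joining order
        coalition = []
        prev_value = value_fn(frozenset())
        for a in order:
            coalition.append(a)
            v = value_fn(frozenset(coalition))
            estimates[a] += v - prev_value  # marginal contribution of a
            prev_value = v
    return {a: s / n_samples for a, s in estimates.items()}
```

For an additive game (the coalition value is the sum of individual payoffs), every permutation yields each agent's payoff exactly, so the estimate recovers the true Shapley value; for general games the estimate converges at the usual Monte Carlo rate. Note also that within each permutation the marginals telescope, so the estimated values always sum to the grand-coalition value, matching the Shapley efficiency property.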
Pages: 17337 - 17345
Page count: 9