Hierarchical Coordination Multi-Agent Reinforcement Learning With Spatio-Temporal Abstraction

Cited by: 2
Authors
Ma, Tinghuai [1 ]
Peng, Kexing [2 ]
Rong, Huan [3 ]
Qian, Yurong [4 ]
Al-Nabhan, Najla [5 ]
Affiliations
[1] Nanjing Univ Informat Sci & Technol, Sch Software, Nanjing, Jiangsu, Peoples R China
[2] Nanjing Univ Informat Sci & Technol, Sch Comp Sci, Nanjing 210044, Jiangsu, Peoples R China
[3] Nanjing Univ Informat Sci & Technol, Sch Artificial Intelligence, Nanjing, Jiangsu, Peoples R China
[4] Xinjiang Univ, Urumqi, Peoples R China
[5] King Saud Univ, Dept Comp Sci, Riyadh 11362, Saudi Arabia
Funding
National Natural Science Foundation of China;
Keywords
Reinforcement learning; Training; Task analysis; Games; Multi-agent systems; Heuristic algorithms; Computational intelligence; Multi-agent reinforcement learning; spatio-temporal; hierarchical reinforcement learning; communications;
DOI
10.1109/TETCI.2023.3309738
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many real-world cooperative problems, such as urban traffic control or multi-role games, can be addressed with Multi-Agent Reinforcement Learning (MARL). However, policy learning in MARL involves long-trajectory training and partial observability, which lead to sparse rewards and a lack of decision information. To address these issues, this article studies hierarchical deep MARL and proposes a novel model named the Hierarchical Spatio-Temporal Communication Network (HSTCN). HSTCN designs hierarchical policies at two time granularities: a high-level policy and a low-level policy. All agents share a joint policy composed of these two levels, while each agent retains its own execution policy. Specifically, the high-level policy provides intrinsic goals and continuous reward samples to the low-level policy to alleviate reward sparsity. The low-level policy absorbs this information to improve the efficiency of the agents' execution policies and interacts with the environment to optimize the next reward. Moreover, the high-level policy adopts a graph-structured model with spatio-temporal abstraction. This spatio-temporal model expands receptive fields to receive neighborhood information and facilitates learning more robust policies by capturing the spatial dependencies and temporal dynamics of the underlying graph. An evaluation network is also added to increase robustness. Empirically, we demonstrate the effectiveness of HSTCN in a long-trajectory training environment through Simulation of Urban MObility (SUMO), and test StarCraft II maps as an abstract environment. The experimental results show that HSTCN outperforms other advanced algorithms and verify the rationality of its design.
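The two-time-granularity scheme in the abstract (a high-level policy that emits intrinsic goals and dense reward samples, and a low-level policy that acts every step toward them) can be illustrated with a minimal toy sketch. This is not the paper's implementation: the 1-D environment, the class names, and the distance-based intrinsic reward are all hypothetical, chosen only to show the interaction between the two levels.

```python
import random

class HighLevelPolicy:
    """Coarse time granularity: picks a new intrinsic goal every `horizon` steps."""
    def __init__(self, goals, horizon=5, seed=0):
        self.goals, self.horizon = goals, horizon
        self.rng = random.Random(seed)

    def select_goal(self):
        return self.rng.choice(self.goals)

class LowLevelPolicy:
    """Fine time granularity: moves a 1-D agent one unit toward the current goal."""
    def act(self, position, goal):
        if position < goal:
            return +1
        if position > goal:
            return -1
        return 0

def intrinsic_reward(position, goal):
    # Dense shaping signal supplied by the high level: the closer the agent is
    # to its intrinsic goal, the higher (less negative) the reward.
    return -abs(position - goal)

def rollout(steps=15):
    high = HighLevelPolicy(goals=[0, 3, 7], horizon=5)
    low = LowLevelPolicy()
    position, total_intrinsic, goal = 0, 0, None
    for t in range(steps):
        if t % high.horizon == 0:            # high level acts at the coarser granularity
            goal = high.select_goal()
        position += low.act(position, goal)  # low level acts at every step
        total_intrinsic += intrinsic_reward(position, goal)
    return position, total_intrinsic
```

Because the high level re-plans only every `horizon` steps, the low level receives a dense intrinsic signal between sparse environment rewards, which is the mechanism the abstract credits for alleviating reward sparsity.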
Pages: 533-547
Page count: 15