Hierarchical Coordination Multi-Agent Reinforcement Learning With Spatio-Temporal Abstraction

Cited by: 2
Authors
Ma, Tinghuai [1 ]
Peng, Kexing [2 ]
Rong, Huan [3 ]
Qian, Yurong [4 ]
Al-Nabhan, Najla [5 ]
Affiliations
[1] Nanjing Univ Informat Sci & Technol, Sch Software, Nanjing, Jiangsu, Peoples R China
[2] Nanjing Univ Informat Sci & Technol, Sch Comp Sci, Nanjing 210044, Jiangsu, Peoples R China
[3] Nanjing Univ Informat Sci & Technol, Sch Artificial Intelligence, Nanjing, Jiangsu, Peoples R China
[4] Xinjiang Univ, Urumqi, Peoples R China
[5] King Saud Univ, Dept Comp Sci, Riyadh 11362, Saudi Arabia
Funding
National Natural Science Foundation of China;
关键词
Reinforcement learning; Training; Task analysis; Games; Multi-agent systems; Heuristic algorithms; Computational intelligence; Multi-agent reinforcement learning; spatio-temporal; hierarchical reinforcement learning; communications;
DOI
10.1109/TETCI.2023.3309738
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many real-world cooperative problems, such as urban traffic control or multi-role games, can be addressed with Multi-Agent Reinforcement Learning (MARL) techniques. However, policy learning in MARL involves long-trajectory training and partial observability, which lead to sparse rewards and a lack of decision information. To address these issues, this article studies hierarchical deep MARL and proposes a novel model named Hierarchical Spatio-Temporal Communication Network (HSTCN). HSTCN designs hierarchical policies with two time granularities: a high-level policy and a low-level policy. All agents share a joint policy containing these two levels, and each agent has its own execution policy. Specifically, the high-level policy provides intrinsic goals and continuous reward samples for the low-level policy to alleviate reward sparsity. The low-level policy absorbs this information to improve the efficiency of the agents' execution policies and interacts with the environment to optimize the next reward. Moreover, the high-level policy employs a graph-like structural model with spatio-temporal abstraction. This spatio-temporal model expands receptive fields to receive neighborhood information and facilitates learning more robust policies by capturing the underlying graph's spatial dependencies and temporal dynamics. Meanwhile, an evaluation network is added to increase robustness. Empirically, we demonstrate the effectiveness of HSTCN in a long-trajectory training environment through Simulation of Urban MObility (SUMO), while StarCraft II maps are tested as an abstract environment. The experimental results show that HSTCN outperforms other advanced algorithms and verify the rationality of its design.
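The two-level structure the abstract describes can be sketched in miniature: a high-level policy re-plans at a coarse time scale, emitting an intrinsic goal, while a low-level policy acts every step conditioned on that goal and receives a dense intrinsic reward for approaching it. This is a minimal illustration only; the toy grid task, the `horizon` re-planning interval, and all class and function names are assumptions for exposition, not the paper's actual HSTCN implementation.

```python
import numpy as np

class HighLevelPolicy:
    """Proposes an intrinsic goal (a target position) every `horizon` steps."""
    def __init__(self, horizon=5):
        self.horizon = horizon

    def propose_goal(self, state):
        # Toy heuristic stand-in for a learned policy: nudge toward the origin.
        return np.clip(state - 1, 0, None)

class LowLevelPolicy:
    """Greedy controller that moves one unit toward the current goal."""
    def act(self, state, goal):
        return np.sign(goal - state)  # -1, 0, or +1 per dimension

def intrinsic_reward(state, goal):
    # Dense reward: negative distance to the intrinsic goal. This is the
    # mechanism by which a sparse extrinsic reward is supplemented.
    return -float(np.abs(goal - state).sum())

def rollout(steps=15):
    state = np.array([4.0, 3.0])
    high, low = HighLevelPolicy(), LowLevelPolicy()
    goal, rewards = high.propose_goal(state), []
    for t in range(steps):
        if t % high.horizon == 0:          # re-plan at the coarse time scale
            goal = high.propose_goal(state)
        state = state + low.act(state, goal)
        rewards.append(intrinsic_reward(state, goal))
    return state, rewards

final_state, rewards = rollout()
```

In a full MARL system both levels would be learned networks and each agent would run its own low-level execution policy; here the fixed heuristics only show the temporal decomposition and the flow of goals and intrinsic rewards between levels.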
Pages: 533-547
Page count: 15