Stable and Efficient Shapley Value-Based Reward Reallocation for Multi-Agent Reinforcement Learning of Autonomous Vehicles

Cited by: 15
Authors
Han, Songyang [1]
Wang, He [2]
Su, Sanbao [1]
Shi, Yuanyuan [3]
Miao, Fei [1]
Affiliations
[1] Univ Connecticut, Dept Comp Sci & Engn, Storrs, CT USA
[2] ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai, Peoples R China
[3] Univ Calif San Diego, Elect & Comp Engn Dept, San Diego, CA USA
Keywords
GAME
DOI
10.1109/ICRA46639.2022.9811626
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the development of sensing and communication technologies in networked cyber-physical systems (CPSs), multi-agent reinforcement learning (MARL)-based methodologies are integrated into the control process of physical systems and demonstrate strong performance in a wide array of CPS domains, such as connected autonomous vehicles (CAVs). However, it remains challenging to mathematically characterize the performance improvement that communication and cooperation capabilities bring to CAVs. When each individual autonomous vehicle is originally self-interested, we cannot assume that all agents will cooperate naturally during the training process. In this work, we propose to reallocate the system's total reward efficiently to motivate stable cooperation among autonomous vehicles. We formally define and quantify how to reallocate the system's total reward to each agent under the proposed transferable utility game, such that communication-based cooperation among multiple agents increases the system's total reward. We prove that the Shapley value-based reward reallocation of MARL lies in the core if the transferable utility game is a convex game. Hence, the cooperation is stable and efficient, and the agents should stay in the coalition, that is, the cooperating group. We then propose a cooperative policy learning algorithm with Shapley value reward reallocation. In experiments, we show that our proposed algorithm improves the mean episode system reward of CAV systems compared with several algorithms from the literature.
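The abstract's central claim, that a Shapley value allocation lies in the core when the transferable utility game is convex, can be illustrated on a toy game. The following is a minimal sketch and not the paper's algorithm or reward model: the characteristic function `toy_value` (coalition value equal to the squared coalition size) and all function names are hypothetical, chosen only because that function is convex and small enough to enumerate.

```python
from itertools import combinations, permutations
from math import factorial


def shapley_values(players, v):
    """Average each player's marginal contribution over all join orders."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            phi[p] += v(coalition | {p}) - v(coalition)
            coalition = coalition | {p}
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in phi.items()}


def all_coalitions(players):
    """Yield every subset of the players, including the empty set."""
    for k in range(len(players) + 1):
        for c in combinations(players, k):
            yield frozenset(c)


def is_convex(players, v, tol=1e-9):
    """Check v(S | T) + v(S & T) >= v(S) + v(T) for all coalitions S, T."""
    cs = list(all_coalitions(players))
    return all(v(s | t) + v(s & t) >= v(s) + v(t) - tol for s in cs for t in cs)


def in_core(players, v, alloc, tol=1e-9):
    """Efficient (sums to the grand coalition value) and no coalition
    can obtain more on its own than it is allocated."""
    efficient = abs(sum(alloc.values()) - v(frozenset(players))) < tol
    stable = all(
        sum(alloc[p] for p in c) >= v(c) - tol for c in all_coalitions(players)
    )
    return efficient and stable


def toy_value(coalition):
    # Hypothetical characteristic function: reward grows with the square
    # of the coalition size, which makes the game convex.
    return float(len(coalition) ** 2)


if __name__ == "__main__":
    vehicles = ["cav_a", "cav_b", "cav_c"]  # hypothetical agent names
    alloc = shapley_values(vehicles, toy_value)
    print(alloc)
    print("convex:", is_convex(vehicles, toy_value))
    print("in core:", in_core(vehicles, toy_value, alloc))
```

Enumerating all join orders costs factorial time, so this brute-force form is only practical for a handful of agents; it is meant to make the definitions concrete, not to suggest how the reallocation is computed during training.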
Pages: 8765-8771
Number of pages: 7