Stable and Efficient Shapley Value-Based Reward Reallocation for Multi-Agent Reinforcement Learning of Autonomous Vehicles

Citations: 15

Authors:

Han, Songyang [1]

Wang, He [2]

Su, Sanbao [1]

Shi, Yuanyuan [3]

Miao, Fei [1]

Affiliations:

[1] Univ Connecticut, Dept Comp Sci & Engn, Storrs, CT USA

[2] ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai, Peoples R China

[3] Univ Calif San Diego, Elect & Comp Engn Dept, San Diego, CA USA

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2022 | 2022

Keywords:

GAME;

DOI:

10.1109/ICRA46639.2022.9811626

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

With the development of sensing and communication technologies in networked cyber-physical systems (CPSs), multi-agent reinforcement learning (MARL)-based methodologies are integrated into the control process of physical systems and demonstrate strong performance in a wide array of CPS domains, such as connected autonomous vehicles (CAVs). However, it remains challenging to mathematically characterize how much communication and cooperation capability improves the performance of CAVs. When each individual autonomous vehicle is originally self-interested, we cannot assume that all agents will cooperate naturally during the training process. In this work, we propose to reallocate the system's total reward efficiently to motivate stable cooperation among autonomous vehicles. We formally define and quantify how to reallocate the system's total reward to each agent under the proposed transferable utility game, such that communication-based cooperation among multiple agents increases the system's total reward. We prove that the Shapley value-based reward reallocation of MARL lies in the core if the transferable utility game is a convex game. Hence, the cooperation is stable and efficient, and the agents should stay in the coalition, that is, the cooperating group. We then propose a cooperative policy learning algorithm with Shapley value reward reallocation. In experiments, we show that our proposed algorithm improves the mean episode system reward of CAV systems compared with several algorithms from the literature.
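The abstract's central claim (in a convex transferable utility game, the Shapley value allocation lies in the core) can be checked numerically on a small example. The following is a minimal sketch, not the paper's algorithm: the function names and the toy characteristic function `toy_value` (v(S) = |S|^2) are assumptions made for illustration, and in the paper the coalition value would come from the learned MARL system reward rather than a closed-form expression.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, v):
    """Shapley value of each of n agents; v maps a frozenset of agents to a float."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of a coalition of size k that agent i joins.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (v(s | {i}) - v(s))  # marginal contribution
    return phi


def all_coalitions(n):
    """Every subset of the n agents, including the empty set."""
    return [frozenset(c) for k in range(n + 1) for c in combinations(range(n), k)]


def is_convex(n, v, tol=1e-9):
    """Convex (supermodular) game: v(S | T) + v(S & T) >= v(S) + v(T)."""
    cs = all_coalitions(n)
    return all(v(s | t) + v(s & t) >= v(s) + v(t) - tol for s in cs for t in cs)


def in_core(n, v, x, tol=1e-9):
    """Allocation x is efficient and no coalition receives less than its own value."""
    efficient = abs(sum(x) - v(frozenset(range(n)))) <= tol
    stable = all(sum(x[i] for i in s) >= v(s) - tol for s in all_coalitions(n))
    return efficient and stable


def toy_value(coalition):
    """Hypothetical characteristic function: total reward grows as |S|^2."""
    return float(len(coalition) ** 2)


phi = shapley_values(3, toy_value)
print(phi)                          # by symmetry, each agent receives about 3.0
print(is_convex(3, toy_value))      # the toy game is convex
print(in_core(3, toy_value, phi))   # so the Shapley allocation is in the core
```

The exact enumeration is exponential in the number of agents, so it only suits small coalitions; larger systems would need sampling or another approximation.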


Pages: 8765-8771

Page count: 7

50 items in total

[41] Multi-agent reinforcement learning for cooperative lane changing of connected and autonomous vehicles in mixed traffic
Zhou W.
Chen D.
Yan J.
Li Z.
Yin H.
Ge W.
Autonomous Intelligent Systems, 2022, 2 (01)
[42] Reward shaping for knowledge-based multi-objective multi-agent reinforcement learning
Mannion, Patrick
Devlin, Sam
Duggan, Jim
Howley, Enda
KNOWLEDGE ENGINEERING REVIEW, 2018, 33
[43] Deconfounded Value Decomposition for Multi-Agent Reinforcement Learning
Li, Jiahui
Kuang, Kun
Wang, Baoxiang
Liu, Furui
Chen, Long
Fan, Changjie
Wu, Fei
Xiao, Jun
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
[44] Quantum Multi-Agent Reinforcement Learning for Autonomous Mobility Cooperation
Park, Soohyun
Kim, Jae Pyoung
Park, Chanyoung
Jung, Soyi
Kim, Joongheon
IEEE COMMUNICATIONS MAGAZINE, 2024, 62 (06): 106 - 112
[45] Impact of Relational Networks in Multi-Agent Learning: A Value-Based Factorization View
Findik, Yasin
Robinette, Paul
Jerath, Kshitij
Ahmadzadeh, S. Reza
2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023: 4447 - 4454
[46] Autonomous Separation Assurance with Deep Multi-Agent Reinforcement Learning
Brittain, Marc W.
Yang, Xuxi
Wei, Peng
JOURNAL OF AEROSPACE INFORMATION SYSTEMS, 2021, 18 (12): 890 - 905
[47] Cooperative price-based demand response program for multiple aggregators based on multi-agent reinforcement learning and Shapley-value
Fraija, Alejandro
Henao, Nilson
Agbossou, Kodjo
Kelouwani, Sousso
Fournier, Michael
SUSTAINABLE ENERGY GRIDS & NETWORKS, 2024, 40
[48] Decentralized graph-based multi-agent reinforcement learning using reward machines
Hu, Jueming
Xu, Zhe
Wang, Weichang
Qu, Guannan
Pang, Yutian
Liu, Yongming
NEUROCOMPUTING, 2024, 564
[49] Multi-agent reinforcement learning based on self-satisfaction in sparse reward scenarios
Fang, Baofu
Tang, Dandan
Wang, Zaijun
Wang, Hao
INTERNATIONAL JOURNAL OF BIO-INSPIRED COMPUTATION, 2025, 25 (01)
[50] Multi-agent Reinforcement Learning Based on Adaptive State Approximation in Sparse Reward Scenarios
Fang, Baofu
Yu, Tingting
Wang, Hao
Wang, Zaijun
Jiqiren/Robot, 2024, 46 (06): 663 - 671
