Stable and Efficient Shapley Value-Based Reward Reallocation for Multi-Agent Reinforcement Learning of Autonomous Vehicles

Cited by: 15

Authors:

Han, Songyang [1]

Wang, He [2]

Su, Sanbao [1]

Shi, Yuanyuan [3]

Miao, Fei [1]

Affiliations:

[1] Univ Connecticut, Dept Comp Sci & Engn, Storrs, CT USA

[2] ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai, Peoples R China

[3] Univ Calif San Diego, Elect & Comp Engn Dept, San Diego, CA USA

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2022 | 2022

Keywords:

GAME;

DOI:

10.1109/ICRA46639.2022.9811626

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

With the development of sensing and communication technologies in networked cyber-physical systems (CPSs), multi-agent reinforcement learning (MARL)-based methodologies are integrated into the control process of physical systems and demonstrate strong performance in a wide array of CPS domains, such as connected autonomous vehicles (CAVs). However, it remains challenging to mathematically characterize the performance improvement of CAVs with communication and cooperation capability. When each individual autonomous vehicle is originally self-interested, we cannot assume that all agents would cooperate naturally during the training process. In this work, we propose to reallocate the system's total reward efficiently to motivate stable cooperation among autonomous vehicles. We formally define and quantify how to reallocate the system's total reward to each agent under the proposed transferable utility game, such that communication-based cooperation among multiple agents increases the system's total reward. We prove that the Shapley value-based reward reallocation of MARL lies in the core if the transferable utility game is a convex game. Hence, the cooperation is stable and efficient, and the agents should stay in the coalition, that is, the cooperating group. We then propose a cooperative policy learning algorithm with Shapley value reward reallocation. In experiments, we show that our proposed algorithm improves the mean episode system reward of CAV systems compared with several algorithms from the literature.
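
The abstract's central claim (that the Shapley value lies in the core when the transferable utility game is convex) can be illustrated on a toy example. The sketch below is not the paper's algorithm or its game definition: the three players, the characteristic function v(S) = |S|^2, and the helper names `shapley_values` and `in_core` are all hypothetical, chosen only because this v is a simple convex game.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, v):
    """Exact Shapley values for a characteristic function v(frozenset) -> float."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that player i joins.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of i to coalition s.
                total += weight * (v(s | {i}) - v(s))
        phi[i] = total
    return phi


def in_core(players, v, alloc, tol=1e-9):
    """Check efficiency and coalitional rationality of an allocation."""
    # Efficiency: the allocation distributes exactly v(grand coalition).
    if abs(sum(alloc.values()) - v(frozenset(players))) > tol:
        return False
    # Stability: no proper coalition can do better on its own.
    for k in range(1, len(players)):
        for subset in combinations(players, k):
            if sum(alloc[p] for p in subset) < v(frozenset(subset)) - tol:
                return False
    return True


# Hypothetical toy game (an assumption, not from the paper): v(S) = |S|^2 is convex.
players = ["a", "b", "c"]
v = lambda s: float(len(s) ** 2)

phi = shapley_values(players, v)
stable = in_core(players, v, phi)
```

In this toy game the players are symmetric, so each receives an equal share of v({a, b, c}) = 9, and the core check passes. An allocation that gives everything to one player would fail the check, because the remaining players could each secure v = 1 on their own.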


Pages: 8765-8771

Page count: 7

50 items in total

[1] Multi-agent reinforcement learning for autonomous vehicles: a survey
Dinneweth J.
Boubezoul A.
Mandiau R.
Espié S.
Autonomous Intelligent Systems, 2022, 2 (01):
[2] Multi-Agent Reinforcement Learning for Autonomous On Demand Vehicles
Boyali, Ali
Hashimoto, Naohisa
John, Vijay
Acarman, Tankut
2019 30TH IEEE INTELLIGENT VEHICLES SYMPOSIUM (IV19), 2019, : 1461 - 1468
[3] Autonomous learning of reward distribution for each agent in multi-agent reinforcement learning
Shibata, K
Ito, K
INTELLIGENT AUTONOMOUS SYSTEMS 6, 2000, : 495 - 502
[4] A study on multi-agent reinforcement learning for autonomous distribution vehicles
Serap Ergün
Iran Journal of Computer Science, 2023, 6 (4) : 297 - 305
[5] Rethinking Exploration and Experience Exploitation in Value-Based Multi-Agent Reinforcement Learning
Borzilov, Anatolii
Skrynnik, Alexey
Panov, Aleksandr
IEEE ACCESS, 2025, 13 : 13770 - 13781
[6] A Multi-Agent Reinforcement Learning Approach for Safe and Efficient Behavior Planning of Connected Autonomous Vehicles
Han, Songyang
Zhou, Shanglin
Wang, Jiangwei
Pepin, Lynn
Ding, Caiwen
Fu, Jie
Miao, Fei
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25 (05) : 3654 - 3670
[7] Multi-Agent Reinforcement Learning for Traffic Flow Management of Autonomous Vehicles
Mushtaq, Anum
Ul Haq, Irfan
Sarwar, Muhammad Azeem
Khan, Asifullah
Khalil, Wajeeha
Mughal, Muhammad Abid
SENSORS, 2023, 23 (05)
[8] Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning
Li, Jiahui
Kuang, Kun
Wang, Baoxiang
Liu, Furui
Chen, Long
Wu, Fei
Xiao, Jun
KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, : 934 - 942
[9] Multi-Agent Reinforcement Learning with Reward Delays
Zhang, Yuyang
Zhang, Runyu
Gu, Yuantao
Li, Na
LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 211, 2023, 211
[10] Correcting biased value estimation in mixing value-based multi-agent reinforcement learning by multiple choice learning
Liu, Bing
Xie, Yuxuan
Feng, Lei
Fu, Ping
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2022, 116
