Optimistic sequential multi-agent reinforcement learning with motivational communication

Cited: 0
|
Authors
Huang, Anqi [1 ]
Wang, Yongli [1 ]
Zhou, Xiaoliang [1 ]
Zou, Haochen [1 ]
Dong, Xu [1 ]
Che, Xun [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-agent reinforcement learning; Policy gradient; Motivational communication; Reinforcement learning; Multi-agent system;
DOI
10.1016/j.neunet.2024.106547
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Centralized Training with Decentralized Execution (CTDE) is a prevalent paradigm in fully cooperative Multi-Agent Reinforcement Learning (MARL). Existing algorithms often encounter two major problems: independent strategies tend to underestimate the potential value of actions, leading to convergence to sub-optimal Nash Equilibria (NE); and some communication paradigms add complexity to the learning process, making it harder to focus on the essential elements of the messages. To address these challenges, we propose a novel method called Optimistic Sequential Soft Actor Critic with Motivational Communication (OSSMC). The key idea of OSSMC is to use a greedy-driven approach to explore the potential value of individual policies, yielding optimistic Q-values that serve as an upper bound on the Q-value of the current policy. We then integrate a sequential update mechanism with the optimistic Q-values, aiming to ensure monotonic improvement in the joint policy optimization process. Moreover, we equip each agent with a motivational communication module that disseminates motivational messages to promote cooperative behaviors. Finally, we employ a value regularization strategy from the Soft Actor Critic (SAC) method to maximize entropy and improve exploration. The performance of OSSMC was rigorously evaluated on a series of challenging benchmarks. Empirical results demonstrate that OSSMC not only surpasses current baseline algorithms but also converges faster.
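The abstract's central idea, an optimistic Q-value that upper-bounds the Q-value of the current policy, can be illustrated with a minimal sketch. The asymmetric (hysteretic-style) update rule below and all names are illustrative assumptions, not the authors' actual algorithm, which the record does not detail:

```python
def optimistic_q_update(q_opt: float, q_policy: float, alpha: float = 0.5) -> float:
    """Asymmetric update: move q_opt toward q_policy only on increases,
    so q_opt remains an optimistic upper bound on the policy's Q-value."""
    if q_policy > q_opt:
        return q_opt + alpha * (q_policy - q_opt)  # track improvements
    return q_opt  # ignore decreases, staying optimistic

# Toy trace: the optimistic estimate ratchets upward and never decreases,
# even when later policy estimates are lower.
q_opt = 0.0
for q_policy in [1.0, 0.4, 1.2, 0.8]:
    q_opt = optimistic_q_update(q_opt, q_policy)
```

Under this sketch, a greedy target that exceeds the stored estimate pulls it up, while lower targets are ignored, which is one simple way an estimate can act as an upper bound on the current policy's value.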
Pages: 12