Distributional Reward Estimation for Effective Multi-Agent Deep Reinforcement Learning

Cited by: 0
Authors
Hu, Jifeng [1 ]
Sun, Yanchao [2 ]
Chen, Hechang [1 ]
Huang, Sili [1 ]
Piao, Haiyin [3 ]
Chang, Yi [1 ]
Sun, Lichao [4 ]
Affiliations
[1] Jilin Univ, Sch Artificial Intelligence, Changchun, Peoples R China
[2] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA
[3] Northwestern Polytech Univ, Xian, Peoples R China
[4] Lehigh Univ, Bethlehem, PA USA
Funding
National Natural Science Foundation of China; National Key R&D Program of China
Keywords
DOI
None available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Multi-agent reinforcement learning has drawn increasing attention in practice, e.g., robotics and autonomous driving, as it can explore optimal policies using samples generated by interacting with the environment. However, high reward uncertainty remains a problem when training a satisfactory model, because obtaining high-quality reward feedback is usually expensive or even infeasible. To handle this issue, previous methods mainly focus on passive reward correction, while recent active reward estimation methods have proven to be a recipe for reducing the effect of reward uncertainty. In this paper, we propose a novel Distributional Reward Estimation framework for effective Multi-Agent Reinforcement Learning (DRE-MARL). Our main idea is to design multi-action-branch reward estimation and policy-weighted reward aggregation for stabilized training. Specifically, the multi-action-branch reward estimation models reward distributions on all action branches, and reward aggregation is then used to obtain stable updating signals during training. Our intuition is that considering all possible consequences of actions can be useful for learning policies. The superiority of DRE-MARL is demonstrated on benchmark multi-agent scenarios against SOTA baselines in terms of both effectiveness and robustness.
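The abstract only names the two components, so the following minimal PyTorch sketch illustrates what multi-action-branch reward estimation with policy-weighted aggregation could look like. It is a hypothetical illustration, not the authors' implementation: the class and function names (MultiActionBranchRewardEstimator, policy_weighted_aggregation), the Gaussian per-branch parameterization, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class MultiActionBranchRewardEstimator(nn.Module):
    """Hypothetical sketch: predict a Gaussian reward distribution
    (mean, log-std) for every discrete action branch, given an observation."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.mean_head = nn.Linear(hidden, n_actions)     # per-branch reward mean
        self.log_std_head = nn.Linear(hidden, n_actions)  # per-branch reward spread

    def forward(self, obs: torch.Tensor):
        h = self.trunk(obs)
        return self.mean_head(h), self.log_std_head(h)

def policy_weighted_aggregation(reward_means: torch.Tensor,
                                policy_probs: torch.Tensor) -> torch.Tensor:
    """Aggregate per-branch estimates under the current policy:
    r_agg(s) = sum_a pi(a|s) * r_hat(s, a), a smoother training signal
    than the single sampled environment reward."""
    return (policy_probs * reward_means).sum(dim=-1)

# Toy usage with made-up dimensions.
obs = torch.randn(32, 10)                          # batch of 32 observations
estimator = MultiActionBranchRewardEstimator(obs_dim=10, n_actions=4)
means, log_stds = estimator(obs)
probs = torch.softmax(torch.randn(32, 4), dim=-1)  # stand-in for pi(.|s)
r_agg = policy_weighted_aggregation(means, probs)  # shape: (32,)
```

One plausible way this realizes the "stable updating signals" described above is to fit the estimator to observed rewards (e.g., via Gaussian negative log-likelihood on the taken-action branch) and feed r_agg, rather than the raw noisy reward, into each agent's critic target; this is inference from the abstract, not a confirmed detail of the paper.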
Pages: 14
Related Papers
50 records in total
  • [31] Multi-Agent Deep Reinforcement Learning with Emergent Communication
    Simoes, David
    Lau, Nuno
    Reis, Luis Paulo
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [32] Experience Selection in Multi-Agent Deep Reinforcement Learning
    Wang, Yishen
    Zhang, Zongzhang
    2019 IEEE 31ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2019), 2019, : 864 - 870
  • [33] Sparse communication in multi-agent deep reinforcement learning
    Han, Shuai
    Dastani, Mehdi
    Wang, Shihan
    NEUROCOMPUTING, 2025, 625
  • [34] Multi-Agent Deep Reinforcement Learning with Human Strategies
    Thanh Nguyen
    Ngoc Duy Nguyen
    Nahavandi, Saeid
    2019 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY (ICIT), 2019, : 1357 - 1362
  • [35] Competitive Evolution Multi-Agent Deep Reinforcement Learning
    Zhou, Wenhong
    Chen, Yiting
    Li, Jie
    PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND APPLICATION ENGINEERING (CSAE2019), 2019,
  • [36] Strategic Interaction Multi-Agent Deep Reinforcement Learning
    Zhou, Wenhong
    Li, Jie
    Chen, Yiting
    Shen, Lin-Cheng
    IEEE ACCESS, 2020, 8 : 119000 - 119009
  • [37] Cooperative Exploration for Multi-Agent Deep Reinforcement Learning
    Liu, Iou-Jen
    Jain, Unnat
    Yeh, Raymond A.
    Schwing, Alexander G.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [38] A review of cooperative multi-agent deep reinforcement learning
    Oroojlooy, Afshin
    Hajinezhad, Davood
    APPLIED INTELLIGENCE, 2023, 53 (11) : 13677 - 13722
  • [39] Multi-Agent Deep Reinforcement Learning for Walker Systems
    Park, Inhee
    Moh, Teng-Sheng
    20TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2021), 2021, : 490 - 495
  • [40] Action Markets in Deep Multi-Agent Reinforcement Learning
    Schmid, Kyrill
    Belzner, Lenz
    Gabor, Thomas
    Phan, Thomy
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2018, PT II, 2018, 11140 : 240 - 249