Intrinsic Action Tendency Consistency for Cooperative Multi-Agent Reinforcement Learning

Cited by: 0
Authors
Zhang, Junkai [1 ,2 ]
Zhang, Yifan [1 ,3 ,4 ]
Zhang, Xi Sheryl [1 ,3 ,4 ]
Zang, Yifan [1 ,2 ]
Cheng, Jian [1 ,3 ,4 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Nanjing, Peoples R China
[4] Nanjing Artificial Intelligence Res AI, Nanjing, Peoples R China
Funding
National Key R&D Program of China;
Keywords
DOI
None available
CLC Classification Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Efficient collaboration in the centralized training with decentralized execution (CTDE) paradigm remains a challenge in cooperative multi-agent systems. We identify divergent action tendencies among agents as a significant obstacle to CTDE's training efficiency, requiring a large number of training samples to achieve a unified consensus on agents' policies. This divergence stems from the lack of adequate team consensus-related guidance signals during credit assignments in CTDE. To address this, we propose Intrinsic Action Tendency Consistency, a novel approach for cooperative multi-agent reinforcement learning. It integrates intrinsic rewards, obtained through an action model, into a reward-additive CTDE (RA-CTDE) framework. We formulate an action model that enables surrounding agents to predict the central agent's action tendency. Leveraging these predictions, we compute a cooperative intrinsic reward that encourages agents to match their actions with their neighbors' predictions. We establish the equivalence between RA-CTDE and CTDE through theoretical analyses, demonstrating that CTDE's training process can be achieved using agents' individual targets. Building on this insight, we introduce a novel method to combine intrinsic rewards and CTDE. Extensive experiments on challenging tasks in SMAC and GRF benchmarks showcase the improved performance of our method.
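To make the mechanism described in the abstract concrete, the short Python sketch below illustrates one way neighbors' action-tendency predictions could be turned into an intrinsic consistency reward and combined additively with the team reward. This is an illustrative sketch under stated assumptions, not the paper's implementation: the function names, the placeholder uniform action model, the mean-probability reward form, and the coefficient beta are all invented for exposition.

# Minimal sketch (not the authors' code): neighbors predict the central
# agent's action tendency; the central agent receives an intrinsic reward
# for matching those predictions, added to the team reward.
import numpy as np

def action_model_predict(neighbor_obs: np.ndarray, n_actions: int) -> np.ndarray:
    # Stand-in for a learned action model: each neighbor maps its observation
    # to a distribution over the central agent's next action. A real model
    # would be a trained network; here we return a uniform placeholder.
    return np.full((neighbor_obs.shape[0], n_actions), 1.0 / n_actions)

def intrinsic_reward(neighbor_obs: np.ndarray, central_action: int, n_actions: int) -> float:
    # Reward the central agent for acting as its neighbors expect: the mean
    # predicted probability of the chosen action (higher = more consistent
    # with the neighbors' action-tendency predictions).
    preds = action_model_predict(neighbor_obs, n_actions)  # (n_neighbors, n_actions)
    return float(preds[:, central_action].mean())

def shaped_reward(team_reward: float, r_int: float, beta: float = 0.1) -> float:
    # Reward-additive combination for an individual training target:
    # environment (team) reward plus a scaled intrinsic consistency bonus.
    return team_reward + beta * r_int

# Toy usage: 3 neighbors, 8-dim observations, 5 discrete actions (all made up).
obs = np.random.rand(3, 8)
r = shaped_reward(team_reward=1.0, r_int=intrinsic_reward(obs, central_action=2, n_actions=5))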
Pages: 17600 - 17608
Page count: 9
Related Papers
50 records in total
  • [21] Cooperative multi-agent game based on reinforcement learning
    Liu, Hongbo
    HIGH-CONFIDENCE COMPUTING, 2024, 4 (01)
  • [22] Reinforcement learning of coordination in cooperative multi-agent systems
    Kapetanakis, S
    Kudenko, D
    EIGHTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-02)/FOURTEENTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE (IAAI-02), PROCEEDINGS, 2002: 326 - 331
  • [23] Training Cooperative Agents for Multi-Agent Reinforcement Learning
    Bhalla, Sushrut
    Subramanian, Sriram G.
    Crowley, Mark
    AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019: 1826 - 1828
  • [24] Cooperative Exploration for Multi-Agent Deep Reinforcement Learning
    Liu, Iou-Jen
    Jain, Unnat
    Yeh, Raymond A.
    Schwing, Alexander G.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [25] Analysing factorizations of action-value networks for cooperative multi-agent reinforcement learning
    Castellini, Jacopo
    Oliehoek, Frans A.
    Savani, Rahul
    Whiteson, Shimon
    AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2021, 35
  • [26] Baselines for joint-action reinforcement learning of coordination in cooperative multi-agent systems
    Carpenter, M
    Kudenko, D
    ADAPTIVE AGENTS AND MULTI-AGENT SYSTEMS II: ADAPTATION AND MULTI-AGENT LEARNING, 2005, 3394: 55 - 72
  • [27] Cooperative Action Acquisition Based on Intention Estimation in a Multi-Agent Reinforcement Learning System
    Tsubakimoto, Tatsuya
    Kobayashi, Kunikazu
    ELECTRONICS AND COMMUNICATIONS IN JAPAN, 2017, 100 (06): 3 - 10
  • [28] Analysing factorizations of action-value networks for cooperative multi-agent reinforcement learning
    Castellini, Jacopo
    Oliehoek, Frans A.
    Savani, Rahul
    Whiteson, Shimon
    AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2021, 35 (02)
  • [29] LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning
    Du, Yali
    Han, Lei
    Fang, Meng
    Dai, Tianhong
    Liu, Ji
    Tao, Dacheng
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [30] Explainable Action Advising for Multi-Agent Reinforcement Learning
    Guo, Yue
    Campbell, Joseph
    Stepputtis, Simon
    Li, Ruiyu
    Hughes, Dana
    Fang, Fei
    Sycara, Katia
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA, 2023: 5515 - 5521