Optimistic Value Instructors for Cooperative Multi-Agent Reinforcement Learning

Cited by: 0
Authors
Li, Chao [1 ]
Zhang, Yupeng [2 ]
Wang, Jianqi [3 ]
Hu, Yujing [4 ]
Dong, Shaokang [1 ]
Li, Wenbin [1 ]
Lv, Tangjie [4 ]
Fan, Changjie [4 ]
Gao, Yang [1 ]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Alibaba DAMO Acad, Hangzhou, Peoples R China
[3] Meituan, Beijing, Peoples R China
[4] NetEase Fuxi AI Lab, Hangzhou, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
In cooperative multi-agent reinforcement learning, decentralized agents hold the promise of overcoming the combinatorial explosion of the joint action space and enabling greater scalability. However, they are susceptible to a game-theoretic pathology called relative overgeneralization (RO) that shadows the optimal joint action. Although recent value-decomposition algorithms guide decentralized agents by learning a factored global action value function, their representational limitations and the inaccurate sampling of optimal joint actions during learning leave this problem unresolved. To address this limitation, this paper proposes a novel algorithm called Optimistic Value Instructors (OVI). The main idea behind OVI is to introduce multiple optimistic instructors into the value-decomposition paradigm, which are capable of suggesting potentially optimal joint actions and rectifying the factored global action value function to recover these optimal actions. Specifically, the instructors maintain optimistic value estimations of per-agent local actions and thus eliminate the negative effects caused by other agents' exploratory or suboptimal actions, enabling accurate identification and suggestion of optimal joint actions. Based on the instructors' suggestions, the paper further presents two instructive constraints that rectify the factored global action value function to recover these optimal joint actions, thus overcoming the RO problem. Experimental evaluation of OVI on various cooperative multi-agent tasks demonstrates its superior performance against multiple baselines, highlighting its effectiveness.
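The optimistic per-agent value estimation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes an optimistic update rule in the spirit of hysteretic Q-learning, where positive temporal-difference errors are applied at the full learning rate while negative ones are damped, so an agent's estimate of a local action is not dragged down by teammates' exploratory failures. The function name `optimistic_update` and all rates are illustrative choices.

```python
def optimistic_update(q, action, target, lr_pos=0.5, lr_neg=0.05):
    """Move q[action] toward `target`; damp negative TD errors (optimism).

    Hypothetical sketch: positive errors use lr_pos, negative errors the
    much smaller lr_neg, so mixed returns caused by other agents'
    exploration do not erase an action's optimistic value.
    """
    td_error = target - q[action]
    lr = lr_pos if td_error > 0 else lr_neg
    q[action] += lr * td_error
    return q

# Two local actions; action 1 is jointly optimal but often fails when the
# other agent explores, so its observed returns alternate in sign.
q = [0.0, 0.0]
for r in [10.0, -10.0, 10.0, -10.0]:
    q = optimistic_update(q, 1, r)
# The optimistic estimate for action 1 remains positive despite the
# failed joint attempts, so it can still be suggested as optimal.
```

Under a symmetric (pessimism-neutral) update, the alternating returns would leave the estimate near zero; the damped negative rate is what preserves the signal of the cooperative optimum.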
Pages: 17453-17460
Page count: 8
Related Papers
50 records in total
  • [1] Conditionally Optimistic Exploration for Cooperative Deep Multi-Agent Reinforcement Learning
    Zhao, Xutong
    Pan, Yangchen
    Xiao, Chenjun
    Chandar, Sarath
    Rajendran, Janarthanan
    [J]. UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 2529 - 2540
  • [2] Cautiously-Optimistic Knowledge Sharing for Cooperative Multi-Agent Reinforcement Learning
    Ba, Yanwen
    Liu, Xuan
    Chen, Xinning
    Wang, Hao
    Xu, Yang
    Li, Kenli
    Zhang, Shigeng
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 17299 - 17307
  • [3] Multi-Agent Uncertainty Sharing for Cooperative Multi-Agent Reinforcement Learning
    Chen, Hao
    Yang, Guangkai
    Zhang, Junge
    Yin, Qiyue
    Huang, Kaiqi
    [J]. 2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [4] On the Robustness of Cooperative Multi-Agent Reinforcement Learning
    Lin, Jieyu
    Dzeparoska, Kristina
    Zhang, Sai Qian
    Leon-Garcia, Alberto
    Papernot, Nicolas
    [J]. 2020 IEEE SYMPOSIUM ON SECURITY AND PRIVACY WORKSHOPS (SPW 2020), 2020, : 62 - 68
  • [5] Consensus Learning for Cooperative Multi-Agent Reinforcement Learning
    Xu, Zhiwei
    Zhang, Bin
    Li, Dapeng
    Zhang, Zeren
    Zhou, Guangchong
    Chen, Hao
    Fan, Guoliang
    [J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 10, 2023, : 11726 - 11734
  • [6] Optimistic sequential multi-agent reinforcement learning with motivational communication
    Huang, Anqi
    Wang, Yongli
    Zhou, Xiaoliang
    Zou, Haochen
    Dong, Xu
    Che, Xun
    [J]. NEURAL NETWORKS, 2024, 179
  • [7] SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multi-Agent Reinforcement Learning
    Wen, Chao
    Yao, Xinghu
    Wang, Yuhui
    Tan, Xiaoyang
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 7301 - 7308
  • [8] Learning Cooperative Intrinsic Motivation in Multi-Agent Reinforcement Learning
    Hong, Seung-Jin
    Lee, Sang-Kwang
    [J]. 12TH INTERNATIONAL CONFERENCE ON ICT CONVERGENCE (ICTC 2021): BEYOND THE PANDEMIC ERA WITH ICT CONVERGENCE INNOVATION, 2021, : 1697 - 1699
  • [9] Cooperative Learning of Multi-Agent Systems Via Reinforcement Learning
    Wang, Xin
    Zhao, Chen
    Huang, Tingwen
    Chakrabarti, Prasun
    Kurths, Juergen
    [J]. IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, 2023, 9 : 13 - 23
  • [10] Cooperative Multi-Agent Reinforcement Learning with Hypergraph Convolution
    Bai, Yunpeng
    Gong, Chen
    Zhang, Bin
    Fan, Guoliang
    Hou, Xinwen
    Lu, Yu
    [J]. 2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,