Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning

Cited by: 0
Authors
Shao, Jianzhun [1]
Qu, Yun [1]
Chen, Chen [1]
Zhang, Hongchang [1]
Ji, Xiangyang [1]
Affiliations
[1] Tsinghua Univ, Dept Automat, Beijing, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Offline multi-agent reinforcement learning is challenging because it couples the distribution shift common in the offline setting with the high dimensionality common in the multi-agent setting, which makes out-of-distribution (OOD) actions and value overestimation especially severe. To mitigate this problem, we propose a novel multi-agent offline RL algorithm, CounterFactual Conservative Q-Learning (CFCQL), for conservative value estimation. Rather than treating all agents as a single high-dimensional agent and directly applying single-agent methods to it, CFCQL computes the conservative regularization for each agent separately in a counterfactual way and then linearly combines the results to obtain an overall conservative value estimate. We prove that CFCQL retains the underestimation property and the performance guarantee of single-agent conservative methods, while the induced regularization and the safe policy improvement bound are independent of the number of agents; it is therefore theoretically superior to the direct treatment above, especially when the number of agents is large. We further conduct experiments on four environments, covering both discrete and continuous action settings, on both existing datasets and datasets we constructed, demonstrating that CFCQL outperforms existing methods on most datasets, and by a remarkable margin on some of them.
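The per-agent, counterfactual regularization described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the exact regularizer, so the code below assumes a CQL-style log-sum-exp penalty, discrete actions, and uniform combination weights. The function names `logsumexp` and `counterfactual_penalty` are hypothetical. For each agent, only that agent's action is varied while the other agents keep their dataset actions, and the per-agent penalties are then combined linearly.

```python
import numpy as np


def logsumexp(x, axis):
    """Numerically stable log-sum-exp along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def counterfactual_penalty(q, data_actions, weights=None):
    """Illustrative per-agent conservative penalty (assumed CQL-style form).

    q            : array of shape (batch, A_1, ..., A_n), joint Q-values per sample
    data_actions : int array of shape (batch, n), joint actions from the dataset
    weights      : per-agent combination weights; uniform if None (an assumption)
    """
    batch, n = data_actions.shape
    if weights is None:
        weights = np.full(n, 1.0 / n)
    rows = np.arange(batch)

    # Q-value of the joint action actually stored in the dataset.
    q_data = q[(rows,) + tuple(data_actions[:, j] for j in range(n))]

    total = 0.0
    for i in range(n):
        # Counterfactual slice: agent i ranges over all of its actions,
        # every other agent stays at its dataset action. Shape: (batch, A_i).
        idx = (rows,) + tuple(
            slice(None) if j == i else data_actions[:, j] for j in range(n)
        )
        q_i = q[idx]
        # Push down Q over agent i's alternatives, push up the dataset action.
        penalty_i = np.mean(logsumexp(q_i, axis=1) - q_data)
        total += weights[i] * penalty_i
    return total
```

Under these assumptions, each per-agent term sums over only that agent's actions rather than over the full joint action space, which is one way to read the abstract's claim that the regularization does not grow with the number of agents.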
Pages: 23