Expert demonstrations guide reward decomposition for multi-agent cooperation

Cited by: 1
Authors
Liu, Weiwei [1 ,2 ,4 ]
Jing, Wei [2 ]
Liu, Shanqi [1 ]
Ruan, Yudi [1 ]
Zhang, Kexin [1 ]
Yang, Jiang [3 ]
Liu, Yong [4 ]
Affiliations
[1] Zhejiang Univ, Huzhou Inst, Huzhou 313002, Peoples R China
[2] Alibaba DAMO Acad, Dept Autonomous Driving Lab, Hangzhou, Peoples R China
[3] China Res & Dev Acad Machinery Equipment, Beijing, Peoples R China
[4] Zhejiang Univ, Coll Control Sci & Engn, Adv Percept Robot & Intelligent Learning Lab, Hangzhou, Peoples R China
Source
NEURAL COMPUTING & APPLICATIONS | 2023, Vol. 35, Iss. 27
Keywords
Multi-agent reinforcement learning; Expert demonstrations; Multi-agent systems; Reward decomposition; Inverse reinforcement learning;
DOI
10.1007/s00521-023-08785-6
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Humans achieve effective teamwork because each individual properly understands the contribution of every team member's actions; reasonable credit assignment is therefore crucial for multi-agent cooperation. Existing work mitigates the credit assignment problem with value decomposition algorithms, but because these methods decompose the global value function into the agents' local value functions, evaluating behavior through the overall value function easily introduces approximation errors. Such strategies are also vulnerable in sparse-reward scenarios. In this paper, we propose using expert demonstrations to guide the decomposition of the team reward at each time step, rather than decomposing the value function. The proposed method computes each agent's reward ratio from the similarity between that agent's state-action pair and the expert demonstrations. Under this setting, each agent can independently train its value function and evaluate its own behavior, which makes the algorithm highly robust to the form of the team reward. Moreover, the method constrains the policy to collect data whose distribution is similar to that of the expert data during exploration, which makes policy updates more robust. Extensive experiments in various MARL environments show that our algorithm outperforms state-of-the-art algorithms in most scenarios, that our method is robust to a variety of reward functions, and that the trajectories produced by our policy are closer to those of the expert policy.
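The decomposition step described above can be made concrete with a small sketch. The following Python fragment is a minimal illustration, not the authors' implementation: it assumes expert demonstrations are stored as flat state-action feature vectors and uses a Gaussian kernel over the distance to the nearest demonstration as the similarity measure. The function names, the kernel choice, and the bandwidth are all illustrative assumptions.

import numpy as np

def similarity(sa, expert_pairs, bandwidth=1.0):
    # Gaussian-kernel similarity between one (state, action) feature vector
    # and its nearest expert demonstration pair (illustrative choice).
    dists = np.linalg.norm(expert_pairs - sa, axis=1)
    return float(np.exp(-dists.min() ** 2 / (2.0 * bandwidth ** 2)))

def decompose_reward(team_reward, agent_sa_pairs, expert_pairs):
    # Split the team reward among agents in proportion to how closely each
    # agent's current (state, action) pair matches the expert data.
    sims = np.array([similarity(sa, expert_pairs) for sa in agent_sa_pairs])
    ratios = sims / (sims.sum() + 1e-8)  # per-agent reward ratios, summing to ~1
    return team_reward * ratios

# Two agents, one sparse team reward of 1.0 at this time step.
expert_pairs = np.array([[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]])
agent_sa_pairs = [np.array([0.1, 0.0, 0.9]), np.array([5.0, 5.0, 5.0])]
print(decompose_reward(1.0, agent_sa_pairs, expert_pairs))
# The first agent, whose behavior resembles the demonstrations, receives
# almost the entire reward; the second receives nearly none.

Each agent would then use its share of the decomposed reward to train its own value function independently, which is the property the abstract credits for robustness to the form of the team reward.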
Pages: 19847-19863
Number of Pages: 17
Related Papers
50 records in total
  • [1] Leveraging Expert Demonstrations in Robot Cooperation with Multi-Agent Reinforcement Learning
    Zhang, Zhaolong
    Li, Yihui
    Rojas, Juan
    Guan, Yisheng
    [J]. INTELLIGENT ROBOTICS AND APPLICATIONS, ICIRA 2021, PT II, 2021, 13014 : 211 - 222
  • [2] Cooperation learning in Multi-Agent Systems with annotation and reward
    Yoshida, Tetsuya
    [J]. INTERNATIONAL JOURNAL OF KNOWLEDGE-BASED AND INTELLIGENT ENGINEERING SYSTEMS, 2007, 11 (01) : 19 - 34
  • [3] Hybrid Learning for Multi-agent Cooperation with Sub-optimal Demonstrations
    Peng, Peixi
    Xing, Junliang
    Cao, Lili
    [J]. PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020: 3037 - 3043
  • [4] Cooperation in multi-agent bidding
    Wu, DJ
    Sun, YJ
    [J]. DECISION SUPPORT SYSTEMS, 2002, 33 (03) : 335 - 347
  • [5] Learning multi-agent cooperation
    Rivera, Corban
    Staley, Edward
    Llorens, Ashley
    [J]. FRONTIERS IN NEUROROBOTICS, 2022, 16
  • [6] Study of multi-agent cooperation
    Hua, Z
    [J]. PROCEEDINGS OF THE 2004 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2004: 3014 - 3017
  • [7] On cooperation in multi-agent systems
    Doran, JE
    Franklin, S
    Jennings, NR
    Norman, TJ
    [J]. KNOWLEDGE ENGINEERING REVIEW, 1997, 12 (03): 309 - 314
  • [8] Multi-Agent Cognition Difference Reinforcement Learning for Multi-Agent Cooperation
    Wang, Huimu
    Qiu, Tenghai
    Liu, Zhen
    Pu, Zhiqiang
    Yi, Jianqiang
    Yuan, Wanmai
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [9] Multi-agent Cooperation Algorithm Based on Individual Gap Emotion in Sparse Reward Scenarios
    Wang, Hao
    Wang, Jing
    Fang, Baofu
    [J]. Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2022, 35 (05): 451 - 460