Multi-agent actor-critic with time dynamical opponent model

Cited by: 1
Authors
Tian, Yuan [1 ]
Kladny, Klaus -Rudolf [2 ]
Wang, Qin [3 ]
Huang, Zhiwu [5 ]
Fink, Olga [4 ]
Affiliations
[1] Swiss Fed Inst Technol, Intelligent Maintenance Syst, Zurich, Switzerland
[2] Swiss Fed Inst Technol, Data Sci, Zurich, Switzerland
[3] Swiss Fed Inst Technol, Zurich, Switzerland
[4] Ecole Polytech Fed Lausanne, Intelligent Maintenance & Operat Syst, Lausanne, Switzerland
[5] Singapore Management Univ, Comp Sci, Singapore, Singapore
Funding
Swiss National Science Foundation;
Keywords
Reinforcement learning; Multi-agent reinforcement learning; Multi-agent systems; Opponent modeling; Non-stationarity; LEVEL;
DOI
10.1016/j.neucom.2022.10.045
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
In multi-agent reinforcement learning, multiple agents learn simultaneously while interacting with a common environment and with each other. Since the agents adapt their policies during learning, not only does the behavior of each individual agent become non-stationary, but so does the environment as that agent perceives it. This makes policy improvement particularly challenging. In this paper, we exploit the fact that the agents seek to improve their expected cumulative reward and introduce a novel Time Dynamical Opponent Model (TDOM) to encode the knowledge that opponent policies tend to improve over time. We motivate TDOM theoretically by deriving a lower bound on the log objective of an individual agent and further propose Multi-Agent Actor-Critic with Time Dynamical Opponent Model (TDOM-AC). We evaluate the proposed TDOM-AC on a differential game and on the Multi-agent Particle Environment. We show empirically that TDOM achieves superior opponent behavior prediction at test time. The proposed TDOM-AC methodology outperforms state-of-the-art actor-critic methods on the performed tasks in cooperative and, especially, in mixed cooperative-competitive environments. TDOM-AC results in more stable training and faster convergence. Our code is available at https://github.com/Yuantian013/TDOM-AC.
(c) 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Pages: 165-172
Page count: 8
Related papers
50 records
  • [1] Bi-Level Actor-Critic for Multi-Agent Coordination
    Zhang, Haifeng
    Chen, Weizhe
    Huang, Zeren
    Li, Minne
    Yang, Yaodong
    Zhang, Weinan
    Wang, Jun
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 7325 - 7332
  • [2] Multi-agent reinforcement learning by the actor-critic model with an attention interface
    Zhang, Lixiang
    Li, Jingchen
    Zhu, Yi'an
    Shi, Haobin
    Hwang, Kao-Shing
    [J]. NEUROCOMPUTING, 2022, 471 : 275 - 284
  • [3] Divergence-Regularized Multi-Agent Actor-Critic
    Su, Kefan
    Lu, Zongqing
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [4] Actor-Critic Algorithms for Constrained Multi-agent Reinforcement Learning
    Diddigi, Raghuram Bharadwaj
    Reddy, D. Sai Koti
    Prabuchandran, K. J.
    Bhatnagar, Shalabh
    [J]. AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 1931 - 1933
  • [5] Multi-Agent Natural Actor-Critic Reinforcement Learning Algorithms
    Prashant Trivedi
    Nandyala Hemachandra
    [J]. Dynamic Games and Applications, 2023, 13 : 25 - 55
  • [6] A New Advantage Actor-Critic Algorithm For Multi-Agent Environments
    Paczolay, Gabor
    Harmati, Istvan
    [J]. 2020 23RD IEEE INTERNATIONAL SYMPOSIUM ON MEASUREMENT AND CONTROL IN ROBOTICS (ISMCR), 2020,
  • [7] Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning
    Christianos, Filippos
    Schafer, Lukas
    Albrecht, Stefano V.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [8] Improving sample efficiency in Multi-Agent Actor-Critic methods
    Ye, Zhenhui
    Chen, Yining
    Jiang, Xiaohong
    Song, Guanghua
    Yang, Bowei
    Fan, Sheng
    [J]. APPLIED INTELLIGENCE, 2022, 52 (04) : 3691 - 3704
  • [9] Distributed Multi-Agent Reinforcement Learning by Actor-Critic Method
    Heredia, Paulo C.
    Mou, Shaoshuai
    [J]. IFAC PAPERSONLINE, 2019, 52 (20): : 363 - 368