Discovering Latent Variables for the Tasks With Confounders in Multi-Agent Reinforcement Learning

Cited: 0
|
Authors
Jiang, Kun [1 ]
Liu, Wenzhang [2 ]
Wang, Yuanda [1 ]
Dong, Lu [3 ]
Sun, Changyin [1 ,4 ]
Affiliations
[1] Southeast Univ, Sch Automat, Nanjing 210096, Peoples R China
[2] Anhui Univ, Sch Artificial Intelligence, Hefei 230601, Peoples R China
[3] Southeast Univ, Sch Cyber Sci & Engn, Nanjing 211189, Peoples R China
[4] Minist Educ, Engn Res Ctr Autonomous Unmanned Syst Technol, Hefei 230601, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Training; Current measurement; Reinforcement learning; Approximation algorithms; Probabilistic logic; Prediction algorithms; Inference algorithms; Latent variable model; maximum entropy; multi-agent reinforcement learning (MARL); multi-agent system;
DOI
10.1109/JAS.2024.124281
Chinese Library Classification (CLC)
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
Efficient exploration in complex coordination tasks has been considered a challenging problem in multi-agent reinforcement learning (MARL). It is significantly more difficult in tasks with latent variables that agents cannot directly observe. However, most existing latent-variable discovery methods lack a clear representation of the latent variables and an effective evaluation of their influence on the agents. In this paper, we propose a new MARL algorithm based on the soft actor-critic method for complex continuous control tasks with confounders. It is called the multi-agent soft actor-critic with latent variable (MASAC-LV) algorithm, which uses variational inference to learn a compact latent-variable representation space from a large amount of offline experience. In addition, we derive a counterfactual policy whose input excludes the latent variables and quantify the difference between the actual policy and the counterfactual policy via a distance function. This quantified difference serves as an intrinsic motivation that gives each agent an additional reward proportional to how strongly the latent variable affects it. The proposed algorithm is evaluated on two collaboration tasks with confounders, and the experimental results demonstrate the effectiveness of MASAC-LV compared to other baseline algorithms.
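The abstract's intrinsic-motivation idea — rewarding an agent by how far its actual policy (conditioned on the inferred latent variable) departs from a counterfactual policy (conditioned on the observation alone) — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Gaussian policy heads, the KL divergence as the distance function, and the scaling coefficient `beta` are all assumptions introduced for the example.

```python
import numpy as np

def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    # KL(p || q) between diagonal Gaussians, summed over action dimensions
    return np.sum(
        np.log(sigma_q / sigma_p)
        + (sigma_p**2 + (mu_p - mu_q)**2) / (2.0 * sigma_q**2)
        - 0.5
    )

def intrinsic_reward(actual_policy, counterfactual_policy, obs, latent, beta=0.1):
    # Actual policy sees (obs, latent); the counterfactual policy sees obs only.
    # Their divergence quantifies how much the latent variable shifts behavior.
    mu_a, sigma_a = actual_policy(obs, latent)
    mu_c, sigma_c = counterfactual_policy(obs)
    return beta * gaussian_kl(mu_a, sigma_a, mu_c, sigma_c)

# Toy stand-ins for the two policy networks (hypothetical linear heads).
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))  # 2-D action, 2-D obs + 2-D latent input
actual = lambda o, z: (W @ np.concatenate([o, z]), np.ones(2))
counterfactual = lambda o: (W[:, :2] @ o, np.ones(2))

r_int = intrinsic_reward(actual, counterfactual,
                         obs=np.array([0.5, -0.2]),
                         latent=np.array([1.0, 0.3]))
```

The intrinsic reward `r_int` is nonnegative and grows with the latent variable's influence on the action distribution; it would be added to the environment reward during training.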
Pages: 1591-1604
Page count: 14
Related Papers
50 records in total
  • [1] Discovering Latent Variables for the Tasks With Confounders in Multi-Agent Reinforcement Learning
    Kun Jiang
    Wenzhang Liu
    Yuanda Wang
    Lu Dong
    Changyin Sun
    [J]. IEEE/CAA Journal of Automatica Sinica, 2024, 11 (07) : 1591 - 1604
  • [2] Automatically Discovering Hierarchies in Multi-Agent Reinforcement Learning
    Cheng, Xiaobei
    Shen, Jing
    Liu, Haibo
    Gu, Guochang
    Zhang, Guoyin
    [J]. ICICSE: 2008 INTERNATIONAL CONFERENCE ON INTERNET COMPUTING IN SCIENCE AND ENGINEERING, PROCEEDINGS, 2008, : 549 - 552
  • [3] Knowledge Reuse of Multi-Agent Reinforcement Learning in Cooperative Tasks
    Shi, Daming
    Tong, Junbo
    Liu, Yi
    Fan, Wenhui
    [J]. ENTROPY, 2022, 24 (04)
  • [4] Towards Interpretable Policies in Multi-agent Reinforcement Learning Tasks
    Crespi, Marco
    Custode, Leonardo Lucio
    Iacca, Giovanni
    [J]. BIOINSPIRED OPTIMIZATION METHODS AND THEIR APPLICATIONS, 2022, 13627 : 262 - 276
  • [5] WRFMR: A Multi-Agent Reinforcement Learning Method for Cooperative Tasks
    Liu, Hui
    Zhang, Zhen
    Wang, Dongqing
    [J]. IEEE ACCESS, 2020, 8 : 216320 - 216331
  • [6] Efficient Training Techniques for Multi-Agent Reinforcement Learning in Combat Tasks
    Zhang, Guanyu
    Li, Yuan
    Xu, Xinhai
    Dai, Huadong
    [J]. IEEE ACCESS, 2019, 7 : 109301 - 109310
  • [7] Constraint-based multi-agent reinforcement learning for collaborative tasks
    Shang, Xiumin
    Xu, Tengyu
    Karamouzas, Ioannis
    Kallmann, Marcelo
    [J]. COMPUTER ANIMATION AND VIRTUAL WORLDS, 2023, 34 (3-4)
  • [8] MRRC: Multi-agent Reinforcement Learning with Rectification Capability in Cooperative Tasks
    Yu, Sheng
    Zhu, Wei
    Liu, Shuhong
    Gong, Zhengwen
    Chen, Haoran
    [J]. NEURAL INFORMATION PROCESSING, ICONIP 2023, PT II, 2024, 14448 : 204 - 218
  • [9] Dynamic scheduling of tasks in cloud manufacturing with multi-agent reinforcement learning
    Wang, Xiaohan
    Zhang, Lin
    Liu, Yongkui
    Li, Feng
    Chen, Zhen
    Zhao, Chun
    Bai, Tian
    [J]. JOURNAL OF MANUFACTURING SYSTEMS, 2022, 65 : 130 - 145
  • [10] Multi-Agent Reinforcement Learning
    Stankovic, Milos
    [J]. 2016 13TH SYMPOSIUM ON NEURAL NETWORKS AND APPLICATIONS (NEUREL), 2016, : 43 - 43