Multi-Agent Reinforcement Learning via Adaptive Kalman Temporal Difference and Successor Representation

Cited by: 2

Authors:

Salimibeni, Mohammad ^{[1]}

Mohammadi, Arash ^{[1]}

Malekzadeh, Parvin ^{[2]}

Plataniotis, Konstantinos N. ^{[2]}

Affiliations:

[1] Concordia Univ, Concordia Inst Informat Syst Engn, Montreal, PQ H3G 1M8, Canada

[2] Univ Toronto, Dept Elect & Comp Engn, Toronto, ON M5S 3G8, Canada

Source:

SENSORS | 2022 / Vol. 22 / Issue 04

Keywords:

Kalman Temporal Difference; Multiple Model Adaptive Estimation; Multi-Agent Reinforcement Learning; Successor Representation; NEURAL-NETWORK; APPROXIMATION;

DOI:

10.3390/s22041393

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry];

Discipline classification code:

070302 ; 081704 ;

Abstract:

Development of distributed Multi-Agent Reinforcement Learning (MARL) algorithms has attracted growing interest recently. Generally speaking, conventional Model-Based (MB) or Model-Free (MF) RL algorithms are not directly applicable to MARL problems because they use a fixed reward model to learn the underlying value function. While Deep Neural Network (DNN)-based solutions perform well, they are still prone to overfitting, high sensitivity to parameter selection, and sample inefficiency. In this paper, an adaptive Kalman Filter (KF)-based framework is introduced as an efficient alternative that addresses these problems by capitalizing on unique characteristics of the KF, such as uncertainty modeling and online second-order learning. More specifically, the paper proposes the Multi-Agent Adaptive Kalman Temporal Difference (MAK-TD) framework and its Successor Representation-based variant, referred to as MAK-SR. The proposed MAK-TD/SR frameworks account for the continuous nature of the action space associated with high-dimensional multi-agent environments and exploit Kalman Temporal Difference (KTD) to address parameter uncertainty. The proposed MAK-TD/SR frameworks are evaluated through several experiments implemented on the OpenAI Gym MARL benchmarks. In these experiments, different numbers of agents are used in cooperative, competitive, and mixed (cooperative-competitive) scenarios. The experimental results illustrate the superior performance of the proposed MAK-TD/SR frameworks compared to their state-of-the-art counterparts.
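To make the phrase "Kalman Temporal Difference" more concrete, the following is a minimal sketch of a generic linear KTD update for a single agent. It is not the MAK-TD or MAK-SR algorithm from the paper. It assumes a linear value function V(s) = phi(s) · theta, a random-walk model on the weights, and scalar noise levels; the names `ktd_update`, `estimate_self_loop_value`, `process_noise`, and `obs_noise` are hypothetical and chosen only for this illustration.

```python
def ktd_update(theta, P, phi_s, phi_next, reward, gamma=0.9,
               process_noise=0.0, obs_noise=1.0):
    """One Kalman-filter step on the weights of a linear value function.

    Illustrative assumptions (not taken from the paper):
      - V(s) = phi(s) . theta
      - weights follow a random walk with covariance process_noise * I
      - reward = (phi(s) - gamma * phi(s')) . theta + noise of variance obs_noise
    theta is a list of floats, P a square list-of-lists covariance.
    """
    n = len(theta)
    # Prediction: the weight mean is unchanged, the covariance grows.
    P_pred = [[P[i][j] + (process_noise if i == j else 0.0)
               for j in range(n)] for i in range(n)]
    # Observation vector from the Bellman equation: reward ~ h . theta.
    h = [phi_s[i] - gamma * phi_next[i] for i in range(n)]
    # Innovation: observed reward minus the reward predicted by theta.
    innovation = reward - sum(h[i] * theta[i] for i in range(n))
    # Innovation variance S and Kalman gain K.
    Ph = [sum(P_pred[i][j] * h[j] for j in range(n)) for i in range(n)]
    S = sum(h[i] * Ph[i] for i in range(n)) + obs_noise
    K = [Ph[i] / S for i in range(n)]
    # Correction: move the weights along the gain, shrink the covariance.
    theta_new = [theta[i] + K[i] * innovation for i in range(n)]
    P_new = [[P_pred[i][j] - K[i] * S * K[j] for j in range(n)]
             for i in range(n)]
    return theta_new, P_new


def estimate_self_loop_value(n_steps=200, gamma=0.5):
    """Toy check: one state that loops to itself with reward 1.

    The true value is 1 / (1 - gamma), so the estimate should approach 2
    for gamma = 0.5.
    """
    theta, P = [0.0], [[10.0]]
    for _ in range(n_steps):
        theta, P = ktd_update(theta, P, [1.0], [1.0], 1.0, gamma=gamma)
    return theta[0]
```

The covariance `P` is what the abstract calls uncertainty modeling: it shrinks as consistent rewards are observed, and the gain `K` scales each update by that uncertainty rather than by a fixed learning rate.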

Pages: 23
