A distributed adaptive policy gradient method based on momentum for multi-agent reinforcement learning

Cited by: 0

Authors:

Shi, Junru [1]

Wang, Xin [2]

Zhang, Mingchuan [1]

Liu, Muhua [1]

Zhu, Junlong [1]

Wu, Qingtao [1]

Affiliations:

[1] Henan Univ Sci & Technol, Sch Informat Engn, Luoyang 471023, Peoples R China

[2] Shanghai Int Studies Univ, Sch Business & Management, Shanghai 200083, Peoples R China

Source:

COMPLEX & INTELLIGENT SYSTEMS | 2024 / Vol. 10 / Issue 05

Funding:

National Natural Science Foundation of China;

Keywords:

Distributed reinforcement learning; Importance sampling; Momentum; Policy gradient methods; Variance reduction; ALGORITHMS;

DOI:

10.1007/s40747-024-01529-6

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The Policy Gradient (PG) method is one of the most popular algorithms in Reinforcement Learning (RL). However, distributed adaptive variants of PG are rarely studied in multi-agent settings. For this reason, this paper proposes a distributed adaptive policy gradient algorithm (IS-DAPGM) that incorporates Adam-type updates and an importance sampling technique. Furthermore, we establish a theoretical convergence rate of $\mathcal{O}(1/\sqrt{T})$, where $T$ represents the number of iterations, which matches the convergence rate of state-of-the-art centralized policy gradient methods. In addition, numerous experiments are conducted in a multi-agent environment that is a modification of the Particle world environment. By comparing with other distributed PG methods and varying the number of agents, we verify that IS-DAPGM is more efficient than the existing methods.

Pages: 7297 - 7310

Page count: 14

50 records in total

[1] A Deep Reinforcement Learning Method based on Deterministic Policy Gradient for Multi-Agent Cooperative Competition
Zuo, Xuan
Xue, Hui-Feng
Wang, Xiao-Yin
Du, Wan-Ru
Tian, Tao
Gao, Shan
Zhang, Pu
CONTROL ENGINEERING AND APPLIED INFORMATICS, 2021, 23 (03): 88 - 98
[2] QSOD: Hybrid Policy Gradient for Deep Multi-agent Reinforcement Learning
Rehman, Hafiz Muhammad Raza Ur
On, Byung-Won
Ningombam, Devarani Devi
Yi, Sungwon
Choi, Gyu Sang
IEEE ACCESS, 2021, 9: 129728 - 129741
[3] Learning Distributed Coordinated Policy in Catching Game with Multi-Agent Reinforcement Learning
Liu, Xiangyu
Tan, Ying
2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019
[4] Multi-agent Gradient-Based Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning
Ren, Jineng
INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2024, 17 (01)
[5] Multi-Agent Reinforcement Learning With Distributed Targeted Multi-Agent Communication
Xu, Chi
Zhang, Hui
Zhang, Ya
2023 35TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC, 2023: 2915 - 2920
[6] QDAP: Downsizing adaptive policy for cooperative multi-agent reinforcement learning
Zhao, Zhitong
Zhang, Ya
Wang, Siying
Zhang, Fan
Zhang, Malu
Chen, Wenyu
KNOWLEDGE-BASED SYSTEMS, 2024, 294
[7] Parallel and distributed multi-agent reinforcement learning
Kaya, M
Arslan, A
PROCEEDINGS OF THE EIGHTH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS, 2001: 437 - 441
[8] Coding for Distributed Multi-Agent Reinforcement Learning
Wang, Baoqian
Xie, Junfei
Atanasov, Nikolay
2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021: 10625 - 10631
[9] Distributed reinforcement learning in multi-agent networks
Kar, Soummya
Moura, Jose M. F.
Poor, H. Vincent
2013 IEEE 5TH INTERNATIONAL WORKSHOP ON COMPUTATIONAL ADVANCES IN MULTI-SENSOR ADAPTIVE PROCESSING (CAMSAP 2013), 2013: 296 - +
[10] Decentralized Policy Gradient Descent Ascent for Safe Multi-Agent Reinforcement Learning
Lu, Songtao
Zhang, Kaiqing
Chen, Tianyi
Basar, Tamer
Horesh, Lior
THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35: 8767 - 8775
