Expected Lenient Q-learning: a fast variant of the Lenient Q-learning algorithm for cooperative stochastic Markov games

Cited by: 1
Authors
Amhraoui, Elmehdi [1]
Masrour, Tawfik [1,2]
Affiliations
[1] Moulay ISMAIL Univ, ENSAM Meknes, Dept Math & Comp Sci, Lab Math Modeling Simulat & Smart Syst L2M3S, BP 15290, Marjane 2, Meknes 50500, Morocco
[2] Univ Quebec Rimouski, Rimouski, PQ, Canada
Keywords
Cooperative multiagent systems; Independent learners; Decentralized learning; Lenient learning; Fully cooperative Markov games
DOI
10.1007/s13042-023-02063-6
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Lenient Multiagent Reinforcement Learning 2 (LMRL2) is an Independent Learners algorithm for cooperative multiagent systems that is known to outperform other Independent Learners algorithms in terms of convergence, but it is slow to converge. In this paper, we first present a new formulation of LMRL2 and then, based on this formulation, introduce the Expected Lenient Q-learning algorithm (𝔼LQL). The new formulation shows that LMRL2 performs the same Q-value update as standard Q-learning, but with a stochastic learning rate that follows a specified probability distribution. Building on this, 𝔼LQL addresses the slow convergence and instabilities of LMRL2 by updating Q-values with a deterministic, evolving learning rate equal to the expected value of the LMRL2 learning rate. We compared 𝔼LQL with Decentralized Q-learning, Distributed Q-learning with and without a coordination mechanism, Hysteretic Q-learning, and LMRL2. Experiments on a range of test problems show that 𝔼LQL is highly effective and surpasses all the other algorithms in terms of convergence, especially in stochastic domains. Moreover, 𝔼LQL converges faster than LMRL2, which is why we regard it as a faster variant of LMRL2.
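To make the relationship between the two update rules concrete, the following is a minimal tabular sketch in Python. The abstract states only that LMRL2 applies the standard Q-learning update with a stochastic learning rate and that 𝔼LQL replaces that rate with its expected value; the exponential leniency schedule used below (leniency = 1 - exp(-K * T), with a per-visit temperature decay) is the one common in the lenient-learning literature and, like all constant and function names here, is an illustrative assumption rather than the paper's exact specification.

import math
import random

ALPHA = 0.1         # base learning rate (illustrative)
GAMMA = 0.95        # discount factor (illustrative)
K = 2.0             # leniency moderation constant (assumed)
TEMP_DECAY = 0.995  # per-visit temperature decay (assumed)

def lmrl2_update(Q, temp, s, a, r, s_next):
    """LMRL2 seen as Q-learning with a *stochastic* learning rate: a
    negative TD error is applied only with probability 1 - leniency,
    i.e. the effective rate is ALPHA * Bernoulli(1 - leniency)."""
    delta = r + GAMMA * max(Q[s_next].values()) - Q[s][a]
    leniency = 1.0 - math.exp(-K * temp[s][a])
    if delta > 0 or random.random() > leniency:
        Q[s][a] += ALPHA * delta
    temp[s][a] *= TEMP_DECAY  # leniency fades as (s, a) is revisited

def elql_update(Q, temp, s, a, r, s_next):
    """ELQL replaces the stochastic rate with its expectation: the
    Q-value always moves, but a negative TD error is scaled by the
    probability that LMRL2 would have accepted it."""
    delta = r + GAMMA * max(Q[s_next].values()) - Q[s][a]
    leniency = 1.0 - math.exp(-K * temp[s][a])
    rate = ALPHA if delta > 0 else ALPHA * (1.0 - leniency)
    Q[s][a] += rate * delta
    temp[s][a] *= TEMP_DECAY

# Tiny usage on a one-state, two-action problem:
Q = {0: {0: 0.0, 1: 0.0}}
temp = {0: {0: 1.0, 1: 1.0}}
elql_update(Q, temp, s=0, a=0, r=1.0, s_next=0)

Because the 𝔼LQL rate is deterministic, every sample contributes to the Q-value rather than being randomly discarded, which is consistent with the faster and more stable convergence the abstract reports.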
Pages: 2781-2797
Page count: 17