Lenient Multiagent Reinforcement Learning 2 (LMRL2) is an independent-learners algorithm for cooperative multiagent systems that is known to outperform other independent-learners algorithms in terms of convergence, although it takes longer to converge. In this paper, we first present a new formulation of LMRL2 and then, based on this new formulation, introduce the Expected Lenient Q-learning algorithm ($\mathbb{E}$LQL). The new formulation demonstrates that LMRL2 performs the same update of Q-values as standard Q-learning, but with a stochastic learning rate that follows a specified probability distribution. Based on this new formulation, $\mathbb{E}$LQL addresses the slow convergence and instabilities of LMRL2 by updating Q-values with a deterministic, evolving learning rate equal to the expected value of the LMRL2 learning rate. We compared $\mathbb{E}$LQL with Decentralized Q-learning, Distributed Q-learning with and without a coordination mechanism, Hysteretic Q-learning, and LMRL2.
Our experiments on various test problems demonstrated that $\mathbb{E}$LQL is highly effective and surpasses all the other algorithms in terms of convergence, especially in stochastic domains. Moreover, $\mathbb{E}$LQL outperforms LMRL2 in terms of convergence speed, which is why we regard $\mathbb{E}$LQL as a faster variant of LMRL2.
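To illustrate the reformulation described above, the following minimal Python sketch contrasts a lenient Q-update with a stochastic effective learning rate (negative TD errors are applied only with some probability, so the rate is $\alpha$ or 0) against an update that uses the expected value of that rate. The function names, the fixed `leniency_p` parameter, and the tabular dictionary representation are illustrative assumptions; the actual LMRL2 and $\mathbb{E}$LQL algorithms use a temperature-based leniency schedule that decays per state-action pair.

```python
import random

def lenient_q_update(q, s, a, r, s_next, alpha, gamma, leniency_p):
    """LMRL2-style update (sketch): a negative TD error is ignored with
    probability leniency_p, so the effective learning rate is stochastic:
    alpha with probability (1 - leniency_p), and 0 otherwise."""
    td = r + gamma * max(q[s_next].values()) - q[s][a]
    if td >= 0 or random.random() > leniency_p:
        q[s][a] += alpha * td

def expected_lenient_q_update(q, s, a, r, s_next, alpha, gamma, leniency_p):
    """E-LQL-style update (sketch): replace the stochastic learning rate
    with its expectation, alpha * (1 - leniency_p), for negative TD errors."""
    td = r + gamma * max(q[s_next].values()) - q[s][a]
    rate = alpha if td >= 0 else alpha * (1 - leniency_p)
    q[s][a] += rate * td

# Illustrative usage on a two-state, one-action table.
q = {0: {0: 0.0}, 1: {0: 0.0}}
expected_lenient_q_update(q, 0, 0, 1.0, 1, alpha=0.5, gamma=0.9, leniency_p=0.8)
```

Under this sketch, the deterministic expected rate removes the update-to-update variance that the stochastic rate introduces, which is the intuition behind the faster and more stable convergence reported for $\mathbb{E}$LQL.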