Expected Lenient Q-learning: a fast variant of the Lenient Q-learning algorithm for cooperative stochastic Markov games

Cited by: 1
Authors
Amhraoui, Elmehdi [1]
Masrour, Tawfik [1,2]
Affiliations
[1] Moulay ISMAIL Univ, ENSAM Meknes, Dept Math & Comp Sci, Lab Math Modeling Simulat & Smart Syst L2M3S, BP 15290, Marjane 2, Meknes 50500, Morocco
[2] Univ Quebec Rimouski, Rimouski, PQ, Canada
Keywords
Cooperative multiagent systems; Independent learners; Decentralized learning; Lenient learning; Fully cooperative Markov games
DOI
10.1007/s13042-023-02063-6
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Lenient Multiagent Reinforcement Learning 2 (LMRL2) is an Independent Learners algorithm for cooperative multiagent systems that is known to outperform other Independent Learners algorithms in terms of convergence, but it is slow to converge. In this paper, we first present a new formulation of LMRL2 and then, based on this formulation, introduce the Expected Lenient Q-learning algorithm (𝔼LQL). The new formulation shows that LMRL2 performs the same Q-value update as standard Q-learning, but with a stochastic learning rate that follows a specified probability distribution. Building on this, 𝔼LQL addresses the slow convergence and instabilities of LMRL2 by updating Q-values with a deterministic, evolving learning rate equal to the expected value of the LMRL2 learning rate. We compared 𝔼LQL with Decentralized Q-learning, Distributed Q-learning with and without a coordination mechanism, Hysteretic Q-learning, and LMRL2. Experiments on a range of test problems show that 𝔼LQL is highly effective and surpasses all the other algorithms in terms of convergence, especially in stochastic domains. Moreover, 𝔼LQL converges faster than LMRL2, which is why we regard it as a faster variant of LMRL2.
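To make the relationship between the two update rules concrete, the following is a minimal tabular sketch in Python. The abstract states only that LMRL2 applies the standard Q-learning update with a stochastic learning rate and that 𝔼LQL replaces that rate with its expected value; the exponential leniency schedule used below (leniency = 1 - exp(-K * T), with a per-visit temperature decay) is the one common in the lenient-learning literature and, like all constant and function names here, is an illustrative assumption rather than the paper's exact specification.

import math
import random

ALPHA = 0.1         # base learning rate (illustrative)
GAMMA = 0.95        # discount factor (illustrative)
K = 2.0             # leniency moderation constant (assumed)
TEMP_DECAY = 0.995  # per-visit temperature decay (assumed)

def lmrl2_update(Q, temp, s, a, r, s_next):
    """LMRL2 seen as Q-learning with a *stochastic* learning rate: a
    negative TD error is applied only with probability 1 - leniency,
    i.e. the effective rate is ALPHA * Bernoulli(1 - leniency)."""
    delta = r + GAMMA * max(Q[s_next].values()) - Q[s][a]
    leniency = 1.0 - math.exp(-K * temp[s][a])
    if delta > 0 or random.random() > leniency:
        Q[s][a] += ALPHA * delta
    temp[s][a] *= TEMP_DECAY  # leniency fades as (s, a) is revisited

def elql_update(Q, temp, s, a, r, s_next):
    """ELQL replaces the stochastic rate with its expectation: the
    Q-value always moves, but a negative TD error is scaled by the
    probability that LMRL2 would have accepted it."""
    delta = r + GAMMA * max(Q[s_next].values()) - Q[s][a]
    leniency = 1.0 - math.exp(-K * temp[s][a])
    rate = ALPHA if delta > 0 else ALPHA * (1.0 - leniency)
    Q[s][a] += rate * delta
    temp[s][a] *= TEMP_DECAY

# Tiny usage on a one-state, two-action problem:
Q = {0: {0: 0.0, 1: 0.0}}
temp = {0: {0: 1.0, 1: 1.0}}
elql_update(Q, temp, s=0, a=0, r=1.0, s_next=0)

Because the 𝔼LQL rate is deterministic, every sample contributes to the Q-value rather than being randomly discarded, which is consistent with the faster and more stable convergence the abstract reports.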
Pages: 2781-2797
Page count: 17