Expected Lenient Q-learning: a fast variant of the Lenient Q-learning algorithm for cooperative stochastic Markov games

Cited by: 1
Authors
Amhraoui, Elmehdi [1]
Masrour, Tawfik [1,2]
Affiliations
[1] Moulay ISMAIL Univ, ENSAM Meknes, Dept Math & Comp Sci, Lab Math Modeling Simulat & Smart Syst L2M3S, BP 15290, Marjane 2, Meknes 50500, Morocco
[2] Univ Quebec Rimouski, Rimouski, PQ, Canada
Keywords
Cooperative multiagent systems; Independent learners; Decentralized learning; Lenient learning; Fully cooperative Markov games;
DOI
10.1007/s13042-023-02063-6
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Lenient Multiagent Reinforcement Learning 2 (LMRL2) is an independent-learners algorithm for cooperative multiagent systems that is known to outperform other independent-learners algorithms in terms of convergence. However, LMRL2 is slow to converge. In this paper, we first present a new formulation of LMRL2 and then, based on this formulation, introduce the Expected Lenient Q-learning algorithm ($\mathbb{E}$LQL). The new formulation demonstrates that LMRL2 performs the same update of Q-values as standard Q-learning, but with a stochastic learning rate that follows a specified probability distribution. Building on this formulation, $\mathbb{E}$LQL addresses the slowness and instability of LMRL2 by updating Q-values with a deterministic, evolving learning rate equal to the expected value of the LMRL2 learning rate. We compared $\mathbb{E}$LQL with Decentralized Q-learning, Distributed Q-learning with and without a coordination mechanism, Hysteretic Q-learning, and LMRL2. Our experiments on various test problems demonstrate that $\mathbb{E}$LQL is highly effective and surpasses all the other algorithms in terms of convergence, especially in stochastic domains. Moreover, $\mathbb{E}$LQL converges faster than LMRL2, which is why we regard it as a faster variant of LMRL2.
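To make the formulation in the abstract concrete, the following is a minimal tabular sketch in Python of the two update rules as described above. It is a sketch under stated assumptions, not the paper's implementation: the function names and the Bernoulli model of LMRL2's stochastic learning rate (step size alpha applied with probability 1 - leniency for value-decreasing updates, and 0 otherwise) are illustrative choices.

import numpy as np

def lmrl2_style_update(q, td_target, alpha, leniency, rng):
    # LMRL2-style lenient update (illustrative model): a value-decreasing
    # step is applied only with probability (1 - leniency), while
    # value-increasing steps always apply. Equivalently, a Q-learning
    # update whose learning rate is a random variable taking value
    # alpha or 0.
    delta = td_target - q
    if delta >= 0 or rng.random() >= leniency:
        return q + alpha * delta
    return q  # leniently ignore the punishing update

def elql_style_update(q, td_target, alpha, leniency):
    # ELQL-style update as the abstract describes it: replace the
    # stochastic rate by its expected value, i.e. a deterministic rate
    # alpha * (1 - leniency) for negative TD errors.
    delta = td_target - q
    rate = alpha if delta >= 0 else alpha * (1.0 - leniency)
    return q + rate * delta

rng = np.random.default_rng(0)
print(lmrl2_style_update(0.5, 0.0, alpha=0.1, leniency=0.8, rng=rng))
print(elql_style_update(0.5, 0.0, alpha=0.1, leniency=0.8))

Averaged over many draws, the first rule moves Q toward the target by the same expected amount as the second; using the expected rate removes the draw-to-draw randomness, which is consistent with the abstract's account of why $\mathbb{E}$LQL is faster and more stable than LMRL2.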
Pages: 2781-2797
Number of pages: 17
Related Papers
50 items in total
  • [1] Smooth Q-Learning: An Algorithm for Independent Learners in Stochastic Cooperative Markov Games
    Amhraoui, Elmehdi
    Masrour, Tawfik
    JOURNAL OF INTELLIGENT & ROBOTIC SYSTEMS, 2023, 108 (04)
  • [2] A Novel Heuristic Q-Learning Algorithm for Solving Stochastic Games
    Li, Jianwei
    Liu, Weiyi
    2008 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-8, 2008, : 1135 - 1144
  • [3] Asynchronous Decentralized Q-Learning in Stochastic Games
    Yongacoglu, Bora
    Arslan, Gurdal
    Yuksel, Serdar
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 5008 - 5013
  • [4] Decentralized Q-Learning for Stochastic Teams and Games
    Arslan, Gurdal
    Yuksel, Serdar
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2017, 62 (04) : 1545 - 1558
  • [5] Backward Q-learning: The combination of Sarsa algorithm and Q-learning
    Wang, Yin-Hao
    Li, Tzuu-Hseng S.
    Lin, Chih-Jui
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (09) : 2184 - 2193
  • [6] Decentralized Q-Learning with Constant Aspirations in Stochastic Games
    Yongacoglu, Bora
    Arslan, Gurdal
    Yuksel, Serdar
    CONFERENCE RECORD OF THE 2019 FIFTY-THIRD ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, 2019, : 1744 - 1749
  • [7] Lenient Learning in Independent-Learner Stochastic Cooperative Games
    Wei, Ermo
    Luke, Sean
    JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17
  • [8] Decentralized Q-Learning in Zero-sum Markov Games
    Sayin, Muhammed O.
    Zhang, Kaiqing
    Leslie, David S.
    Başar, Tamer
    Ozdaglar, Asuman
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [9] Cooperative Q-Learning Based on Learning Automata
    Yang, Mao
    Tian, Yantao
    Qi, Xinyue
    2009 IEEE INTERNATIONAL CONFERENCE ON AUTOMATION AND LOGISTICS (ICAL 2009), VOLS 1-3, 2009, : 1972 - 1977