Unified reinforcement Q-learning for mean field game and control problems

Cited by: 24
Authors
Angiuli, Andrea [1]
Fouque, Jean-Pierre [1]
Lauriere, Mathieu [2]
Affiliations
[1] Univ Calif Santa Barbara, Dept Stat & Appl Probabil, South Hall 5504, Santa Barbara, CA 93106 USA
[2] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA
Keywords
Q-learning; Mean field game; Mean field control; Timescales; Linear-quadratic control
DOI
10.1007/s00498-021-00310-1
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We present a Reinforcement Learning (RL) algorithm to solve infinite horizon asymptotic Mean Field Game (MFG) and Mean Field Control (MFC) problems. Our approach can be described as a unified two-timescale Mean Field Q-learning: The same algorithm can learn either the MFG or the MFC solution by simply tuning the ratio of two learning parameters. The algorithm is in discrete time and space where the agent not only provides an action to the environment but also a distribution of the state in order to take into account the mean field feature of the problem. Importantly, we assume that the agent cannot observe the population's distribution and needs to estimate it in a model-free manner. The asymptotic MFG and MFC problems are also presented in continuous time and space, and compared with classical (non-asymptotic or stationary) MFG and MFC problems. They lead to explicit solutions in the linear-quadratic (LQ) case that are used as benchmarks for the results of our algorithm.
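The two-timescale idea in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the function name `two_timescale_q_learning`, the parameters `rho_q` and `rho_mu`, the toy cost and transition rule, and the use of constant learning rates are all assumptions made for illustration. The sketch keeps a Q-table and a running estimate `mu` of the state distribution, each updated with its own learning rate. The abstract states only that the ratio of the two rates selects the problem; here it is assumed that `rho_mu` much smaller than `rho_q` mimics the MFG setting (the population estimate is nearly frozen while the agent optimizes) and that the reverse mimics the MFC setting.

```python
import numpy as np


def two_timescale_q_learning(rho_q, rho_mu, n_states=5, n_steps=2000,
                             gamma=0.9, eps=0.1, seed=0):
    """Toy two-timescale Q-learning on a small line of states (illustrative only)."""
    rng = np.random.default_rng(seed)
    actions = np.array([-1, 0, 1])            # hypothetical action set: move left, stay, move right
    q = np.zeros((n_states, len(actions)))    # Q-table of expected discounted costs
    mu = np.full(n_states, 1.0 / n_states)    # estimate of the state distribution
    x = int(rng.integers(n_states))

    for _ in range(n_steps):
        # Epsilon-greedy action choice; costs are minimised, hence argmin.
        if rng.random() < eps:
            a = int(rng.integers(len(actions)))
        else:
            a = int(np.argmin(q[x]))

        # Distribution update (rate rho_mu): move mu toward the one-hot
        # vector of the current state. mu stays a probability vector.
        one_hot = np.zeros(n_states)
        one_hot[x] = 1.0
        mu = mu + rho_mu * (one_hot - mu)

        # Hypothetical environment: quadratic action cost plus a crowding
        # term that depends on the estimated mass at the current state.
        cost = 0.5 * actions[a] ** 2 + mu[x]
        noise = int(rng.integers(-1, 2))      # uniform on {-1, 0, 1}
        x_next = int(np.clip(x + actions[a] + noise, 0, n_states - 1))

        # Q-table update (rate rho_q): standard Q-learning step for costs.
        target = cost + gamma * q[x_next].min()
        q[x, a] += rho_q * (target - q[x, a])
        x = x_next

    return q, mu


# Assumed MFG-like regime: distribution learned slowly relative to Q.
q_mfg, mu_mfg = two_timescale_q_learning(rho_q=0.1, rho_mu=0.01)
# Assumed MFC-like regime: distribution learned quickly relative to Q.
q_mfc, mu_mfc = two_timescale_q_learning(rho_q=0.01, rho_mu=0.1)
```

The agent never observes a population distribution in this sketch; `mu` is built only from the states the agent itself visits, which reflects the model-free estimation described in the abstract.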
Pages: 217 - 271
Number of pages: 55
Related papers
50 in total
  • [31] Parallel Implementation of Reinforcement Learning Q-Learning Technique for FPGA
    Da Silva, Lucileide M. D.
    Torquato, Matheus F.
    Fernandes, Marcelo A. C.
    IEEE ACCESS, 2019, 7 : 2782 - 2798
  • [32] Concurrent Q-learning: Reinforcement learning for dynamic goals and environments
    Ollington, RB
    Vamplew, PW
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2005, 20 (10) : 1037 - 1052
  • [33] Constraints Penalized Q-learning for Safe Offline Reinforcement Learning
    Xu, Haoran
    Zhan, Xianyuan
    Zhu, Xiangyu
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 8753 - 8760
  • [34] Deep Reinforcement Learning with Sarsa and Q-Learning: A Hybrid Approach
    Xu, Zhi-xiong
    Cao, Lei
    Chen, Xi-liang
    Li, Chen-xi
    Zhang, Yong-liang
    Lai, Jun
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (09) : 2315 - 2322
  • [35] Swarm Reinforcement Learning Method Based on Hierarchical Q-Learning
    Kuroe, Yasuaki
    Takeuchi, Kenya
    Maeda, Yutaka
    2021 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2021), 2021,
  • [36] Inverted pendulum control of double q-learning reinforcement learning algorithm based on neural network
    Zhang, Daode
    Wang, Xiaolong
    Li, Xuesheng
    Wang, Dong
    UPB Scientific Bulletin, Series D: Mechanical Engineering, 2020, 82 (02) : 15 - 26
  • [37] Mean-Field Game and Reinforcement Learning MEC Resource Provisioning for SFCr
    Abouaomar, Amine
    Cherkaoui, Soumaya
    Mlika, Zoubeir
    Kobbane, Abdellatif
    2021 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2021,
  • [38] Reinforcement distribution in a team of cooperative Q-learning agents
    Abbasi, Zahra
    Abbasi, Mohammad Ali
    PROCEEDINGS OF NINTH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING, 2008, : 154 - +
  • [39] The Sample Complexity of Teaching-by-Reinforcement on Q-Learning
    Zhang, Xuezhou
    Bharti, Shubham Kumar
    Ma, Yuzhe
    Singla, Adish
    Zhu, Xiaojin
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 10939 - 10947
  • [40] LEARNING HOSE TRANSPORT CONTROL WITH Q-LEARNING
    Fernandez-Gauna, Borja
    Manuel Lopez-Guede, Jose
    Zulueta, Ekaitz
    Grana, Manuel
    NEURAL NETWORK WORLD, 2010, 20 (07) : 913 - 923