Unified reinforcement Q-learning for mean field game and control problems

Cited by: 24
Authors
Angiuli, Andrea [1 ]
Fouque, Jean-Pierre [1 ]
Lauriere, Mathieu [2 ]
Affiliations
[1] Univ Calif Santa Barbara, Dept Stat & Appl Probabil, South Hall 5504, Santa Barbara, CA 93106 USA
[2] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA
Keywords
Q-learning; Mean field game; Mean field control; Timescales; Linear-quadratic control
DOI
10.1007/s00498-021-00310-1
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
We present a Reinforcement Learning (RL) algorithm to solve infinite horizon asymptotic Mean Field Game (MFG) and Mean Field Control (MFC) problems. Our approach can be described as a unified two-timescale Mean Field Q-learning: the same algorithm can learn either the MFG or the MFC solution by simply tuning the ratio of two learning rates. The algorithm operates in discrete time and space, where the agent provides not only an action to the environment but also an estimate of the state distribution, in order to take into account the mean field feature of the problem. Importantly, we assume that the agent cannot observe the population's distribution and needs to estimate it in a model-free manner. The asymptotic MFG and MFC problems are also presented in continuous time and space, and compared with classical (non-asymptotic or stationary) MFG and MFC problems. They lead to explicit solutions in the linear-quadratic (LQ) case that are used as benchmarks for the results of our algorithm.
Pages: 217-271
Page count: 55
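The two-timescale scheme described in the abstract can be sketched in code. The sketch below is illustrative, not the paper's implementation: the toy environment (two states, two actions, the action-biased transition kernel, and the crowd-aversion reward) and all names such as `rho_Q` and `rho_mu` are assumptions. Only the update structure mirrors the unified algorithm: a state-distribution estimate refreshed at rate `rho_mu` and a Q-table refreshed at rate `rho_Q`, with the ratio of the two rates selecting which fixed point (MFG- or MFC-type) the iterates track.

```python
import numpy as np

def unified_mfq_learning(n_states=2, n_actions=2, rho_Q=0.1, rho_mu=0.01,
                         gamma=0.9, eps=0.1, n_steps=20000, seed=0):
    """Hedged two-timescale tabular Q-learning sketch for mean-field problems.

    Maintains a running estimate mu of the population state distribution
    (updated at rate rho_mu) alongside a Q-table (updated at rate rho_Q).
    The environment and reward here are toy stand-ins.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    mu = np.full(n_states, 1.0 / n_states)  # model-free distribution estimate
    x = 0
    for _ in range(n_steps):
        # epsilon-greedy action from the current Q-table
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[x]))
        # toy dynamics: the chosen action biases the next state toward state a
        p_next = np.full(n_states, (1.0 - 0.8) / n_states)
        p_next[a] += 0.8
        x_next = int(rng.choice(n_states, p=p_next))
        # toy mean-field reward: action 1 pays off, crowding at the current state costs
        r = float(a) - mu[x]
        # timescale 1: update the distribution estimate from the observed transition
        delta = np.zeros(n_states)
        delta[x_next] = 1.0
        mu += rho_mu * (delta - mu)
        # timescale 2: standard Q-learning update at its own rate
        Q[x, a] += rho_Q * (r + gamma * np.max(Q[x_next]) - Q[x, a])
        x = x_next
    return Q, mu
```

Choosing `rho_Q` much larger than `rho_mu` effectively freezes the distribution while Q converges (an MFG-type limit), whereas `rho_mu` much larger than `rho_Q` lets the distribution adapt faster than the policy (an MFC-type limit); this is the single tuning knob the abstract refers to.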
Related papers
50 items in total
  • [41] The Improvement of Q-learning Applied to Imperfect Information Game
    Lin, Jing
    Wang, Xuan
    Han, Lijiao
    Zhang, Jiajia
    Xi, Xinxin
    2009 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2009), VOLS 1-9, 2009, : 1562 - +
  • [42] Double Q-learning Agent for Othello Board Game
    Somasundaram, Thamarai Selvi
    Panneerselvam, Karthikeyan
    Bhuthapuri, Tarun
    Mahadevan, Harini
    Jose, Ashik
    2018 10TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING (ICOAC), 2018, : 216 - 223
  • [43] Evolution of cooperation in the public goods game with Q-learning
    Zheng, Guozhong
    Zhang, Jiqiang
    Deng, Shengfeng
    Cai, Weiran
    Chen, Li
    CHAOS SOLITONS & FRACTALS, 2024, 188
  • [44] The Mean-Squared Error of Double Q-Learning
    Weng, Wentao
    Gupta, Harsh
    He, Niao
    Ying, Lei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [45] Reinforcement Q-learning based flight control for a passenger aircraft under actuator fault
    Navid Mohammadi
    Moein Ebrahimi
    Morteza Tayefi
    Amirali Nikkhah
    Discover Mechanical Engineering, 4 (1):
  • [46] Reinforcement Q-learning and Optimal Tracking Control of Unknown Discrete-time Multi-player Systems Based on Game Theory
    Zhao, Jin-Gang
    INTERNATIONAL JOURNAL OF CONTROL AUTOMATION AND SYSTEMS, 2024, 22 (05) : 1751 - 1759
  • [47] Active exploratory Q-learning for large problems
    Wu, Xianghai
    Kofman, Jonathan
    Tizhoosh, Hamid R.
    2007 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOLS 1-8, 2007, : 4040 - 4045
  • [48] Human-like Autonomous Vehicle Speed Control by Deep Reinforcement Learning with Double Q-Learning
    Zhang, Yi
    Sun, Ping
    Yin, Yuhan
    Lin, Lin
    Wang, Xuesong
    2018 IEEE INTELLIGENT VEHICLES SYMPOSIUM (IV), 2018, : 1251 - 1256
  • [49] Safe Q-Learning Approaches for Human-in-Loop Reinforcement Learning
    Veerabathraswamy, Swathi
    Bhatt, Nirav
    2023 NINTH INDIAN CONTROL CONFERENCE, ICC, 2023, : 16 - 21
  • [50] Koopman Q-learning: Offline Reinforcement Learning via Symmetries of Dynamics
    Weissenbacher, Matthias
    Sinha, Samarth
    Garg, Animesh
    Kawahara, Yoshinobu
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022