MODEL-FREE MEAN-FIELD REINFORCEMENT LEARNING: MEAN-FIELD MDP AND MEAN-FIELD Q-LEARNING

Cited by: 3
Authors
Carmona, Rene [1 ]
Lauriere, Mathieu [2 ,3 ]
Tan, Zongjun [1 ]
Affiliations
[1] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA
[2] NYU Shanghai, ECNU Inst Math Sci, Shanghai, Peoples R China
[3] NYU Shanghai, Ctr Data Sci & Artificial Intelligence, Shanghai, Peoples R China
Source
ANNALS OF APPLIED PROBABILITY | 2023, Vol. 33, No. 6B
Funding
U.S. National Science Foundation;
Keywords
Mean field reinforcement learning; mean field Markov decision processes; McKean-Vlasov control; MARKOV DECISION-PROCESSES; CONVERGENCE; GAME; DISCRETE;
DOI
10.1214/23-AAP1949
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline Classification Codes
020208; 070103; 0714;
Abstract
We study infinite horizon discounted mean field control (MFC) problems with common noise through the lens of mean field Markov decision processes (MFMDP). We allow the agents to use actions that are randomized not only at the individual level but also at the level of the population. This common randomization is introduced for the purpose of exploration from a reinforcement learning (RL) paradigm. It also allows us to establish connections between both closed-loop and open-loop policies for MFC and Markov policies for the MFMDP. In particular, we show that there exists an optimal closed-loop policy for the original MFC and we prove dynamic programming principles for the state and state-action value functions. Building on this framework and the notion of state-action value function, we then propose RL methods for such problems by adapting existing tabular and deep RL methods to the mean-field setting. The main difficulty is the treatment of the population state, which is an input of the policy and the value function. We provide convergence guarantees for the tabular Q-learning algorithm based on discretizations of the simplex. We also show that neural-network-based deep RL algorithms are more suitable for continuous spaces, as they allow us to avoid discretizing the mean field state space. Numerical examples are provided.
Pages: 5334-5381
Number of pages: 48
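To make the tabular approach described in the abstract concrete, here is a minimal sketch (not the authors' implementation) of Q-learning on a toy mean-field MDP whose state is the population distribution over two agent states, discretized on the simplex. The environment dynamics, reward, and all names (ToyMFEnv-style helpers such as project_to_grid and step, the grid size, learning rates) are assumptions made purely for illustration.

```python
# Minimal sketch, assuming a toy two-state population: the MDP state is the
# fraction mu_0 of agents in state 0, discretized on a grid over the simplex.
import numpy as np

N_GRID = 10          # simplex discretization: mu_0 in {0, 1/10, ..., 1}
N_ACTIONS = 3        # actions nudge population mass left / stay / right
GAMMA = 0.95         # discount factor
ALPHA = 0.1          # learning rate
EPS = 0.1            # epsilon-greedy exploration
N_EPISODES = 2000
HORIZON = 50

def project_to_grid(mu0):
    """Map the fraction of agents in state 0 to the nearest grid index."""
    return int(round(mu0 * N_GRID))

def step(mu0, action, rng):
    """Toy mean-field transition: the action shifts mass toward state 0 or 1,
    perturbed by a small common noise shared by the whole population; the
    reward penalizes an unbalanced population (an illustrative cost)."""
    drift = (action - 1) * 0.1                    # -0.1, 0.0, +0.1
    noise = 0.05 * rng.standard_normal()          # common noise
    new_mu0 = float(np.clip(mu0 + drift + noise, 0.0, 1.0))
    reward = -abs(new_mu0 - 0.5)
    return new_mu0, reward

rng = np.random.default_rng(0)
Q = np.zeros((N_GRID + 1, N_ACTIONS))             # Q-table over (discretized mu, action)

for _ in range(N_EPISODES):
    mu0 = rng.uniform()                           # random initial population distribution
    for _ in range(HORIZON):
        s = project_to_grid(mu0)
        # epsilon-greedy randomization at the level of the population
        a = rng.integers(N_ACTIONS) if rng.uniform() < EPS else int(np.argmax(Q[s]))
        new_mu0, r = step(mu0, a, rng)
        s_next = project_to_grid(new_mu0)
        # standard Q-learning update on the lifted (mean-field) state space
        Q[s, a] += ALPHA * (r + GAMMA * Q[s_next].max() - Q[s, a])
        mu0 = new_mu0

print("Greedy action per grid point:", Q.argmax(axis=1))
```

In this sketch the grid resolution controls the trade-off between the size of the Q-table and the accuracy of the mean-field state representation; the deep RL variants discussed in the paper take the population distribution as a continuous input to a neural network and therefore avoid this discretization altogether.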