Unified reinforcement Q-learning for mean field game and control problems

Cited by: 24

Authors:

Angiuli, Andrea [1]

Fouque, Jean-Pierre [1]

Lauriere, Mathieu [2]

Affiliations:

[1] Univ Calif Santa Barbara, Dept Stat & Appl Probabil, South Hall 5504, Santa Barbara, CA 93106 USA

[2] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA

Source:

MATHEMATICS OF CONTROL SIGNALS AND SYSTEMS | 2022 / Vol. 34 / Issue 02

Keywords:

Q-learning; Mean field game; Mean field control; Timescales; Linear-quadratic control;

DOI:

10.1007/s00498-021-00310-1

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

We present a Reinforcement Learning (RL) algorithm to solve infinite horizon asymptotic Mean Field Game (MFG) and Mean Field Control (MFC) problems. Our approach can be described as a unified two-timescale Mean Field Q-learning: The same algorithm can learn either the MFG or the MFC solution by simply tuning the ratio of two learning parameters. The algorithm is in discrete time and space where the agent not only provides an action to the environment but also a distribution of the state in order to take into account the mean field feature of the problem. Importantly, we assume that the agent cannot observe the population's distribution and needs to estimate it in a model-free manner. The asymptotic MFG and MFC problems are also presented in continuous time and space, and compared with classical (non-asymptotic or stationary) MFG and MFC problems. They lead to explicit solutions in the linear-quadratic (LQ) case that are used as benchmarks for the results of our algorithm.
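The two-timescale idea in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the update rules, the epsilon-greedy exploration, the constant learning rates `rho_q` and `rho_mu`, and the toy two-state environment (`toy_step`, `toy_cost`) are all assumptions made for illustration. The abstract does not say which ratio of rates yields the MFG or the MFC solution, so the sketch only exposes both rates as parameters.

```python
import numpy as np


def two_timescale_mf_q_learning(step, cost, n_states, n_actions,
                                rho_q, rho_mu, gamma=0.9,
                                n_steps=5000, epsilon=0.1, seed=0):
    """Tabular Q-learning with a running estimate of the state distribution.

    rho_q and rho_mu (both assumed to lie in (0, 1]) are the learning rates
    of the Q-table and of the distribution estimate; their ratio is the
    tuning knob mentioned in the abstract.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros((n_states, n_actions))      # cost-to-go estimates
    mu = np.full(n_states, 1.0 / n_states)   # model-free distribution estimate
    x = int(rng.integers(n_states))
    for _ in range(n_steps):
        # Epsilon-greedy choice; costs are minimized, hence argmin.
        if rng.random() < epsilon:
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmin(q[x]))
        x_next = step(x, a, mu, rng)
        # Move the distribution estimate toward the observed next state.
        one_hot = np.zeros(n_states)
        one_hot[x_next] = 1.0
        mu = mu + rho_mu * (one_hot - mu)
        # Standard Q-learning update with the mean-field-dependent cost.
        target = cost(x, a, mu) + gamma * q[x_next].min()
        q[x, a] += rho_q * (target - q[x, a])
        x = x_next
    return q, mu


# Hypothetical two-state environment, used only to exercise the sketch.
def toy_step(x, a, mu, rng):
    x_next = (x + a) % 2          # action 0 stays, action 1 switches state
    if rng.random() < 0.1:        # small transition noise
        x_next = 1 - x_next
    return x_next


def toy_cost(x, a, mu):
    return mu[x] + 0.1 * a        # crowded states and moving are costly


q, mu = two_timescale_mf_q_learning(toy_step, toy_cost, 2, 2,
                                    rho_q=0.1, rho_mu=0.01)
```

Swapping the relative size of `rho_q` and `rho_mu` changes which quantity evolves on the slow timescale, which is the mechanism the abstract describes for switching between the two problem types.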


Pages: 217-271

Number of pages: 55

50 items in total

[31] Parallel Implementation of Reinforcement Learning Q-Learning Technique for FPGA
Da Silva, Lucileide M. D.
Torquato, Matheus F.
Fernandes, Marcelo A. C.
IEEE ACCESS, 2019, 7 : 2782 - 2798
[32] Concurrent Q-learning: Reinforcement learning for dynamic goals and environments
Ollington, RB
Vamplew, PW
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2005, 20 (10) : 1037 - 1052
[33] Constraints Penalized Q-learning for Safe Offline Reinforcement Learning
Xu, Haoran
Zhan, Xianyuan
Zhu, Xiangyu
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 8753 - 8760
[34] Deep Reinforcement Learning with Sarsa and Q-Learning: A Hybrid Approach
Xu, Zhi-xiong
Cao, Lei
Chen, Xi-liang
Li, Chen-xi
Zhang, Yong-liang
Lai, Jun
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (09) : 2315 - 2322
[35] Swarm Reinforcement Learning Method Based on Hierarchical Q-Learning
Kuroe, Yasuaki
Takeuchi, Kenya
Maeda, Yutaka
2021 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2021), 2021,
[36] Inverted pendulum control of double q-learning reinforcement learning algorithm based on neural network
Zhang, Daode
Wang, Xiaolong
Li, Xuesheng
Wang, Dong
UPB Scientific Bulletin, Series D: Mechanical Engineering, 2020, 82 (02) : 15 - 26
[37] Mean-Field Game and Reinforcement Learning MEC Resource Provisioning for SFCr
Abouaomar, Amine
Cherkaoui, Soumaya
Mlika, Zoubeir
Kobbane, Abdellatif
2021 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2021,
[38] Reinforcement distribution in a team of cooperative Q-learning agents
Abbasi, Zahra
Abbasi, Mohammad Ali
PROCEEDINGS OF NINTH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING, 2008, : 154 - +
[39] The Sample Complexity of Teaching-by-Reinforcement on Q-Learning
Zhang, Xuezhou
Bharti, Shubham Kumar
Ma, Yuzhe
Singla, Adish
Zhu, Xiaojin
THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 10939 - 10947
[40] LEARNING HOSE TRANSPORT CONTROL WITH Q-LEARNING
Fernandez-Gauna, Borja
Manuel Lopez-Guede, Jose
Zulueta, Ekaitz
Grana, Manuel
NEURAL NETWORK WORLD, 2010, 20 (07) : 913 - 923
