q-Learning in Continuous Time

Cited by: 0
Authors
Jia, Yanwei [1 ]
Zhou, Xun Yu [2 ,3 ]
Affiliations
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Shatin, Hong Kong, Peoples R China
[2] Columbia Univ, Dept Ind Engn & Operat Res, New York, NY 10027 USA
[3] Columbia Univ, Data Sci Inst, New York, NY 10027 USA
Keywords
continuous-time reinforcement learning; policy improvement; q-function; martingale; on-policy and off-policy; VARIANCE PORTFOLIO SELECTION;
DOI
Not available
Chinese Library Classification (CLC) Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation introduced by Wang et al. (2020). As the conventional (big) Q-function collapses in continuous time, we consider its first-order approximation and coin the term "(little) q-function". This function is related to the instantaneous advantage rate function as well as the Hamiltonian. We develop a "q-learning" theory around the q-function that is independent of time discretization. Given a stochastic policy, we jointly characterize the associated q-function and value function by martingale conditions of certain stochastic processes, in both on-policy and off-policy settings. We then apply the theory to devise different actor-critic algorithms for solving underlying RL problems, depending on whether or not the density function of the Gibbs measure generated from the q-function can be computed explicitly. One of our algorithms interprets the well-known Q-learning algorithm SARSA, and another recovers a policy gradient (PG) based continuous-time algorithm proposed in Jia and Zhou (2022b). Finally, we conduct simulation experiments to compare the performance of our algorithms with those of PG-based algorithms in Jia and Zhou (2022b) and time-discretized conventional Q-learning algorithms.
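The central construction is not spelled out in this record; a minimal sketch, under the paper's entropy-regularized exploratory setting and with assumed notation (J for the value function of a policy, \Delta t for the time-discretization step, \gamma for the temperature parameter), is as follows. The (little) q-function is the first-order coefficient in the small-time expansion of the conventional Q-function,

Q_{\Delta t}(t, x, a) = J(t, x) + q(t, x, a)\,\Delta t + o(\Delta t) \quad \text{as } \Delta t \to 0,

so q plays the role of an instantaneous advantage rate. The Gibbs measure mentioned in the abstract is, up to normalization, the policy density

\pi(a \mid t, x) \propto \exp\!\left\{ \frac{q(t, x, a)}{\gamma} \right\},

and whether this normalizing constant can be computed in closed form is what separates the two families of actor-critic algorithms described above.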
Pages: 61
Related Papers
50 records in total; items [41]-[50] shown below
  • [41] Neural Q-learning
    ten Hagen, S
    Kröse, B
    NEURAL COMPUTING & APPLICATIONS, 2003, 12 (02): 81-88
  • [42] Logistic Q-Learning
    Bas-Serrano, Joan
    Curi, Sebastian
    Krause, Andreas
    Neu, Gergely
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [44] Robust Q-Learning
    Ertefaie, Ashkan
    McKay, James R.
    Oslin, David
    Strawderman, Robert L.
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2021, 116 (533): 368-381
  • [45] Adaptive Optimal Control via Continuous-Time Q-Learning for Unknown Nonlinear Affine Systems
    Chen, Anthony Siming
    Herrmann, Guido
    2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019: 1007-1012
  • [46] Enhancing Nash Q-learning and Team Q-learning mechanisms by using bottlenecks
    Ghazanfari, Behzad
    Mozayani, Nasser
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2014, 26 (06): 2771-2783
  • [47] Q-Learning for Continuous-Time Linear Systems: A Data-Driven Implementation of the Kleinman Algorithm
    Possieri, Corrado
    Sassano, Mario
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2022, 52 (10): 6487-6497
  • [48] Continuous Real-Time Estimation of Power System Inertia Using Energy Variations and Q-Learning
    Lavanya, L.
    Swarup, K. Shanti
    IEEE OPEN JOURNAL OF INSTRUMENTATION AND MEASUREMENT, 2023, 2
  • [49] A Q-learning algorithm for Markov decision processes with continuous state spaces
    Hu, Jiaqiao
    Yang, Xiangyu
    Hu, Jian-Qiang
    Peng, Yijie
    SYSTEMS & CONTROL LETTERS, 2024, 187
  • [50] Continuous strategy replicator dynamics for multi-agent Q-learning
    Galstyan, Aram
    AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2013, 26 (01): 37-53