q-Learning in Continuous Time

Cited by: 0
Authors
Jia, Yanwei [1 ]
Zhou, Xun Yu [2 ,3 ]
Affiliations
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Shatin, Hong Kong, Peoples R China
[2] Columbia Univ, Dept Ind Engn & Operat Res, New York, NY 10027 USA
[3] Columbia Univ, Data Sci Inst, New York, NY 10027 USA
Keywords
continuous-time reinforcement learning; policy improvement; q-function; martingale; on-policy and off-policy; VARIANCE PORTFOLIO SELECTION;
DOI
Not available
Chinese Library Classification (CLC) Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation introduced by Wang et al. (2020). As the conventional (big) Q-function collapses in continuous time, we consider its first-order approximation and coin the term "(little) q-function". This function is related to the instantaneous advantage rate function as well as the Hamiltonian. We develop a "q-learning" theory around the q-function that is independent of time discretization. Given a stochastic policy, we jointly characterize the associated q-function and value function by martingale conditions of certain stochastic processes, in both on-policy and off-policy settings. We then apply the theory to devise different actor-critic algorithms for solving underlying RL problems, depending on whether or not the density function of the Gibbs measure generated from the q-function can be computed explicitly. One of our algorithms interprets the well-known Q-learning algorithm SARSA, and another recovers a policy gradient (PG) based continuous-time algorithm proposed in Jia and Zhou (2022b). Finally, we conduct simulation experiments to compare the performance of our algorithms with those of PG-based algorithms in Jia and Zhou (2022b) and time-discretized conventional Q-learning algorithms.
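The central construction is not spelled out in this record; a minimal sketch, under the paper's entropy-regularized exploratory setting and with assumed notation (J for the value function of a policy, \Delta t for the time-discretization step, \gamma for the temperature parameter), is as follows. The (little) q-function is the first-order coefficient in the small-time expansion of the conventional Q-function,

Q_{\Delta t}(t, x, a) = J(t, x) + q(t, x, a)\,\Delta t + o(\Delta t) \quad \text{as } \Delta t \to 0,

so q plays the role of an instantaneous advantage rate. The Gibbs measure mentioned in the abstract is, up to normalization, the policy density

\pi(a \mid t, x) \propto \exp\!\left\{ \frac{q(t, x, a)}{\gamma} \right\},

and whether this normalizing constant can be computed in closed form is what separates the two families of actor-critic algorithms described above.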
Pages: 61
Related Papers
50 records in total; items [41]-[50] shown below
  • [41] Neural Q-learning
    ten Hagen, S
    Kröse, B
    NEURAL COMPUTING & APPLICATIONS, 2003, 12 (02): 81-88
  • [42] Logistic Q-Learning
    Bas-Serrano, Joan
    Curi, Sebastian
    Krause, Andreas
    Neu, Gergely
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [44] Robust Q-Learning
    Ertefaie, Ashkan
    McKay, James R.
    Oslin, David
    Strawderman, Robert L.
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2021, 116 (533): 368-381
  • [45] Adaptive Optimal Control via Continuous-Time Q-Learning for Unknown Nonlinear Affine Systems
    Chen, Anthony Siming
    Herrmann, Guido
    2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019: 1007-1012
  • [46] Enhancing Nash Q-learning and Team Q-learning mechanisms by using bottlenecks
    Ghazanfari, Behzad
    Mozayani, Nasser
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2014, 26 (06): 2771-2783
  • [47] Q-Learning for Continuous-Time Linear Systems: A Data-Driven Implementation of the Kleinman Algorithm
    Possieri, Corrado
    Sassano, Mario
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2022, 52 (10): 6487-6497
  • [48] Continuous Real-Time Estimation of Power System Inertia Using Energy Variations and Q-Learning
    Lavanya, L.
    Swarup, K. Shanti
    IEEE OPEN JOURNAL OF INSTRUMENTATION AND MEASUREMENT, 2023, 2
  • [49] A Q-learning algorithm for Markov decision processes with continuous state spaces
    Hu, Jiaqiao
    Yang, Xiangyu
    Hu, Jian-Qiang
    Peng, Yijie
    SYSTEMS & CONTROL LETTERS, 2024, 187
  • [50] Continuous strategy replicator dynamics for multi-agent Q-learning
    Galstyan, Aram
    AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2013, 26 (01): 37-53