q-Learning in Continuous Time

Cited by: 0
Authors
Jia, Yanwei [1]
Zhou, Xun Yu [2,3]
Affiliations
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Shatin, Hong Kong, Peoples R China
[2] Columbia Univ, Dept Ind Engn & Operat Res, New York, NY 10027 USA
[3] Columbia Univ, Data Sci Inst, New York, NY 10027 USA
Keywords
continuous-time reinforcement learning; policy improvement; q-function; martingale; on-policy and off-policy; MEAN-VARIANCE PORTFOLIO SELECTION
DOI: Not available
CLC Number: TP [Automation Technology, Computer Technology]
Discipline Code: 0812
Abstract
We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation introduced by Wang et al. (2020). As the conventional (big) Q-function collapses in continuous time, we consider its first-order approximation and coin the term "(little) q-function". This function is related to the instantaneous advantage rate function as well as the Hamiltonian. We develop a "q-learning" theory around the q-function that is independent of time discretization. Given a stochastic policy, we jointly characterize the associated q-function and value function by martingale conditions of certain stochastic processes, in both on-policy and off-policy settings. We then apply the theory to devise different actor-critic algorithms for solving underlying RL problems, depending on whether or not the density function of the Gibbs measure generated from the q-function can be computed explicitly. One of our algorithms interprets the well-known Q-learning algorithm SARSA, and another recovers a policy gradient (PG) based continuous-time algorithm proposed in Jia and Zhou (2022b). Finally, we conduct simulation experiments to compare the performance of our algorithms with those of PG-based algorithms in Jia and Zhou (2022b) and time-discretized conventional Q-learning algorithms.
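The abstract compresses several definitions; the following is a condensed sketch in notation assumed here (a discount rate \beta and an entropy temperature \gamma are not fixed by the abstract itself; consult the paper for precise definitions and conditions). Over a short window \Delta t, the conventional Q-function degenerates to the value function, and the (little) q-function is its first-order coefficient:

Q^{\pi}_{\Delta t}(t, x, a) = J^{\pi}(t, x) + q^{\pi}(t, x, a)\,\Delta t + o(\Delta t),

where J^{\pi} is the value function of the stochastic policy \pi. The on-policy martingale characterization then says that (J^{\pi}, q^{\pi}) is the pair for which

e^{-\beta t} J^{\pi}(t, X_t) + \int_0^t e^{-\beta s} \left[ r(s, X_s, a_s) - \gamma \log \pi(a_s \mid s, X_s) - q^{\pi}(s, X_s, a_s) \right] \mathrm{d}s

is a martingale along trajectories generated by \pi, and policy improvement samples from the Gibbs measure \pi'(a \mid t, x) \propto \exp\{ q^{\pi}(t, x, a) / \gamma \}.

When that Gibbs density can be normalized explicitly (below, over a finite action grid), the martingale condition suggests a SARSA-like temporal-difference update once time is discretized for implementation. The Python sketch that follows is illustrative only: the toy dynamics, reward, feature maps, and all constants are assumptions of this sketch, not the paper's algorithms or experiments.

import numpy as np

rng = np.random.default_rng(0)

dt, beta, temp = 0.01, 0.1, 0.5       # step size, discount rate, entropy temperature (placeholders)
actions = np.linspace(-1.0, 1.0, 11)  # finite grid so the Gibbs density normalizes in closed form

# Linear function approximation with placeholder features:
#   J(t, x) ~= theta . phi(t, x)  and  q(t, x, a) ~= psi . xi(t, x, a)
def phi(t, x):
    return np.array([1.0, t, x, x * x])

def xi(t, x, a):
    return np.array([1.0, a, a * a, x * a])

theta, psi = np.zeros(4), np.zeros(4)
alpha = 0.05  # learning rate

def gibbs_policy(t, x):
    # Sample a ~ pi(.|t, x) proportional to exp(q(t, x, a) / temp) on the action grid.
    logits = np.array([xi(t, x, a) @ psi for a in actions]) / temp
    p = np.exp(logits - logits.max())
    p /= p.sum()
    idx = rng.choice(len(actions), p=p)
    return actions[idx], np.log(p[idx])

for episode in range(200):
    t, x = 0.0, rng.normal()
    while t < 1.0:  # finite horizon; the discretization is for implementation only
        a, log_pi = gibbs_policy(t, x)
        x_next = x + a * dt + np.sqrt(dt) * rng.normal()  # toy controlled diffusion
        r = -(x * x + 0.1 * a * a)                        # toy running reward
        # Discretized residual of the martingale condition: J should drift at
        # rate q - (r - temp * log pi); delta estimates that residual.
        delta = ((r - temp * log_pi - xi(t, x, a) @ psi) * dt
                 + np.exp(-beta * dt) * (phi(t + dt, x_next) @ theta)
                 - phi(t, x) @ theta)
        theta += alpha * delta * phi(t, x)  # value-function (critic) update
        psi += alpha * delta * xi(t, x, a)  # q-function (critic) update

The actor is implicit here: the policy is recomputed from the current q-estimate at every step, which mirrors the abstract's point that the Gibbs measure generated from the q-function ties the critic and the actor together.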
Pages: 61
Related Papers
50 records in total
  • [1] Safe Q-learning for continuous-time linear systems
    Bandyopadhyay, Soutrik
    Bhasin, Shubhendu
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023: 241-246
  • [2] Continuous-Action Q-Learning
    Millán, José del R.
    Posenato, Daniele
    Dedieu, Eric
    MACHINE LEARNING, 2002, 49 (2-3): 247-265
  • [3] Continuous Time q-Learning for Mean-Field Control Problems
    Wei, Xiaoli
    Yu, Xiang
    APPLIED MATHEMATICS AND OPTIMIZATION, 2025, 91 (01)
  • [4] Hamilton-Jacobi-Bellman Equations for Q-Learning in Continuous Time
    Kim, Jeongho
    Yang, Insoon
    LEARNING FOR DYNAMICS AND CONTROL, VOL 120, 2020, 120: 739-748
  • [5] Q-learning in continuous state and action spaces
    Gaskett, C
    Wettergreen, D
    Zelinsky, A
    ADVANCED TOPICS IN ARTIFICIAL INTELLIGENCE, 1999, 1747: 417-428
  • [6] Convex Q-Learning in Continuous Time with Application to Dispatch of Distributed Energy Resources
    Lu, Fan
    Mathias, Joel
    Meyn, Sean
    Kalsi, Karanjit
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023: 1529-1536
  • [7] Hamilton-Jacobi Deep Q-Learning for Deterministic Continuous-Time Systems with Lipschitz Continuous Controls
    Kim, Jeongho
    Shin, Jaeuk
    Yang, Insoon
    JOURNAL OF MACHINE LEARNING RESEARCH, 2021, 22
  • [8] Convergence of a Q-learning Variant for Continuous States and Actions
    Carden, Stephen
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2014, 49: 705-731
  • [9] A novel Q-learning approach with continuous states and actions
    Zhou, Yi
    Er, Meng Joo
    PROCEEDINGS OF THE 2007 IEEE CONFERENCE ON CONTROL APPLICATIONS, VOLS 1-3, 2007: 447+