q-Learning in Continuous Time

Cited by: 0
Authors
Jia, Yanwei [1 ]
Zhou, Xun Yu [2 ,3 ]
Affiliations
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Shatin, Hong Kong, Peoples R China
[2] Columbia Univ, Dept Ind Engn & Operat Res, New York, NY 10027 USA
[3] Columbia Univ, Data Sci Inst, New York, NY 10027 USA
Keywords
continuous-time reinforcement learning; policy improvement; q-function; martingale; on-policy and off-policy; VARIANCE PORTFOLIO SELECTION;
DOI
Not available
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation introduced by Wang et al. (2020). As the conventional (big) Q-function collapses in continuous time, we consider its first-order approximation and coin the term "(little) q-function". This function is related to the instantaneous advantage rate function as well as the Hamiltonian. We develop a "q-learning" theory around the q-function that is independent of time discretization. Given a stochastic policy, we jointly characterize the associated q-function and value function by martingale conditions of certain stochastic processes, in both on-policy and off-policy settings. We then apply the theory to devise different actor-critic algorithms for solving underlying RL problems, depending on whether or not the density function of the Gibbs measure generated from the q-function can be computed explicitly. One of our algorithms interprets the well-known Q-learning algorithm SARSA, and another recovers a policy gradient (PG) based continuous-time algorithm proposed in Jia and Zhou (2022b). Finally, we conduct simulation experiments to compare the performance of our algorithms with those of PG-based algorithms in Jia and Zhou (2022b) and time-discretized conventional Q-learning algorithms.
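For orientation, here is a hedged sketch of the two relationships the abstract refers to. The symbols J (value function), H (Hamiltonian), gamma (temperature of the entropy regularization), and Delta t (time-discretization step) are introduced here for illustration only; the precise definitions are those in the paper. The "(little) q-function" is described as the first-order approximation of the conventional Q-function over a short time step, and the improved stochastic policy is the Gibbs measure generated from it:

$$ Q_{\Delta t}(t, x, a) \;\approx\; J(t, x) \;+\; q(t, x, a)\,\Delta t, \qquad \Delta t \to 0, $$
$$ \pi(a \mid t, x) \;\propto\; \exp\!\big( q(t, x, a) / \gamma \big). $$

The Gibbs density above can be normalized in closed form only in special cases, which is why the abstract distinguishes between actor-critic algorithms according to whether or not that density can be computed explicitly.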
Pages: 61
Related Papers
50 records in total
  • [21] Hamilton-Jacobi Deep Q-Learning for Deterministic Continuous-Time Systems with Lipschitz Continuous Controls
    Kim, Jeongho
    Shin, Jaeuk
    Yang, Insoon
    JOURNAL OF MACHINE LEARNING RESEARCH, 2021, 22
  • [22] A Deep Q-Learning Approach for Continuous Review Policies with Uncertain Lead Time Demand Patterns
    Zhou, Jianpin
    Zhang, Shuliu
    Li, Yingtang
    2018 11TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), VOL 1, 2018, : 266 - 270
  • [23] Comparisons of Continuous-time and Discrete-time Q-learning Schemes for Adaptive Linear Quadratic Control
    Chun, Tae Yoon
    Lee, Jae Young
    Park, Jin Bae
    Choi, Yoon Ho
    2012 PROCEEDINGS OF SICE ANNUAL CONFERENCE (SICE), 2012, : 1228 - 1233
  • [24] Enhanced continuous valued Q-learning for real autonomous robots
    Takeda, M
    Nakamura, T
    Imai, M
    Ogasawara, T
    Asada, M
    ADVANCED ROBOTICS, 2000, 14 (05) : 439 - 441
  • [25] Continuous Deep Q-Learning with Model-based Acceleration
    Gu, Shixiang
    Lillicrap, Timothy
    Sutskever, Ilya
    Levine, Sergey
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [26] Approximate Q-Learning for Stacking Problems with Continuous Production and Retrieval
    Fechter, Judith
    Beham, Andreas
    Wagner, Stefan
    Affenzeller, Michael
    APPLIED ARTIFICIAL INTELLIGENCE, 2019, 33 (01) : 68 - 86
  • [27] Q-learning with continuous state spaces and finite decision set
    Barty, Kengy
    Girardeau, Pierre
    Roy, Jean-Sebastien
    Strugarek, Cyrille
    2007 IEEE INTERNATIONAL SYMPOSIUM ON APPROXIMATE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2007, : 346 - +
  • [28] Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning
    Ohnishi, Shota
    Uchibe, Eiji
    Yamaguchi, Yotaro
    Nakanishi, Kosuke
    Yasui, Yuji
    Ishii, Shin
    FRONTIERS IN NEUROROBOTICS, 2019, 13
  • [29] Learning rates for Q-learning
    Even-Dar, E
    Mansour, Y
    JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 5 : 1 - 25
  • [30] Learning rates for Q-Learning
    Even-Dar, E
    Mansour, Y
    COMPUTATIONAL LEARNING THEORY, PROCEEDINGS, 2001, 2111 : 589 - 604