q-Learning in Continuous Time

Cited by: 0
Authors
Jia, Yanwei [1 ]
Zhou, Xun Yu [2 ,3 ]
Affiliations
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Shatin, Hong Kong, Peoples R China
[2] Columbia Univ, Dept Ind Engn & Operat Res, New York, NY 10027 USA
[3] Columbia Univ, Data Sci Inst, New York, NY 10027 USA
Keywords
continuous-time reinforcement learning; policy improvement; q-function; martingale; on-policy and off-policy; VARIANCE PORTFOLIO SELECTION;
DOI: not available
Chinese Library Classification (CLC): TP [automation technology; computer technology]
Discipline classification code: 0812
Abstract
We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation introduced by Wang et al. (2020). As the conventional (big) Q-function collapses in continuous time, we consider its first-order approximation and coin the term "(little) q-function". This function is related to the instantaneous advantage rate function as well as the Hamiltonian. We develop a "q-learning" theory around the q-function that is independent of time discretization. Given a stochastic policy, we jointly characterize the associated q-function and value function by martingale conditions of certain stochastic processes, in both on-policy and off-policy settings. We then apply the theory to devise different actor-critic algorithms for solving underlying RL problems, depending on whether or not the density function of the Gibbs measure generated from the q-function can be computed explicitly. One of our algorithms interprets the well-known Q-learning algorithm SARSA, and another recovers a policy gradient (PG) based continuous-time algorithm proposed in Jia and Zhou (2022b). Finally, we conduct simulation experiments to compare the performance of our algorithms with those of PG-based algorithms in Jia and Zhou (2022b) and time-discretized conventional Q-learning algorithms.
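As a rough illustration of the abstract's central construction, the following is a sketch in our own notation (J for the value function, H for the Hamiltonian, beta for the discount rate, gamma for the exploration temperature, Delta t for a small time step; these symbols are not defined in this record, and the precise definitions and regularity conditions are those of the paper itself). The (little) q-function is described as the first-order term of the conventional Q-function over a short horizon, and the Gibbs measure mentioned in the abstract is the policy it generates:

$$Q_{\Delta t}(t,x,a;\pi) \approx J(t,x;\pi) + q(t,x,a;\pi)\,\Delta t + o(\Delta t),$$
$$q(t,x,a;\pi) = \partial_t J(t,x;\pi) + H\bigl(t,x,a,\partial_x J(t,x;\pi),\partial_{xx} J(t,x;\pi)\bigr) - \beta J(t,x;\pi),$$
$$\pi'(a \mid t,x) \propto \exp\bigl(q(t,x,a;\pi)/\gamma\bigr).$$

The martingale characterization referred to in the abstract then couples J and q along state trajectories generated under the policy; the paper's exact conditions also involve the entropy regularizer and are not reproduced here.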
Pages: 61
Related papers (50 in total):
  • [11] Continuous Q-Learning for Multi-Agent Cooperation
    Hwang, Kao-Shing
    Jiang, Wei-Cheng
    Lin, Yu-Hong
    Lai, Li-Hsin
    CYBERNETICS AND SYSTEMS, 2012, 43 (03) : 227 - 256
  • [12] Hyperparameter Optimization for Tracking with Continuous Deep Q-Learning
    Dong, Xingping
    Shen, Jianbing
    Wang, Wenguan
    Liu, Yu
    Shao, Ling
    Porikli, Fatih
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 518 - 527
  • [13] Convergence of a Q-learning variant for continuous states and actions
    Carden, S., 1600, AI Access Foundation (49):
  • [14] Continuous deep Q-learning with a simulator for stabilization of uncertain discrete-time systems
    Ikemoto, Junya
    Ushio, Toshimitsu
    IEICE NONLINEAR THEORY AND ITS APPLICATIONS, 2021, 12 (04): : 738 - 757
  • [15] Fuzzy Q-learning in continuous state and action space
    Xu M.-L.
    Xu W.-B.
    Journal of China Universities of Posts and Telecommunications, 2010, 17 (04): : 100 - 109
  • [16] Robust Inverse Q-Learning for Continuous-Time Linear Systems in Adversarial Environments
    Lian, Bosen
    Xue, Wenqian
    Lewis, Frank L.
    Chai, Tianyou
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (12) : 13083 - 13095
  • [18] Q-LEARNING
    WATKINS, CJCH
    DAYAN, P
    MACHINE LEARNING, 1992, 8 (3-4) : 279 - 292
  • [19] Deep Reinforcement Learning: From Q-Learning to Deep Q-Learning
    Tan, Fuxiao
    Yan, Pengfei
    Guan, Xinping
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT IV, 2017, 10637 : 475 - 483
  • [20] Backward Q-learning: The combination of Sarsa algorithm and Q-learning
    Wang, Yin-Hao
    Li, Tzuu-Hseng S.
    Lin, Chih-Jui
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (09) : 2184 - 2193