q-Learning in Continuous Time

Cited by: 0
Authors
Jia, Yanwei [1 ]
Zhou, Xun Yu [2 ,3 ]
Affiliations
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Shatin, Hong Kong, Peoples R China
[2] Columbia Univ, Dept Ind Engn & Operat Res, New York, NY 10027 USA
[3] Columbia Univ, Data Sci Inst, New York, NY 10027 USA
Keywords
continuous-time reinforcement learning; policy improvement; q-function; martingale; on-policy and off-policy; variance portfolio selection
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation introduced by Wang et al. (2020). As the conventional (big) Q-function collapses in continuous time, we consider its first-order approximation and coin the term "(little) q-function". This function is related to the instantaneous advantage rate function as well as the Hamiltonian. We develop a "q-learning" theory around the q-function that is independent of time discretization. Given a stochastic policy, we jointly characterize the associated q-function and value function by martingale conditions of certain stochastic processes, in both on-policy and off-policy settings. We then apply the theory to devise different actor-critic algorithms for solving underlying RL problems, depending on whether or not the density function of the Gibbs measure generated from the q-function can be computed explicitly. One of our algorithms interprets the well-known Q-learning algorithm SARSA, and another recovers a policy gradient (PG) based continuous-time algorithm proposed in Jia and Zhou (2022b). Finally, we conduct simulation experiments to compare the performance of our algorithms with those of PG-based algorithms in Jia and Zhou (2022b) and time-discretized conventional Q-learning algorithms.
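As a quick orientation to the objects named in the abstract, the following is a rough sketch reconstructed from the abstract and the entropy-regularized exploratory formulation of Wang et al. (2020); the symbols and normalizations are assumptions, not the paper's verbatim definitions. With J^{\pi} the value function of a stochastic policy \pi and \gamma > 0 the temperature parameter, the (little) q-function arises as the first-order term of the conventional Q-function over a vanishing time step \Delta t:

    Q^{\pi}_{\Delta t}(t, x, a) = J^{\pi}(t, x) + q^{\pi}(t, x, a)\,\Delta t + o(\Delta t),

so q^{\pi} plays the role of an instantaneous advantage rate and, up to discounting terms, agrees with \partial_t J^{\pi} + H(t, x, a, \partial_x J^{\pi}, \partial_{xx} J^{\pi}) for the generalized Hamiltonian H. The Gibbs measure mentioned in the abstract is then the policy update \pi'(\cdot | t, x) \propto \exp\{ q^{\pi}(t, x, \cdot) / \gamma \}.

A minimal, hypothetical Python sketch of a SARSA-flavored temporal-difference update built on this expansion follows; the names J, q, grad_J, grad_q, gamma_temp, and the omission of discounting and martingale test functions are simplifying assumptions for illustration, not the paper's algorithm as stated.

def td_update(theta, psi, traj, J, q, grad_J, grad_q, gamma_temp, dt, lr):
    """One pass of semi-gradient updates along a sampled trajectory.

    traj holds tuples (t, x, a, r, logp, t_next, x_next) collected under the
    Gibbs policy pi(a | t, x) proportional to exp(q(psi, t, x, a) / gamma_temp);
    logp is log pi(a | t, x) for the sampled action a.
    """
    for (t, x, a, r, logp, t_next, x_next) in traj:
        # TD-style residual over one small step dt: change in the value estimate
        # plus running reward, entropy regularization, and the q-function rate.
        delta = (J(theta, t_next, x_next) - J(theta, t, x)
                 + (r - gamma_temp * logp - q(psi, t, x, a)) * dt)
        theta = theta + lr * delta * grad_J(theta, t, x)  # critic (value) update
        psi = psi + lr * delta * grad_q(psi, t, x, a)     # actor (q-function) update
    return theta, psi

The point of the sketch is only that both J and q are fitted from the same TD-style residual while the behavior policy is the Gibbs measure of q, which is consistent with the actor-critic structure the abstract describes.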
Number of pages: 61
Related Papers
50 in total
  • [31] Contextual Q-Learning
    Pinto, Tiago
    Vale, Zita
    ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325 : 2927 - 2928
  • [32] Bayesian Q-learning
    Dearden, R
    Friedman, N
    Russell, S
    FIFTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-98) AND TENTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE (IAAI-98) - PROCEEDINGS, 1998, : 761 - 768
  • [33] Zap Q-Learning
    Devraj, Adithya M.
    Meyn, Sean P.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [34] CVaR Q-Learning
    Stanko, Silvestr
    Macek, Karel
    COMPUTATIONAL INTELLIGENCE: 11th International Joint Conference, IJCCI 2019, Vienna, Austria, September 17-19, 2019, Revised Selected Papers, 2021, 922 : 333 - 358
  • [35] Fuzzy Q-learning
    Glorennec, PY
    Jouffe, L
    PROCEEDINGS OF THE SIXTH IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS I - III, 1997, : 659 - 662
  • [36] Convex Q-Learning
    Lu, Fan
    Mehta, Prashant G.
    Meyn, Sean P.
    Neu, Gergely
    2021 AMERICAN CONTROL CONFERENCE (ACC), 2021, : 4749 - 4756
  • [37] Q-learning and robotics
    Touzet, CF
    Santos, JM
    SIMULATION IN INDUSTRY 2001, 2001, : 685 - 689
  • [38] Q-learning automaton
    Qian, F
    Hirata, H
    IEEE/WIC INTERNATIONAL CONFERENCE ON INTELLIGENT AGENT TECHNOLOGY, PROCEEDINGS, 2003, : 432 - 437
  • [39] Mutual Q-learning
    Reid, Cameron
    Mukhopadhyay, Snehasis
    2020 3RD INTERNATIONAL CONFERENCE ON CONTROL AND ROBOTS (ICCR 2020), 2020, : 128 - 133
  • [40] Periodic Q-Learning
    Lee, Donghwan
    He, Niao
    LEARNING FOR DYNAMICS AND CONTROL, VOL 120, 2020, 120 : 582 - 598