Reinforcement Learning in Continuous Time and Space: A Stochastic Control Approach

Cited by: 0
Authors:
Wang, Haoran [1 ]
Zariphopoulou, Thaleia [2 ,3 ,4 ]
Zhou, Xun Yu [5 ]
Affiliations:
[1] Vanguard Grp Inc, CAI Data Sci & Machine Learning, Malvern, PA 19355 USA
[2] Univ Texas Austin, Dept Math, Austin, TX 78712 USA
[3] Univ Texas Austin, IROM, Austin, TX 78712 USA
[4] Univ Oxford, Oxford Man Inst, Oxford, England
[5] Columbia Univ, Data Sci Inst, Dept Ind Engn & Operat Res, New York, NY 10027 USA
Keywords:
Reinforcement learning; entropy regularization; stochastic control; relaxed control; linear-quadratic; Gaussian distribution; MULTIARMED BANDITS; EXISTENCE; VIEW; GAME; GO
DOI:
Not available
Chinese Library Classification (CLC):
TP [Automation Technology; Computer Technology]
Discipline Code:
0812
Abstract:
We consider reinforcement learning (RL) in continuous time with continuous feature and action spaces. We motivate and devise an exploratory formulation for the feature dynamics that captures learning under exploration, with the resulting optimization problem being a revitalization of the classical relaxed stochastic control. We then study the problem of achieving the best trade-off between exploration and exploitation by considering an entropy-regularized reward function. We carry out a complete analysis of the problem in the linear-quadratic (LQ) setting and deduce that the optimal feedback control distribution for balancing exploitation and exploration is Gaussian. This in turn interprets the widely adopted Gaussian exploration in RL, beyond its simplicity for sampling. Moreover, exploitation and exploration are captured, respectively, by the mean and variance of the Gaussian distribution. We characterize the cost of exploration, which, for the LQ case, is shown to be proportional to the entropy regularization weight and inversely proportional to the discount rate. Finally, as the weight of exploration decays to zero, we prove the convergence of the solution of the entropy-regularized LQ problem to that of the classical LQ problem.
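The abstract's central LQ finding, that the entropy-regularized optimum is a Gaussian whose mean carries exploitation and whose variance carries exploration, can be illustrated numerically. The sketch below is not the paper's derivation: it assumes a simplified static quadratic payoff q(u) = b*u - R*u^2/2 with entropy weight lam (the symbols b, R, and lam are illustrative stand-ins, not the paper's notation), for which the maximizer of E_pi[q(U)] + lam*H(pi) over action densities pi is the Gibbs density proportional to exp(q(u)/lam), i.e., a Gaussian with mean b/R and variance lam/R.

```python
# Minimal numerical sketch (illustrative assumptions, not the paper's exact formulas).
# For a concave quadratic payoff q(u) = b*u - 0.5*R*u**2, the maximizer of
#     J(pi) = E_pi[q(U)] + lam * H(pi)
# over action densities pi is the Gibbs density pi*(u) ~ exp(q(u)/lam),
# i.e., Gaussian with mean b/R (exploitation) and variance lam/R (exploration).
import numpy as np

def optimal_exploratory_policy(b, R, lam):
    """Closed-form Gaussian maximizer of E_pi[q(U)] + lam*H(pi)."""
    return b / R, lam / R  # (mean, variance)

def objective(rng, m, v, b, R, lam, n):
    """Monte Carlo estimate of E[q(U)] + lam*H(pi) for U ~ N(m, v)."""
    u = rng.normal(m, np.sqrt(v), size=n)
    q = b * u - 0.5 * R * u**2
    entropy = 0.5 * np.log(2.0 * np.pi * np.e * v)  # differential entropy of N(m, v)
    return q.mean() + lam * entropy

if __name__ == "__main__":
    b, R, lam, n = 1.0, 2.0, 0.5, 200_000
    rng = np.random.default_rng(0)
    mean, var = optimal_exploratory_policy(b, R, lam)
    best = objective(rng, mean, var, b, R, lam, n)
    # Nearby perturbed Gaussian policies should all score (weakly) worse.
    for dm, dv in [(0.3, 0.0), (-0.3, 0.0), (0.0, 0.2), (0.0, -0.1)]:
        assert best >= objective(rng, mean + dm, var + dv, b, R, lam, n) - 1e-3
    print(f"optimal Gaussian: mean={mean:.3f} (exploitation), var={var:.3f} (exploration)")
    print(f"entropy-regularized value ~ {best:.3f}")
```

Note how the variance grows linearly in lam while the mean does not depend on it, matching the abstract's reading of the mean as exploitation and the variance as exploration; the exact coefficients in the paper's dynamic LQ setting differ from this static toy problem.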
Pages: 34
Related Papers (showing 10 of 50):
  • [1] Doya, K. Reinforcement learning in continuous time and space. NEURAL COMPUTATION, 2000, 12(1): 219-245.
  • [2] Munos, R.; Bourgine, P. Reinforcement learning for continuous stochastic control problems. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 10, 1998, 10: 1029-1035.
  • [3] Munos, R.; Moore, A. Barycentric interpolators for continuous space & time reinforcement learning. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 11, 1999, 11: 1024-1030.
  • [4] Kamalapurkar, Rushikesh. Linear inverse reinforcement learning in continuous time and space. 2018 ANNUAL AMERICAN CONTROL CONFERENCE (ACC), 2018: 1683-1688.
  • [5] Faradonbeh, Mohamad Kazem Shirani; Faradonbeh, Mohamad Sadegh Shirani. Online Reinforcement Learning in Stochastic Continuous-Time Systems. THIRTY SIXTH ANNUAL CONFERENCE ON LEARNING THEORY, VOL 195, 2023, 195: 612-656.
  • [6] Shakya, Manoj; Lee, Bu-Sung; Ng, Huey Yuen. A Deep Reinforcement Learning Approach for Inventory Control under Stochastic Lead Time and Demand. 2022 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2022: 760-766.
  • [7] Fernandez-Gauna, Borja; Osa, Juan Luis; Grana, Manuel. Experiments of conditioned reinforcement learning in continuous space control tasks. NEUROCOMPUTING, 2018, 271: 38-47.
  • [8] Aragon-Gomez, Roman; Clempner, Julio B. Traffic-signal control reinforcement learning approach for continuous-time Markov games. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2020, 89.
  • [9] Patel, Kalpesh M. A practical Reinforcement Learning implementation approach for continuous process control. COMPUTERS & CHEMICAL ENGINEERING, 2023, 174.
  • [10] Buck, S.; Beetz, M.; Schmitt, T. Approximating the value function for continuous space reinforcement learning in robot control. 2002 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-3, PROCEEDINGS, 2002: 1062-1067.