On Q-learning Convergence for Non-Markov Decision Processes

Cited by: 0
Authors
Majeed, Sultan Javed [1 ]
Hutter, Marcus [1 ]
Affiliation
[1] Australian Natl Univ, Res Sch Comp Sci, Canberra, ACT, Australia
DOI: Not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline Classification Codes: 081104; 0812; 0835; 1405
Abstract
Temporal-difference (TD) learning is an attractive, computationally efficient framework for model-free reinforcement learning. Q-learning is one of the most widely used TD learning techniques; it enables an agent to learn the optimal action-value function, i.e. the Q-value function. Despite its widespread use, Q-learning has only been proven to converge on Markov Decision Processes (MDPs) and Q-uniform abstractions of finite-state MDPs. Most real-world problems, however, are inherently non-Markovian: the full true state of the environment is not revealed by recent observations. In this paper, we investigate the behavior of Q-learning when applied to non-MDP and non-ergodic domains that may have infinitely many underlying states. We prove that the convergence guarantee of Q-learning extends to a class of such non-MDP problems, in particular to some non-stationary domains. We show that state-uniformity of the optimal Q-value function is a necessary and sufficient condition for Q-learning to converge, even in the case of infinitely many internal states.
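For context, the update rule at the core of Q-learning, which the abstract refers to, can be sketched as follows. This is a minimal tabular illustration of the standard temporal-difference update, not the construction analyzed in the paper; the environment interface (reset, step, actions) is a hypothetical placeholder.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Learn an action-value table Q(s, a) by temporal-difference updates.

    `env` is assumed to expose a hypothetical interface:
    reset() -> state, step(action) -> (next_state, reward, done), and a
    list of discrete actions in env.actions.
    """
    Q = defaultdict(float)  # Q[(state, action)] -> estimated return, default 0.0

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy exploration over the environment's actions
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # TD target: one-step bootstrap from the best next action
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in env.actions)
            td_target = reward + gamma * best_next

            # Q-learning update: move the estimate toward the TD target
            Q[(state, action)] += alpha * (td_target - Q[(state, action)])
            state = next_state

    return Q
```

The paper's result concerns when updates of this form still converge to the optimal Q-values if the observed state is not a Markov state of the underlying (possibly infinite-state) process.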
Pages: 2546-2552
Number of pages: 7