Kernelized Q-Learning for Large-Scale, Potentially Continuous, Markov Decision Processes

Cited by: 0
Authors
Sledge, Isaac J. [1 ]
Principe, Jose C. [1 ,2 ]
Affiliations
[1] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
[2] Univ Florida, Dept Biomed Engn, Gainesville, FL 32611 USA
Keywords
Kernel methods; function approximation; reinforcement learning; TEMPORAL-DIFFERENCE; REGRESSION; CONVERGENCE;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We introduce a novel means of generalizing agent experiences for large-scale Markov decision processes. Our approach is based on kernel local linear regression function approximation, which we combine with Q-learning. Through this kernelized regression process, value-function estimates from visited portions of the state-action space can be generalized, in a non-linear, non-parametric fashion, to areas that have not yet been visited. This can be done whether the state-action space is discrete or continuous. We assess the performance of our approach on the game Super Mario Land 2 for the Nintendo Game Boy system. We show that our kernelized Q-learning approach outperforms linear function approximators in this complicated environment, and also outperforms other non-linear approximators.
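The idea in the abstract — generalizing stored Q-value estimates to unvisited states via kernel regression — can be sketched minimally as follows. This is a hedged illustration, not the authors' exact method: it uses a Gaussian-kernel Nadaraya-Watson average for simplicity, whereas the paper uses kernel *local linear* regression, and all names, hyperparameters, and the kernel choice here are assumptions.

```python
import numpy as np

class KernelQ:
    """Sketch of kernel-smoothed Q-learning (assumed formulation).

    Q-values observed at visited state points are generalized to
    unvisited states by Gaussian-kernel-weighted regression, so the
    agent can act in continuous state spaces without a table.
    """

    def __init__(self, n_actions, bandwidth=0.5, alpha=0.5, gamma=0.9):
        self.n_actions = n_actions
        self.h = bandwidth   # kernel bandwidth (assumed hyperparameter)
        self.alpha = alpha   # learning rate
        self.gamma = gamma   # discount factor
        self.states = []     # visited state feature vectors
        self.q = []          # Q-value vector stored per visited state

    def _weights(self, s):
        # Normalized Gaussian kernel weights over visited states.
        d = np.linalg.norm(np.array(self.states) - s, axis=1)
        w = np.exp(-0.5 * (d / self.h) ** 2)
        return w / (w.sum() + 1e-12)

    def predict(self, s):
        # Kernel-smoothed Q estimate at an arbitrary (possibly
        # unvisited) state; zeros before any experience is stored.
        if not self.states:
            return np.zeros(self.n_actions)
        w = self._weights(np.asarray(s, dtype=float))
        return w @ np.array(self.q)

    def update(self, s, a, r, s_next):
        # Standard Q-learning target, with the bootstrap value taken
        # from the kernel-smoothed estimate at the successor state.
        target = r + self.gamma * self.predict(s_next).max()
        q_s = self.predict(s)
        q_s[a] += self.alpha * (target - q_s[a])
        self.states.append(np.asarray(s, dtype=float))
        self.q.append(q_s)
```

After a few updates on a one-dimensional continuous state, `predict` interpolates between visited points, so nearby unvisited states inherit value estimates — the non-parametric generalization the abstract describes.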
Pages: 153-162
Page count: 10
Related papers
50 records
  • [1] A Q-learning algorithm for Markov decision processes with continuous state spaces
    Hu, Jiaqiao
    Yang, Xiangyu
    Hu, Jian-Qiang
    Peng, Yijie
    SYSTEMS & CONTROL LETTERS, 2024, 187
  • [2] Q-learning for Markov decision processes with a satisfiability criterion
    Shah, Suhail M.
    Borkar, Vivek S.
    SYSTEMS & CONTROL LETTERS, 2018, 113 : 45 - 51
  • [3] Relative Q-Learning for Average-Reward Markov Decision Processes With Continuous States
    Yang, Xiangyu
    Hu, Jiaqiao
    Hu, Jian-Qiang
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2024, 69 (10) : 6546 - 6560
  • [4] Online Learning in Kernelized Markov Decision Processes
    Chowdhury, Sayak Ray
    Gopalan, Aditya
    22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
  • [5] Risk-aware Q-Learning for Markov Decision Processes
    Huang, Wenjie
    Haskell, William B.
    2017 IEEE 56TH ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2017,
  • [6] On Q-learning Convergence for Non-Markov Decision Processes
    Majeed, Sultan Javed
    Hutter, Marcus
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 2546 - 2552
  • [7] Safe Q-Learning Method Based on Constrained Markov Decision Processes
    Ge, Yangyang
    Zhu, Fei
    Lin, Xinghong
    Liu, Quan
    IEEE ACCESS, 2019, 7 : 165007 - 165017
  • [8] An Aggregation Procedure for Large-Scale Markov Decision Processes
    Bartl, Ondrej
    PROCEEDINGS OF THE 22ND INTERNATIONAL CONFERENCE ON MATHEMATICAL METHODS IN ECONOMICS 2004, 2004, : 9 - 15
  • [9] Robust Q-learning algorithm for Markov decision processes under Wasserstein uncertainty
    Neufeld, Ariel
    Sester, Julian
    AUTOMATICA, 2024, 168
  • [10] A Novel Q-learning Algorithm with Function Approximation for Constrained Markov Decision Processes
    Lakshmanan, K.
    Bhatnagar, Shalabh
    2012 50TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2012, : 400 - 405