Kernelized Q-Learning for Large-Scale, Potentially Continuous, Markov Decision Processes

Cited by: 0
Authors
Sledge, Isaac J. [1 ]
Principe, Jose C. [1 ,2 ]
Affiliations
[1] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
[2] Univ Florida, Dept Biomed Engn, Gainesville, FL 32611 USA
Keywords
Kernel methods; function approximation; reinforcement learning; TEMPORAL-DIFFERENCE; REGRESSION; CONVERGENCE;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We introduce a novel means of generalizing agent experiences for large-scale Markov decision processes. Our approach is based on kernel local linear regression function approximation, which we combine with Q-learning. Through this kernelized regression process, value-function estimates from visited portions of the state-action space can be generalized, in a non-linear, non-parametric fashion, to areas that have not yet been visited. This can be done whether the state-action space is discrete or continuous. We assess the performance of our approach on the game Super Mario Land 2 for the Nintendo Game Boy system. We show that our kernelized Q-learning approach outperforms linear function approximators in this complicated environment, and also outperforms other non-linear approximators.
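The idea in the abstract — generalizing stored Q-value estimates to unvisited states via kernel regression — can be sketched minimally as follows. This is a hedged illustration, not the authors' exact method: it uses a Gaussian-kernel Nadaraya-Watson average for simplicity, whereas the paper uses kernel *local linear* regression, and all names, hyperparameters, and the kernel choice here are assumptions.

```python
import numpy as np

class KernelQ:
    """Sketch of kernel-smoothed Q-learning (assumed formulation).

    Q-values observed at visited state points are generalized to
    unvisited states by Gaussian-kernel-weighted regression, so the
    agent can act in continuous state spaces without a table.
    """

    def __init__(self, n_actions, bandwidth=0.5, alpha=0.5, gamma=0.9):
        self.n_actions = n_actions
        self.h = bandwidth   # kernel bandwidth (assumed hyperparameter)
        self.alpha = alpha   # learning rate
        self.gamma = gamma   # discount factor
        self.states = []     # visited state feature vectors
        self.q = []          # Q-value vector stored per visited state

    def _weights(self, s):
        # Normalized Gaussian kernel weights over visited states.
        d = np.linalg.norm(np.array(self.states) - s, axis=1)
        w = np.exp(-0.5 * (d / self.h) ** 2)
        return w / (w.sum() + 1e-12)

    def predict(self, s):
        # Kernel-smoothed Q estimate at an arbitrary (possibly
        # unvisited) state; zeros before any experience is stored.
        if not self.states:
            return np.zeros(self.n_actions)
        w = self._weights(np.asarray(s, dtype=float))
        return w @ np.array(self.q)

    def update(self, s, a, r, s_next):
        # Standard Q-learning target, with the bootstrap value taken
        # from the kernel-smoothed estimate at the successor state.
        target = r + self.gamma * self.predict(s_next).max()
        q_s = self.predict(s)
        q_s[a] += self.alpha * (target - q_s[a])
        self.states.append(np.asarray(s, dtype=float))
        self.q.append(q_s)
```

After a few updates on a one-dimensional continuous state, `predict` interpolates between visited points, so nearby unvisited states inherit value estimates — the non-parametric generalization the abstract describes.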
Pages: 153-162
Page count: 10
Related papers
50 records
  • [1] A Q-learning algorithm for Markov decision processes with continuous state spaces
    Hu, Jiaqiao
    Yang, Xiangyu
    Hu, Jian-Qiang
    Peng, Yijie
    SYSTEMS & CONTROL LETTERS, 2024, 187
  • [2] Q-learning for Markov decision processes with a satisfiability criterion
    Shah, Suhail M.
    Borkar, Vivek S.
    SYSTEMS & CONTROL LETTERS, 2018, 113 : 45 - 51
  • [3] Relative Q-Learning for Average-Reward Markov Decision Processes With Continuous States
    Yang, Xiangyu
    Hu, Jiaqiao
    Hu, Jian-Qiang
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2024, 69 (10) : 6546 - 6560
  • [4] Online Learning in Kernelized Markov Decision Processes
    Chowdhury, Sayak Ray
    Gopalan, Aditya
    22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
  • [5] Risk-aware Q-Learning for Markov Decision Processes
    Huang, Wenjie
    Haskell, William B.
    2017 IEEE 56TH ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2017,
  • [6] On Q-learning Convergence for Non-Markov Decision Processes
    Majeed, Sultan Javed
    Hutter, Marcus
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 2546 - 2552
  • [7] Safe Q-Learning Method Based on Constrained Markov Decision Processes
    Ge, Yangyang
    Zhu, Fei
    Lin, Xinghong
    Liu, Quan
    IEEE ACCESS, 2019, 7 : 165007 - 165017
  • [8] An Aggregation Procedure for Large-Scale Markov Decision Processes
    Bartl, Ondrej
    PROCEEDINGS OF THE 22ND INTERNATIONAL CONFERENCE ON MATHEMATICAL METHODS IN ECONOMICS 2004, 2004, : 9 - 15
  • [9] Robust Q-learning algorithm for Markov decision processes under Wasserstein uncertainty
    Neufeld, Ariel
    Sester, Julian
    AUTOMATICA, 2024, 168
  • [10] A Novel Q-learning Algorithm with Function Approximation for Constrained Markov Decision Processes
    Lakshmanan, K.
    Bhatnagar, Shalabh
    2012 50TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2012, : 400 - 405