Kernelized Q-Learning for Large-Scale, Potentially Continuous, Markov Decision Processes

Cited: 0
Authors
Sledge, Isaac J. [1]
Principe, Jose C. [1,2]
Affiliations
[1] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
[2] Univ Florida, Dept Biomed Engn, Gainesville, FL 32611 USA
Keywords
Kernel methods; function approximation; reinforcement learning; temporal difference; regression; convergence
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
We introduce a novel means of generalizing agent experiences for large-scale Markov decision processes. Our approach is based on kernel local linear regression function approximation, which we combine with Q-learning. Through this kernelized regression process, value-function estimates from visited portions of the state-action space can be generalized, in a non-linear, non-parametric fashion, to areas that have not yet been visited. This can be done whether the state-action space is discrete or continuous. We assess the performance of our approach on the game Super Mario Land 2 for the Nintendo Game Boy system. We show that our kernelized Q-learning approach outperforms linear function approximators in this complicated environment. It also outperforms other non-linear approximators.
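To make the abstract concrete, the following is a minimal, hypothetical Python sketch of combining one-step Q-learning targets with kernel local linear (locally weighted) regression over state-action features. It is not the authors' implementation: the Gaussian kernel, the bandwidth h, the ridge regularizer, the unbounded sample dictionary, and the names KernelLocalLinearQ, predict, and update are all assumptions made for illustration.

```python
# Hypothetical sketch: Q-learning combined with kernel local linear
# regression over state-action features. Not the paper's exact method;
# kernel choice, bandwidth, and ridge term are illustrative assumptions.
import numpy as np

class KernelLocalLinearQ:
    def __init__(self, gamma=0.99, bandwidth=1.0, ridge=1e-3):
        self.gamma = gamma      # discount factor
        self.h = bandwidth      # Gaussian kernel bandwidth (assumed)
        self.ridge = ridge      # regularizer for the local linear fit
        self.X = []             # stored state-action feature vectors
        self.y = []             # stored one-step Q-learning targets

    def predict(self, x):
        """Locally weighted affine fit around the query point x."""
        if not self.X:
            return 0.0          # default value when nothing has been visited
        X = np.asarray(self.X)
        y = np.asarray(self.y)
        # Gaussian kernel weights: nearby stored samples dominate the fit
        w = np.exp(-np.sum((X - x) ** 2, axis=1) / (2.0 * self.h ** 2))
        # Augment with a bias column so the local model is affine
        A = np.hstack([X, np.ones((X.shape[0], 1))])
        # Ridge-regularized weighted least squares:
        #   beta = (A' W A + r I)^{-1} A' W y,  with W = diag(w)
        G = A.T @ (w[:, None] * A) + self.ridge * np.eye(A.shape[1])
        beta = np.linalg.solve(G, A.T @ (w * y))
        return float(np.append(x, 1.0) @ beta)

    def update(self, s, a, r, s_next, actions):
        """Store a bootstrapped Q-learning target for the visited pair."""
        x = np.concatenate([s, a])
        q_next = max(self.predict(np.concatenate([s_next, b]))
                     for b in actions)
        self.X.append(x)
        self.y.append(r + self.gamma * q_next)
```

In this sketch, s, a, and s_next are numeric feature vectors and actions is a set of candidate action encodings (again, assumed representations). Each update stores a bootstrapped Q-learning target; predict then generalizes those targets to unseen state-action pairs by fitting an affine model weighted toward nearby stored samples, which is the non-linear, non-parametric generalization the abstract describes.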
Pages: 153-162
Page count: 10
Related Papers
50 records in total
  • [41] Distributed Adaptive Optimal Regulation of Uncertain Large-Scale Linear Networked Control Systems Using Q-Learning
    Narayanan, Vignesh
    Jagannathan, S.
    2015 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI), 2015: 587 - 592
  • [42] Policy learning in continuous-time Markov decision processes using Gaussian Processes
    Bartocci, Ezio
    Bortolussi, Luca
    Brazdil, Tomas
    Milios, Dimitrios
    Sanguinetti, Guido
    PERFORMANCE EVALUATION, 2017, 116 : 84 - 100
  • [43] Suboptimal policy determination for large-scale Markov decision processes. 2. Implementation and numerical evaluation
    Popyack, JL
    White, CC
    JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 1985, 46 (03) : 343 - 358
  • [44] Expected Lenient Q-learning: a fast variant of the Lenient Q-learning algorithm for cooperative stochastic Markov games
    Amhraoui, Elmehdi
    Masrour, Tawfik
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (07) : 2781 - 2797
  • [45] Continuous Learning for Large-scale Personalized Domain Classification
    Li, Han
    Lee, Jihwan
    Mudgal, Sidharth
    Sarikaya, Ruhi
    Kim, Young-Bum
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 3784 - 3794
  • [46] Agent Decision Processes Using Double Deep Q-Networks plus Minimax Q-Learning
    Fitch, Natalie
    Clancy, Daniel
    2021 IEEE AEROSPACE CONFERENCE (AEROCONF 2021), 2021
  • [47] Multitime scale Markov decision processes
    Chang, HS
    Fard, PJ
    Marcus, SI
    Shayman, M
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2003, 48 (06) : 976 - 987
  • [48] Q-learning in continuous state and action spaces
    Gaskett, C
    Wettergreen, D
    Zelinsky, A
    ADVANCED TOPICS IN ARTIFICIAL INTELLIGENCE, 1999, 1747 : 417 - 428
  • [49] Learning to Collaborate in Markov Decision Processes
    Radanovic, Goran
    Devidze, Rati
    Parkes, David C.
    Singla, Adish
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [50] Bisimulation metrics for continuous Markov decision processes
    Ferns, Norm
    Panangaden, Prakash
    Precup, Doina
    SIAM JOURNAL ON COMPUTING, 2011, 40 (06) : 1662 - 1714