Kernelized Q-Learning for Large-Scale, Potentially Continuous, Markov Decision Processes

Cited by: 0
Authors
Sledge, Isaac J. [1 ]
Principe, Jose C. [1 ,2 ]
Affiliations
[1] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
[2] Univ Florida, Dept Biomed Engn, Gainesville, FL 32611 USA
Keywords
Kernel methods; function approximation; reinforcement learning; TEMPORAL-DIFFERENCE; REGRESSION; CONVERGENCE;
DOI
None available
CLC Classification
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We introduce a novel means of generalizing an agent's experiences for large-scale Markov decision processes. Our approach combines Q-learning with a kernel-based local linear regression function approximator. Through this kernelized regression, value-function estimates from visited portions of the state-action space are generalized, in a non-linear, non-parametric fashion, to areas that have not yet been visited. This holds whether the state-action space is discrete or continuous. We assess the performance of our approach on the game Super Mario Land 2 for the Nintendo GameBoy system. In this complicated environment, our kernelized Q-learning approach outperforms linear function approximators; it also outperforms other non-linear approximators.
Pages: 153-162 (10 pages)
Related Papers
(50 records in total)
  • [21] On State Aggregation to Approximate Complex Value Functions in Large-Scale Markov Decision Processes
    Jia, Qing-Shan
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2011, 56 (02) : 333 - 344
  • [22] Simulation-based policy generation using large-scale Markov decision processes
    Zobel, CW
    Scherer, WT
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART A-SYSTEMS AND HUMANS, 2001, 31 (06): : 609 - 622
  • [23] Faster saddle-point optimization for solving large-scale Markov decision processes
    Bas-Serrano, Joan
    Neu, Gergely
    LEARNING FOR DYNAMICS AND CONTROL, VOL 120, 2020, 120 : 413 - 423
  • [24] Learning Policies for Markov Decision Processes in Continuous Spaces
    Paternain, Santiago
    Bazerque, Juan Andres
    Small, Austin
    Ribeiro, Alejandro
    2018 IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2018, : 4751 - 4758
  • [25] Online Learning in Markov Decision Processes with Continuous Actions
    Hong, Yi-Te
    Lu, Chi-Jen
    ALGORITHMIC LEARNING THEORY, ALT 2015, 2015, 9355 : 302 - 316
  • [26] Large Scale Markov Decision Processes with Changing Rewards
    Cardoso, Adrian Rivera
    Wang, He
    Xu, Huan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [27] Q-learning algorithms for constrained Markov decision processes with randomized monotone policies:: Application to MIMO transmission control
    Djonin, Dejan V.
    Krishnamurthy, Vikram
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2007, 55 (05) : 2170 - 2181
  • [28] Linear Programming for Large-Scale Markov Decision Problems
    Abbasi-Yadkori, Yasin
    Bartlett, Peter L.
    Malek, Alan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 2), 2014, 32 : 496 - 504
  • [29] q-Learning in Continuous Time
    Jia, Yanwei
    Zhou, Xun Yu
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [30] Large-Scale Traffic Signal Control Based on Multi-Agent Q-Learning and Pressure
    Qi, Liang
    Sun, Yuanzhen
    Luan, Wenjing
    IEEE ACCESS, 2024, 12 : 1092 - 1101