Kernelized Q-Learning for Large-Scale, Potentially Continuous, Markov Decision Processes

Cited by: 0
Authors
Sledge, Isaac J. [1 ]
Principe, Jose C. [1 ,2 ]
Affiliations
[1] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
[2] Univ Florida, Dept Biomed Engn, Gainesville, FL 32611 USA
Keywords
Kernel methods; function approximation; reinforcement learning; TEMPORAL-DIFFERENCE; REGRESSION; CONVERGENCE;
DOI
None available
CLC Classification
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We introduce a novel means of generalizing an agent's experiences for large-scale Markov decision processes. Our approach combines Q-learning with a kernel-based local linear regression function approximator. Through this kernelized regression, value-function estimates from visited portions of the state-action space are generalized, in a non-linear, non-parametric fashion, to areas that have not yet been visited. This holds whether the state-action space is discrete or continuous. We assess the performance of our approach on the game Super Mario Land 2 for the Nintendo GameBoy system. In this complicated environment, our kernelized Q-learning approach outperforms linear function approximators; it also outperforms other non-linear approximators.
Pages: 153-162 (10 pages)
Related Papers
(50 records in total)
  • [21] On State Aggregation to Approximate Complex Value Functions in Large-Scale Markov Decision Processes
    Jia, Qing-Shan
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2011, 56 (02) : 333 - 344
  • [22] Simulation-based policy generation using large-scale Markov decision processes
    Zobel, CW
    Scherer, WT
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART A-SYSTEMS AND HUMANS, 2001, 31 (06): : 609 - 622
  • [23] Faster saddle-point optimization for solving large-scale Markov decision processes
    Bas-Serrano, Joan
    Neu, Gergely
    LEARNING FOR DYNAMICS AND CONTROL, VOL 120, 2020, 120 : 413 - 423
  • [24] Learning Policies for Markov Decision Processes in Continuous Spaces
    Paternain, Santiago
    Bazerque, Juan Andres
    Small, Austin
    Ribeiro, Alejandro
    2018 IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2018, : 4751 - 4758
  • [25] Online Learning in Markov Decision Processes with Continuous Actions
    Hong, Yi-Te
    Lu, Chi-Jen
    ALGORITHMIC LEARNING THEORY, ALT 2015, 2015, 9355 : 302 - 316
  • [26] Large Scale Markov Decision Processes with Changing Rewards
    Cardoso, Adrian Rivera
    Wang, He
    Xu, Huan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [27] Q-learning algorithms for constrained Markov decision processes with randomized monotone policies:: Application to MIMO transmission control
    Djonin, Dejan V.
    Krishnamurthy, Vikram
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2007, 55 (05) : 2170 - 2181
  • [28] Linear Programming for Large-Scale Markov Decision Problems
    Abbasi-Yadkori, Yasin
    Bartlett, Peter L.
    Malek, Alan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 2), 2014, 32 : 496 - 504
  • [29] q-Learning in Continuous Time
    Jia, Yanwei
    Zhou, Xun Yu
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [30] Large-Scale Traffic Signal Control Based on Multi-Agent Q-Learning and Pressure
    Qi, Liang
    Sun, Yuanzhen
    Luan, Wenjing
    IEEE ACCESS, 2024, 12 : 1092 - 1101