Kernelized Q-Learning for Large-Scale, Potentially Continuous, Markov Decision Processes

Cited: 0
Authors
Sledge, Isaac J. [1]
Principe, Jose C. [1,2]
Affiliations
[1] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
[2] Univ Florida, Dept Biomed Engn, Gainesville, FL 32611 USA
Keywords
Kernel methods; function approximation; reinforcement learning; temporal difference; regression; convergence
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
We introduce a novel means of generalizing agent experiences for large-scale Markov decision processes. Our approach is based on kernel local linear regression function approximation, which we combine with Q-learning. Through this kernelized regression process, value-function estimates from visited portions of the state-action space can be generalized, in a non-linear, non-parametric fashion, to areas that have not yet been visited. This can be done whether the state-action space is discrete or continuous. We assess the performance of our approach on the game Super Mario Land 2 for the Nintendo Game Boy system. We show that our kernelized Q-learning approach outperforms linear function approximators in this complicated environment. It also outperforms other non-linear approximators.
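To make the abstract concrete, the following is a minimal, hypothetical Python sketch of combining one-step Q-learning targets with kernel local linear (locally weighted) regression over state-action features. It is not the authors' implementation: the Gaussian kernel, the bandwidth h, the ridge regularizer, the unbounded sample dictionary, and the names KernelLocalLinearQ, predict, and update are all assumptions made for illustration.

```python
# Hypothetical sketch: Q-learning combined with kernel local linear
# regression over state-action features. Not the paper's exact method;
# kernel choice, bandwidth, and ridge term are illustrative assumptions.
import numpy as np

class KernelLocalLinearQ:
    def __init__(self, gamma=0.99, bandwidth=1.0, ridge=1e-3):
        self.gamma = gamma      # discount factor
        self.h = bandwidth      # Gaussian kernel bandwidth (assumed)
        self.ridge = ridge      # regularizer for the local linear fit
        self.X = []             # stored state-action feature vectors
        self.y = []             # stored one-step Q-learning targets

    def predict(self, x):
        """Locally weighted affine fit around the query point x."""
        if not self.X:
            return 0.0          # default value when nothing has been visited
        X = np.asarray(self.X)
        y = np.asarray(self.y)
        # Gaussian kernel weights: nearby stored samples dominate the fit
        w = np.exp(-np.sum((X - x) ** 2, axis=1) / (2.0 * self.h ** 2))
        # Augment with a bias column so the local model is affine
        A = np.hstack([X, np.ones((X.shape[0], 1))])
        # Ridge-regularized weighted least squares:
        #   beta = (A' W A + r I)^{-1} A' W y,  with W = diag(w)
        G = A.T @ (w[:, None] * A) + self.ridge * np.eye(A.shape[1])
        beta = np.linalg.solve(G, A.T @ (w * y))
        return float(np.append(x, 1.0) @ beta)

    def update(self, s, a, r, s_next, actions):
        """Store a bootstrapped Q-learning target for the visited pair."""
        x = np.concatenate([s, a])
        q_next = max(self.predict(np.concatenate([s_next, b]))
                     for b in actions)
        self.X.append(x)
        self.y.append(r + self.gamma * q_next)
```

In this sketch, s, a, and s_next are numeric feature vectors and actions is a set of candidate action encodings (again, assumed representations). Each update stores a bootstrapped Q-learning target; predict then generalizes those targets to unseen state-action pairs by fitting an affine model weighted toward nearby stored samples, which is the non-linear, non-parametric generalization the abstract describes.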
Pages: 153-162
Page count: 10
Related Papers
50 records in total
  • [41] Distributed Adaptive Optimal Regulation of Uncertain Large-Scale Linear Networked Control Systems Using Q-Learning
    Narayanan, Vignesh
    Jagannathan, S.
    2015 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI), 2015: 587 - 592
  • [42] Policy learning in continuous-time Markov decision processes using Gaussian Processes
    Bartocci, Ezio
    Bortolussi, Luca
    Brazdil, Tomas
    Milios, Dimitrios
    Sanguinetti, Guido
    PERFORMANCE EVALUATION, 2017, 116 : 84 - 100
  • [43] Suboptimal policy determination for large-scale Markov decision processes. 2. Implementation and numerical evaluation
    Popyack, JL
    White, CC
    JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 1985, 46 (03) : 343 - 358
  • [44] Expected Lenient Q-learning: a fast variant of the Lenient Q-learning algorithm for cooperative stochastic Markov games
    Amhraoui, Elmehdi
    Masrour, Tawfik
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (07) : 2781 - 2797
  • [45] Continuous Learning for Large-scale Personalized Domain Classification
    Li, Han
    Lee, Jihwan
    Mudgal, Sidharth
    Sarikaya, Ruhi
    Kim, Young-Bum
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 3784 - 3794
  • [46] Agent Decision Processes Using Double Deep Q-Networks plus Minimax Q-Learning
    Fitch, Natalie
    Clancy, Daniel
    2021 IEEE AEROSPACE CONFERENCE (AEROCONF 2021), 2021
  • [47] Multitime scale Markov decision processes
    Chang, HS
    Fard, PJ
    Marcus, SI
    Shayman, M
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2003, 48 (06) : 976 - 987
  • [48] Q-learning in continuous state and action spaces
    Gaskett, C
    Wettergreen, D
    Zelinsky, A
    ADVANCED TOPICS IN ARTIFICIAL INTELLIGENCE, 1999, 1747 : 417 - 428
  • [49] Learning to Collaborate in Markov Decision Processes
    Radanovic, Goran
    Devidze, Rati
    Parkes, David C.
    Singla, Adish
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [50] Bisimulation metrics for continuous Markov decision processes
    Ferns, Norm
    Panangaden, Prakash
    Precup, Doina
    SIAM JOURNAL ON COMPUTING, 2011, 40 (06) : 1662 - 1714