Q-learning with Nearest Neighbors

Cited by: 0
Authors
Shah, Devavrat [1 ,2 ]
Xie, Qiaomin [1 ]
Affiliations
[1] MIT, LIDS, Cambridge, MA 02139 USA
[2] MIT, Stat & Data Sci Ctr, Dept EECS, Cambridge, MA 02139 USA
Keywords
MARKOV DECISION-PROCESSES; CONVERGENCE; APPROXIMATION; TREES
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
We consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, when only a single sample path of the system, generated under an arbitrary policy, is available. We consider the Nearest Neighbor Q-Learning (NNQL) algorithm, which learns the optimal Q-function using a nearest neighbor regression method. As the main contribution, we provide a tight finite-sample analysis of the convergence rate. In particular, for MDPs with a d-dimensional state space and discount factor $\gamma \in (0,1)$, given an arbitrary sample path with "covering time" $L$, we establish that the algorithm is guaranteed to output an $\epsilon$-accurate estimate of the optimal Q-function using $\tilde{O}\!\left(L/(\epsilon^{3}(1-\gamma)^{7})\right)$ samples. For instance, for a well-behaved MDP, the covering time of the sample path under the purely random policy scales as $\tilde{O}(1/\epsilon^{d})$, so the sample complexity scales as $\tilde{O}(1/\epsilon^{d+3})$. Indeed, we establish a lower bound showing that a dependence of $\tilde{\Omega}(1/\epsilon^{d+2})$ is necessary.
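As a rough illustration of the nearest-neighbor idea summarized in the abstract, the sketch below keeps Q-value estimates at a fixed set of anchor states and applies the standard Q-learning update through the nearest anchor. It is a minimal sketch under assumed choices: the anchor grid, the 1/n step-size schedule, and the class interface are illustrative, and it is not the exact NNQL construction or the step-size schedule analyzed in the paper.

```python
import numpy as np

class NearestNeighborQ:
    """Minimal nearest-neighbor Q-learning sketch for a continuous state space.

    Assumptions (not from the paper): a fixed anchor grid, a 1/n decaying
    step size, and a finite action set indexed 0..n_actions-1.
    """

    def __init__(self, anchors, n_actions, gamma=0.99):
        self.anchors = np.asarray(anchors, dtype=float)   # (m, d) anchor states
        self.q = np.zeros((len(self.anchors), n_actions)) # Q estimates at anchors
        self.visits = np.zeros((len(self.anchors), n_actions))
        self.gamma = gamma

    def _nearest(self, state):
        # Index of the anchor closest (Euclidean) to the observed state.
        return int(np.argmin(np.linalg.norm(self.anchors - np.asarray(state), axis=1)))

    def value(self, state):
        # Nearest-neighbor approximation of max_a Q(state, a).
        return self.q[self._nearest(state)].max()

    def update(self, state, action, reward, next_state):
        # Q-learning target evaluated through the nearest anchor of `state`.
        i = self._nearest(state)
        self.visits[i, action] += 1
        alpha = 1.0 / self.visits[i, action]  # assumed decaying step size
        target = reward + self.gamma * self.value(next_state)
        self.q[i, action] += alpha * (target - self.q[i, action])
```

In the single-sample-path setting of the paper, such an agent would be fed transitions (s, a, r, s') observed along one trajectory generated by an arbitrary behavior policy; the covering time L then corresponds, roughly, to how long that trajectory takes to visit a neighborhood of every anchor-action pair.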
Pages: 11
Related papers
50 in total
  • [1] Q-LEARNING
    WATKINS, CJCH
    DAYAN, P
    [J]. MACHINE LEARNING, 1992, 8 (3-4) : 279 - 292
  • [2] Deep Reinforcement Learning: From Q-Learning to Deep Q-Learning
    Tan, Fuxiao
    Yan, Pengfei
    Guan, Xinping
    [J]. NEURAL INFORMATION PROCESSING (ICONIP 2017), PT IV, 2017, 10637 : 475 - 483
  • [3] Backward Q-learning: The combination of Sarsa algorithm and Q-learning
    Wang, Yin-Hao
    Li, Tzuu-Hseng S.
    Lin, Chih-Jui
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (09) : 2184 - 2193
  • [4] Reliable Nearest Neighbors for Lazy Learning
    Ebert, Tobias
    Kampmann, Geritt
    Nelles, Oliver
    [J]. 2011 AMERICAN CONTROL CONFERENCE, 2011, : 3041 - 3046
  • [5] Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning
    Ohnishi, Shota
    Uchibe, Eiji
    Yamaguchi, Yotaro
    Nakanishi, Kosuke
    Yasui, Yuji
    Ishii, Shin
    [J]. FRONTIERS IN NEUROROBOTICS, 2019, 13
  • [6] Learning rates for Q-Learning
    Even-Dar, E
    Mansour, Y
    [J]. COMPUTATIONAL LEARNING THEORY, PROCEEDINGS, 2001, 2111 : 589 - 604
  • [7] Learning rates for Q-learning
    Even-Dar, E
    Mansour, Y
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 5 : 1 - 25
  • [8] Contextual Q-Learning
    Pinto, Tiago
    Vale, Zita
    [J]. ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325 : 2927 - 2928
  • [9] Zap Q-Learning
    Devraj, Adithya M.
    Meyn, Sean P.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [10] CVaR Q-Learning
    Stanko, Silvestr
    Macek, Karel
    [J]. COMPUTATIONAL INTELLIGENCE: 11th International Joint Conference, IJCCI 2019, Vienna, Austria, September 17-19, 2019, Revised Selected Papers, 2021, 922 : 333 - 358