Q-learning with Nearest Neighbors

Citations: 0
Authors
Shah, Devavrat [1, 2]
Xie, Qiaomin [1]
Affiliations
[1] MIT, LIDS, Cambridge, MA 02139 USA
[2] MIT, Stat & Data Sci Ctr, Dept EECS, Cambridge, MA 02139 USA
Keywords
MARKOV DECISION-PROCESSES; CONVERGENCE; APPROXIMATION; TREES
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, when only a single sample path of the system under an arbitrary policy is available. We consider the Nearest Neighbor Q-Learning (NNQL) algorithm, which learns the optimal Q-function using the nearest neighbor regression method. As the main contribution, we provide a tight finite-sample analysis of the convergence rate. In particular, for MDPs with a $d$-dimensional state space and discount factor $\gamma \in (0, 1)$, given an arbitrary sample path with "covering time" $L$, we establish that the algorithm is guaranteed to output an $\varepsilon$-accurate estimate of the optimal Q-function using $\tilde{O}(L/(\varepsilon^{3}(1-\gamma)^{7}))$ samples. For instance, for a well-behaved MDP, the covering time of the sample path under the purely random policy scales as $\tilde{O}(1/\varepsilon^{d})$, so the sample complexity scales as $\tilde{O}(1/\varepsilon^{d+3})$. Indeed, we establish a lower bound showing that a dependence of $\tilde{\Omega}(1/\varepsilon^{d+2})$ is necessary.
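The step from the general bound to the well-behaved case is a direct substitution of the covering time, and it can be written out as follows. This is only a sketch of the arithmetic stated in the abstract: the symbol $N(\varepsilon, \gamma)$ for the number of samples is introduced here for illustration and does not appear in the record, and $\gamma$ is treated as a fixed constant in the last line.

```latex
\begin{align*}
  N(\varepsilon,\gamma)
    &= \tilde{O}\!\left(\frac{L}{\varepsilon^{3}(1-\gamma)^{7}}\right),
    \qquad L = \tilde{O}\!\left(\frac{1}{\varepsilon^{d}}\right)
    \quad \text{(purely random policy, well-behaved MDP)} \\
  \Longrightarrow\quad
  N(\varepsilon,\gamma)
    &= \tilde{O}\!\left(\frac{1}{\varepsilon^{d}} \cdot \frac{1}{\varepsilon^{3}(1-\gamma)^{7}}\right)
     = \tilde{O}\!\left(\frac{1}{\varepsilon^{d+3}(1-\gamma)^{7}}\right) \\
    &= \tilde{O}\!\left(\frac{1}{\varepsilon^{d+3}}\right)
    \quad \text{for fixed } \gamma .
\end{align*}
```

Set against the stated lower bound $\tilde{\Omega}(1/\varepsilon^{d+2})$, the upper and lower bounds differ by a factor of $1/\varepsilon$ in the accuracy parameter, up to logarithmic terms.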
Pages: 11