Q-learning with Nearest Neighbors

Cited by: 0
Authors
Shah, Devavrat [1 ,2 ]
Xie, Qiaomin [1 ]
Affiliations
[1] MIT, LIDS, Cambridge, MA 02139 USA
[2] MIT, Stat & Data Sci Ctr, Dept EECS, Cambridge, MA 02139 USA
Keywords
MARKOV DECISION-PROCESSES; CONVERGENCE; APPROXIMATION; TREES
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
We consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, when only a single sample path of the system, generated under an arbitrary policy, is available. We consider the Nearest Neighbor Q-Learning (NNQL) algorithm, which learns the optimal Q-function using a nearest neighbor regression method. As the main contribution, we provide a tight finite-sample analysis of the convergence rate. In particular, for MDPs with a d-dimensional state space and discount factor $\gamma \in (0,1)$, given an arbitrary sample path with "covering time" $L$, we establish that the algorithm is guaranteed to output an $\epsilon$-accurate estimate of the optimal Q-function using $\tilde{O}\!\left(L/(\epsilon^{3}(1-\gamma)^{7})\right)$ samples. For instance, for a well-behaved MDP, the covering time of the sample path under the purely random policy scales as $\tilde{O}(1/\epsilon^{d})$, so the sample complexity scales as $\tilde{O}(1/\epsilon^{d+3})$. Indeed, we establish a lower bound showing that a dependence of $\tilde{\Omega}(1/\epsilon^{d+2})$ is necessary.
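As a rough illustration of the nearest-neighbor idea summarized in the abstract, the sketch below keeps Q-value estimates at a fixed set of anchor states and applies the standard Q-learning update through the nearest anchor. It is a minimal sketch under assumed choices: the anchor grid, the 1/n step-size schedule, and the class interface are illustrative, and it is not the exact NNQL construction or the step-size schedule analyzed in the paper.

```python
import numpy as np

class NearestNeighborQ:
    """Minimal nearest-neighbor Q-learning sketch for a continuous state space.

    Assumptions (not from the paper): a fixed anchor grid, a 1/n decaying
    step size, and a finite action set indexed 0..n_actions-1.
    """

    def __init__(self, anchors, n_actions, gamma=0.99):
        self.anchors = np.asarray(anchors, dtype=float)   # (m, d) anchor states
        self.q = np.zeros((len(self.anchors), n_actions)) # Q estimates at anchors
        self.visits = np.zeros((len(self.anchors), n_actions))
        self.gamma = gamma

    def _nearest(self, state):
        # Index of the anchor closest (Euclidean) to the observed state.
        return int(np.argmin(np.linalg.norm(self.anchors - np.asarray(state), axis=1)))

    def value(self, state):
        # Nearest-neighbor approximation of max_a Q(state, a).
        return self.q[self._nearest(state)].max()

    def update(self, state, action, reward, next_state):
        # Q-learning target evaluated through the nearest anchor of `state`.
        i = self._nearest(state)
        self.visits[i, action] += 1
        alpha = 1.0 / self.visits[i, action]  # assumed decaying step size
        target = reward + self.gamma * self.value(next_state)
        self.q[i, action] += alpha * (target - self.q[i, action])
```

In the single-sample-path setting of the paper, such an agent would be fed transitions (s, a, r, s') observed along one trajectory generated by an arbitrary behavior policy; the covering time L then corresponds, roughly, to how long that trajectory takes to visit a neighborhood of every anchor-action pair.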
Pages: 11
Related papers
50 in total
  • [1] Q-LEARNING
    WATKINS, CJCH
    DAYAN, P
    [J]. MACHINE LEARNING, 1992, 8 (3-4) : 279 - 292
  • [2] Deep Reinforcement Learning: From Q-Learning to Deep Q-Learning
    Tan, Fuxiao
    Yan, Pengfei
    Guan, Xinping
    [J]. NEURAL INFORMATION PROCESSING (ICONIP 2017), PT IV, 2017, 10637 : 475 - 483
  • [3] Backward Q-learning: The combination of Sarsa algorithm and Q-learning
    Wang, Yin-Hao
    Li, Tzuu-Hseng S.
    Lin, Chih-Jui
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (09) : 2184 - 2193
  • [4] Reliable Nearest Neighbors for Lazy Learning
    Ebert, Tobias
    Kampmann, Geritt
    Nelles, Oliver
    [J]. 2011 AMERICAN CONTROL CONFERENCE, 2011, : 3041 - 3046
  • [5] Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning
    Ohnishi, Shota
    Uchibe, Eiji
    Yamaguchi, Yotaro
    Nakanishi, Kosuke
    Yasui, Yuji
    Ishii, Shin
    [J]. FRONTIERS IN NEUROROBOTICS, 2019, 13
  • [6] Learning rates for Q-Learning
    Even-Dar, E
    Mansour, Y
    [J]. COMPUTATIONAL LEARNING THEORY, PROCEEDINGS, 2001, 2111 : 589 - 604
  • [7] Learning rates for Q-learning
    Even-Dar, E
    Mansour, Y
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 5 : 1 - 25
  • [8] Contextual Q-Learning
    Pinto, Tiago
    Vale, Zita
    [J]. ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325 : 2927 - 2928
  • [9] Zap Q-Learning
    Devraj, Adithya M.
    Meyn, Sean P.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [10] CVaR Q-Learning
    Stanko, Silvestr
    Macek, Karel
    [J]. COMPUTATIONAL INTELLIGENCE: 11th International Joint Conference, IJCCI 2019, Vienna, Austria, September 17-19, 2019, Revised Selected Papers, 2021, 922 : 333 - 358