Model selection in reinforcement learning

Cited by: 28
Authors
Farahmand, Amir-massoud [1 ]
Szepesvari, Csaba [1 ]
Affiliations
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada
Keywords
Reinforcement learning; Model selection; Complexity regularization; Adaptivity; Offline learning; Off-policy learning; Finite-sample bounds; POLICY ITERATION; PREDICTION;
DOI
10.1007/s10994-011-5254-7
CLC classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
We consider the problem of model selection in the batch (offline, non-interactive) reinforcement learning setting, where the goal is to find an action-value function with the smallest Bellman error among a countable set of candidate functions. We propose a complexity regularization-based model selection algorithm, BERMIN, and prove that it enjoys an oracle-like property: the estimator's error differs from that of an oracle, who selects the candidate with the minimum Bellman error, by only a constant factor and a small remainder term that vanishes at a parametric rate as the number of samples increases. As an application, we consider the case where the true action-value function belongs to an unknown member of a nested sequence of function spaces. We show that under some additional technical conditions BERMIN leads to a procedure whose rate of convergence matches, up to a constant factor, that of an oracle who knows which of the nested function spaces the true action-value function belongs to, i.e., the procedure achieves adaptivity.
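The complexity-regularization principle the abstract describes can be illustrated with a minimal sketch. This is not the paper's BERMIN estimator (its Bellman-error estimator and penalty terms are more involved); the function names and the `complexity * log(n) / n` penalty form below are illustrative assumptions. The core idea shown is: select the candidate that minimizes an empirical error estimate plus a complexity penalty, so that the choice tracks the oracle's up to the penalty of the best candidate.

```python
import math


def select_model(error_estimates, penalties):
    """Complexity-regularized selection (illustrative, not BERMIN itself).

    error_estimates[k]: empirical error estimate (e.g. an estimate of the
        Bellman error) for candidate k.
    penalties[k]: complexity penalty for candidate k, typically growing
        with the richness of candidate k's function space and shrinking
        with the sample size.
    Returns the index of the candidate minimizing estimate + penalty.
    """
    scores = [e + p for e, p in zip(error_estimates, penalties)]
    return min(range(len(scores)), key=scores.__getitem__)


# Hypothetical example with a nested sequence of three function spaces:
# the penalty grows with a complexity measure and decays with sample size n.
n = 1000
complexities = [1.0, 2.0, 4.0]                      # richer spaces cost more
penalties = [c * math.log(n) / n for c in complexities]
errors = [0.30, 0.12, 0.11]                         # made-up error estimates
best = select_model(errors, penalties)              # index 1: the small gain
                                                    # of space 2 over space 1
                                                    # does not pay its penalty
```

The design point mirrors the oracle inequality: without the penalty, the richest space would always win on empirical error; the penalty makes over-rich candidates pay for their extra complexity, so the selected index stays within a constant factor of the oracle's error plus a term vanishing in `n`.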
Pages: 299-332
Page count: 34
Related papers
50 records in total
  • [1] Model selection in reinforcement learning
    Amir-massoud Farahmand
    Csaba Szepesvári
    Machine Learning, 2011, 85 : 299 - 332
  • [2] Reinforcement Learning for Model Selection and Hyperparameter Optimization
    Wu J.
    Chen S.-P.
    Chen X.-Y.
    Zhou R.
    Dianzi Keji Daxue Xuebao/Journal of the University of Electronic Science and Technology of China, 2020, 49 (02): : 255 - 261
  • [3] Model Selection in Reinforcement Learning with General Function Approximations
    Ghosh, Avishek
    Chowdhury, Sayak Ray
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT IV, 2023, 13716 : 148 - 164
  • [4] Abstraction Selection in Model-Based Reinforcement Learning
    Jiang, Nan
    Kulesza, Alex
    Singh, Satinder
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37 : 179 - 188
  • [5] Online Model Selection for Reinforcement Learning with Function Approximation
    Lee, Jonathan N.
    Pacchiano, Aldo
    Muthukumar, Vidya
    Kong, Weihao
    Brunskill, Emma
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [6] Oracle Inequalities for Model Selection in Offline Reinforcement Learning
    Lee, Jonathan N.
    Tucker, George
    Nachum, Ofir
    Dai, Bo
    Brunskill, Emma
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [7] Pessimistic Model Selection for Offline Deep Reinforcement Learning
    Yang, Chao-Han Huck
    Qi, Zhengling
    Cui, Yifan
    Chen, Pin-Yu
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 2379 - 2389
  • [8] A Model Selection Approach for Corruption Robust Reinforcement Learning
    Wei, Chen-Yu
    Dann, Christoph
    Zimmert, Julian
    INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 167, 2022, 167
  • [9] Adaptive model selection in photonic reservoir computing by reinforcement learning
    Kanno, Kazutaka
    Naruse, Makoto
    Uchida, Atsushi
    SCIENTIFIC REPORTS, 2020, 10 (01)