Model selection in reinforcement learning

Cited by: 28

Authors:

Farahmand, Amir-massoud ^{[1]}

Szepesvari, Csaba ^{[1]}

Affiliations:

[1] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada

Source:

MACHINE LEARNING | 2011 / Vol. 85 / Issue 03

Keywords:

Reinforcement learning; Model selection; Complexity regularization; Adaptivity; Offline learning; Off-policy learning; Finite-sample bounds; POLICY ITERATION; PREDICTION;

DOI:

10.1007/s10994-011-5254-7

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We consider the problem of model selection in the batch (offline, non-interactive) reinforcement learning setting when the goal is to find an action-value function with the smallest Bellman error among a countable set of candidate functions. We propose a complexity regularization-based model selection algorithm, BERMIN, and prove that it enjoys an oracle-like property: the estimator's error differs from that of an oracle, who selects the candidate with the minimum Bellman error, by only a constant factor and a small remainder term that vanishes at a parametric rate as the number of samples increases. As an application, we consider a problem in which the true action-value function belongs to an unknown member of a nested sequence of function spaces. We show that under some additional technical conditions BERMIN leads to a procedure whose rate of convergence, up to a constant factor, matches that of an oracle who knows which of the nested function spaces the true action-value function belongs to, i.e., the procedure achieves adaptivity.
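The general idea of complexity-regularized selection named in the abstract can be illustrated with a minimal sketch. This is not the BERMIN algorithm itself, whose estimator and penalty are not given in this record. The function name `select_candidate`, the `penalty_scale` parameter, and the penalty form `log(k + 2) / n_samples` are hypothetical choices made only for illustration, and the per-candidate Bellman error estimates are assumed to be supplied from elsewhere.

```python
import math


def select_candidate(estimated_errors, n_samples, penalty_scale=1.0):
    """Pick the candidate index minimizing estimated error plus a penalty.

    estimated_errors: list of (assumed, externally computed) Bellman error
        estimates, one per candidate, ordered from simplest to most complex.
    n_samples: number of samples used to compute the estimates.
    penalty_scale: hypothetical tuning constant for the penalty term.
    """
    best_index, best_score = None, math.inf
    for k, err in enumerate(estimated_errors):
        # Hypothetical penalty: grows with the candidate index and
        # shrinks as the sample size grows.
        penalty = penalty_scale * math.log(k + 2) / n_samples
        score = err + penalty
        if score < best_score:
            best_index, best_score = k, score
    return best_index
```

With few samples the penalty dominates and the simpler candidate tends to be chosen; with many samples the penalty shrinks and the candidate with the smaller estimated error is preferred.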


Pages: 299-332

Number of pages: 34

50 items in total

[41] A reinforcement learning approach for dynamic supplier selection
Kim, Tae Il
Bilsel, R. Ufuk
Kumara, Soundar R. T.
PROCEEDINGS OF THE 2007 IEEE INTERNATIONAL CONFERENCE ON SERVICE OPERATIONS AND LOGISTICS, AND INFORMATICS, 2007, : 19 - +
[42] Time Series Anomaly Detection via Reinforcement Learning-Based Model Selection
Zhang, Jiuqi Elise
Wu, Di
Boulet, Benoit
2022 IEEE CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (CCECE), 2022, : 193 - 199
[43] Enhancing cut selection through reinforcement learning
Wang, Shengchao
Chen, Liang
Niu, Lingfeng
Dai, Yu-Hong
SCIENCE CHINA-MATHEMATICS, 2024, 67 (06) : 1377 - 1394
[44] Experience Selection in Deep Reinforcement Learning for Control
de Bruin, Tim
Kober, Jens
Tuyls, Karl
Babuska, Robert
JOURNAL OF MACHINE LEARNING RESEARCH, 2018, 19
[45] Reinforcement Learning based Gateway Selection in VANETs
Alabbas, Hasanain
Huszak, Arpad
INTERNATIONAL JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING SYSTEMS, 2022, 13 (03) : 195 - 202
[46] Empirical studies in action selection with reinforcement learning
Whiteson, Shimon
Taylor, Matthew E.
Stone, Peter
ADAPTIVE BEHAVIOR, 2007, 15 (01) : 33 - 50
[47] Reinforcement learning and approximate Bayesian computation for model selection and parameter calibration applied to a nonlinear
Ritto, T. G.
Beregi, S.
Barton, D. A. W.
MECHANICAL SYSTEMS AND SIGNAL PROCESSING, 2022, 181
[48] Adaptive Model Learning method for Reinforcement Learning
Hwang, Kao-Shing
Jiang, Wei-Cheng
Chen, Yu-Jen
2012 PROCEEDINGS OF SICE ANNUAL CONFERENCE (SICE), 2012, : 1277 - 1280
[49] Military reinforcement learning with large language model-based agents: a case of weapon selection
Ma, Jungmok
JOURNAL OF DEFENSE MODELING AND SIMULATION-APPLICATIONS METHODOLOGY TECHNOLOGY-JDMS, 2025,
[50] A method for model selection using reinforcement learning when viewing design as a sequential decision process
Chhabra, Jaskanwal P. S.
Warn, Gordon P.
STRUCTURAL AND MULTIDISCIPLINARY OPTIMIZATION, 2019, 59 (05) : 1521 - 1542
