Model selection in reinforcement learning

Cited by: 28

Authors:

Farahmand, Amir-massoud ^{[1]}

Szepesvari, Csaba ^{[1]}

Affiliations:

[1] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada

Source:

MACHINE LEARNING | 2011 / Vol. 85 / Issue 03

Keywords:

Reinforcement learning; Model selection; Complexity regularization; Adaptivity; Offline learning; Off-policy learning; Finite-sample bounds; POLICY ITERATION; PREDICTION;

DOI:

10.1007/s10994-011-5254-7

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We consider the problem of model selection in the batch (offline, non-interactive) reinforcement learning setting when the goal is to find an action-value function with the smallest Bellman error among a countable set of candidate functions. We propose a complexity regularization-based model selection algorithm, BERMIN, and prove that it enjoys an oracle-like property: the estimator's error differs from that of an oracle, who selects the candidate with the minimum Bellman error, by only a constant factor and a small remainder term that vanishes at a parametric rate as the number of samples increases. As an application, we consider the case where the true action-value function belongs to an unknown member of a nested sequence of function spaces. We show that, under some additional technical conditions, BERMIN leads to a procedure whose rate of convergence, up to a constant factor, matches that of an oracle who knows which of the nested function spaces the true action-value function belongs to, i.e., the procedure achieves adaptivity.

Pages: 299-332

Number of pages: 34
