Model selection in reinforcement learning

Cited by: 28
Authors
Farahmand, Amir-massoud [1]
Szepesvari, Csaba [1]
Affiliations
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada
Keywords
Reinforcement learning; Model selection; Complexity regularization; Adaptivity; Offline learning; Off-policy learning; Finite-sample bounds; POLICY ITERATION; PREDICTION;
DOI
10.1007/s10994-011-5254-7
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We consider the problem of model selection in the batch (offline, non-interactive) reinforcement learning setting when the goal is to find an action-value function with the smallest Bellman error among a countable set of candidate functions. We propose a complexity regularization-based model selection algorithm, BERMIN, and prove that it enjoys an oracle-like property: the estimator's error differs from that of an oracle, who selects the candidate with the minimum Bellman error, by only a constant factor and a small remainder term that vanishes at a parametric rate as the number of samples increases. As an application, we consider the case where the true action-value function belongs to an unknown member of a nested sequence of function spaces. We show that, under some additional technical conditions, BERMIN leads to a procedure whose rate of convergence, up to a constant factor, matches that of an oracle who knows which of the nested function spaces the true action-value function belongs to, i.e., the procedure achieves adaptivity.
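The general idea of complexity-regularized selection mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the BERMIN algorithm itself, whose details are not given in this record. The function name `select_candidate`, the penalty form `penalty_scale * complexity / n_samples`, and the example numbers are all assumptions made for illustration only.

```python
import math


def select_candidate(estimated_errors, complexities, n_samples, penalty_scale=1.0):
    """Pick the candidate minimizing (estimated error + complexity penalty).

    Hypothetical sketch: the penalty shape complexity / n_samples is an
    assumption chosen for illustration, not the penalty used by BERMIN.
    """
    if len(estimated_errors) != len(complexities):
        raise ValueError("need one complexity value per candidate")
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")

    best_index, best_score = None, math.inf
    for k, (err, comp) in enumerate(zip(estimated_errors, complexities)):
        # Penalized score: data-fit term plus a term that shrinks with more samples.
        score = err + penalty_scale * comp / n_samples
        if score < best_score:
            best_index, best_score = k, score
    return best_index, best_score


# Made-up error estimates for three candidates of increasing complexity.
errors = [0.50, 0.20, 0.18]
complexities = [1.0, 2.0, 10.0]

small_n_choice, _ = select_candidate(errors, complexities, n_samples=100)
large_n_choice, _ = select_candidate(errors, complexities, n_samples=10_000)
```

With few samples the penalty dominates and the simpler second candidate is preferred; with many samples the penalty shrinks and the selection moves to the candidate with the lowest estimated error.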
Pages: 299-332
Page count: 34
Related papers
50 in total
  • [21] Automatic Feature Selection for Model-Based Reinforcement Learning in Factored MDPs
    Kroon, Mark
    Whiteson, Shimon
    EIGHTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, PROCEEDINGS, 2009, : 324 - 330
  • [22] An Analysis of Feature Selection and Reward Function for Model-Based Reinforcement Learning
    Shen, Shitian
    Lin, Chen
    Mostafavi, Behrooz
    Barnes, Tiffany
    Chi, Min
    INTELLIGENT TUTORING SYSTEMS, ITS 2016, 2016, 9684 : 504 - 505
  • [23] Model-Based Reinforcement Learning in Multiagent Systems with Sequential Action Selection
    Akramizadeh, Ali
    Afshar, Ahmad
    Menhaj, Mohammad Bagher
    Jafari, Samira
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2011, E94D (02): : 255 - 263
  • [24] A deep reinforcement learning model with plan value network for join order selection
    Qiao Y.
    Wei S.
    Gao R.
    Han N.
    Qiao S.
    Song H.
    International Journal of Wireless and Mobile Computing, 2021, 21 (04): : 365 - 374
  • [25] Sample Trajectory Selection Method Based on Large Language Model in Reinforcement Learning
    Lai, Jinbang
    Zang, Zhaoxiang
    IEEE ACCESS, 2024, 12 : 61877 - 61885
  • [26] Transfer Learning for Operator Selection: A Reinforcement Learning Approach
    Durgut, Rafet
    Aydin, Mehmet Emin
    Rakib, Abdur
    ALGORITHMS, 2022, 15 (01)
  • [27] Automated Feature Selection: A Reinforcement Learning Perspective
    Liu, Kunpeng
    Fu, Yanjie
    Wu, Le
    Li, Xiaolin
    Aggarwal, Charu
    Xiong, Hui
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (03) : 2272 - 2284
  • [28] Relay Nodes Selection Using Reinforcement Learning
    Kim, Haesik
    Fujii, Takeo
    Umebayashi, Kenta
    3RD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (IEEE ICAIIC 2021), 2021, : 329 - 334
  • [29] Heuristic Selection of Actions in Multiagent Reinforcement Learning
    Bianchi, Reinaldo A. C.
    Ribeiro, Carlos H. C.
    Costa, Anna H. R.
    20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 690 - 695
  • [30] Dynamic Algorithm Selection Using Reinforcement Learning
    Armstrong, Warren
    Christen, Peter
    McCreath, Eric
    Rendell, Alistair P.
    AIDM 2006: INTERNATIONAL WORKSHOP ON INTEGRATING AI AND DATA MINING, 2006, : 18 - +