Model selection in reinforcement learning

Cited by: 28
Authors
Farahmand, Amir-massoud [1 ]
Szepesvari, Csaba [1 ]
Affiliations
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada
Keywords
Reinforcement learning; Model selection; Complexity regularization; Adaptivity; Offline learning; Off-policy learning; Finite-sample bounds; POLICY ITERATION; PREDICTION;
DOI
10.1007/s10994-011-5254-7
CLC classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
We consider the problem of model selection in the batch (offline, non-interactive) reinforcement learning setting, where the goal is to find an action-value function with the smallest Bellman error among a countable set of candidate functions. We propose a complexity regularization-based model selection algorithm, BERMIN, and prove that it enjoys an oracle-like property: the estimator's error differs from that of an oracle, who selects the candidate with the minimum Bellman error, by only a constant factor and a small remainder term that vanishes at a parametric rate as the number of samples increases. As an application, we consider the case where the true action-value function belongs to an unknown member of a nested sequence of function spaces. We show that under some additional technical conditions BERMIN leads to a procedure whose rate of convergence matches, up to a constant factor, that of an oracle who knows which of the nested function spaces the true action-value function belongs to, i.e., the procedure achieves adaptivity.
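The complexity-regularization principle the abstract describes can be illustrated with a minimal sketch. This is not the paper's BERMIN estimator (its Bellman-error estimator and penalty terms are more involved); the function names and the `complexity * log(n) / n` penalty form below are illustrative assumptions. The core idea shown is: select the candidate that minimizes an empirical error estimate plus a complexity penalty, so that the choice tracks the oracle's up to the penalty of the best candidate.

```python
import math


def select_model(error_estimates, penalties):
    """Complexity-regularized selection (illustrative, not BERMIN itself).

    error_estimates[k]: empirical error estimate (e.g. an estimate of the
        Bellman error) for candidate k.
    penalties[k]: complexity penalty for candidate k, typically growing
        with the richness of candidate k's function space and shrinking
        with the sample size.
    Returns the index of the candidate minimizing estimate + penalty.
    """
    scores = [e + p for e, p in zip(error_estimates, penalties)]
    return min(range(len(scores)), key=scores.__getitem__)


# Hypothetical example with a nested sequence of three function spaces:
# the penalty grows with a complexity measure and decays with sample size n.
n = 1000
complexities = [1.0, 2.0, 4.0]                      # richer spaces cost more
penalties = [c * math.log(n) / n for c in complexities]
errors = [0.30, 0.12, 0.11]                         # made-up error estimates
best = select_model(errors, penalties)              # index 1: the small gain
                                                    # of space 2 over space 1
                                                    # does not pay its penalty
```

The design point mirrors the oracle inequality: without the penalty, the richest space would always win on empirical error; the penalty makes over-rich candidates pay for their extra complexity, so the selected index stays within a constant factor of the oracle's error plus a term vanishing in `n`.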
Pages: 299-332
Page count: 34
Related papers
50 records in total
  • [1] Model selection in reinforcement learning
    Amir-massoud Farahmand
    Csaba Szepesvári
    Machine Learning, 2011, 85 : 299 - 332
  • [2] Reinforcement Learning for Model Selection and Hyperparameter Optimization
    Wu J.
    Chen S.-P.
    Chen X.-Y.
    Zhou R.
    Dianzi Keji Daxue Xuebao/Journal of the University of Electronic Science and Technology of China, 2020, 49 (02): : 255 - 261
  • [3] Model Selection in Reinforcement Learning with General Function Approximations
    Ghosh, Avishek
    Chowdhury, Sayak Ray
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT IV, 2023, 13716 : 148 - 164
  • [4] Abstraction Selection in Model-Based Reinforcement Learning
    Jiang, Nan
    Kulesza, Alex
    Singh, Satinder
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37 : 179 - 188
  • [5] Online Model Selection for Reinforcement Learning with Function Approximation
    Lee, Jonathan N.
    Pacchiano, Aldo
    Muthukumar, Vidya
    Kong, Weihao
    Brunskill, Emma
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [6] Oracle Inequalities for Model Selection in Offline Reinforcement Learning
    Lee, Jonathan N.
    Tucker, George
    Nachum, Ofir
    Dai, Bo
    Brunskill, Emma
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [7] Pessimistic Model Selection for Offline Deep Reinforcement Learning
    Yang, Chao-Han Huck
    Qi, Zhengling
    Cui, Yifan
    Chen, Pin-Yu
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 2379 - 2389
  • [8] A Model Selection Approach for Corruption Robust Reinforcement Learning
    Wei, Chen-Yu
    Dann, Christoph
    Zimmert, Julian
    INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 167, 2022, 167
  • [9] Adaptive model selection in photonic reservoir computing by reinforcement learning
    Kanno, Kazutaka
    Naruse, Makoto
    Uchida, Atsushi
    SCIENTIFIC REPORTS, 2020, 10 (01)