Model selection in reinforcement learning

Cited by: 28
Authors
Farahmand, Amir-massoud [1]
Szepesvari, Csaba [1]
Affiliations
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada
Keywords
Reinforcement learning; Model selection; Complexity regularization; Adaptivity; Offline learning; Off-policy learning; Finite-sample bounds; Policy iteration; Prediction
DOI
10.1007/s10994-011-5254-7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We consider the problem of model selection in the batch (offline, non-interactive) reinforcement learning setting, where the goal is to find an action-value function with the smallest Bellman error among a countable set of candidate functions. We propose a complexity regularization-based model selection algorithm, BERMIN, and prove that it enjoys an oracle-like property: the estimator's error differs from that of an oracle, who selects the candidate with the minimum Bellman error, by only a constant factor and a small remainder term that vanishes at a parametric rate as the number of samples increases. As an application, we consider the case where the true action-value function belongs to an unknown member of a nested sequence of function spaces. We show that, under some additional technical conditions, BERMIN leads to a procedure whose rate of convergence, up to a constant factor, matches that of an oracle who knows which of the nested function spaces the true action-value function belongs to, i.e., the procedure achieves adaptivity.
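The general idea of complexity-regularized selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the BERMIN algorithm itself, whose estimator and penalty are not given in this record. The function name `select_candidate`, the constant `c`, and the penalty form `c * log(k + 2) / n_samples` are assumptions chosen only to show the shape of the rule: pick the candidate that minimizes an estimated error plus a penalty that shrinks as the sample size grows.

```python
import math


def select_candidate(estimated_errors, n_samples, c=1.0):
    """Return the index minimizing estimated error plus a complexity penalty.

    estimated_errors: one (hypothetical) Bellman-error estimate per candidate.
    n_samples: number of samples used to form the estimates.
    c: hypothetical penalty constant (an assumption, not from the paper).
    """
    best_k, best_score = None, math.inf
    for k, err in enumerate(estimated_errors):
        # Assumed penalty: grows with the candidate's index, decays as 1/n.
        penalty = c * math.log(k + 2) / n_samples
        score = err + penalty
        if score < best_score:
            best_k, best_score = k, score
    return best_k
```

With many samples the penalty is negligible and the rule tends to pick the candidate with the lowest estimated error; with few samples it favors earlier (lower-penalty) candidates.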
Pages: 299-332
Number of pages: 34