Model selection in reinforcement learning

Cited by: 28
Authors:
Farahmand, Amir-massoud [1]
Szepesvari, Csaba [1]
Affiliations:
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada
Keywords:
Reinforcement learning; Model selection; Complexity regularization; Adaptivity; Offline learning; Off-policy learning; Finite-sample bounds; POLICY ITERATION; PREDICTION;
DOI:
10.1007/s10994-011-5254-7
CLC classification number:
TP18 [Artificial intelligence theory];
Subject classification codes:
081104 ; 0812 ; 0835 ; 1405 ;
Abstract:
We consider the problem of model selection in the batch (offline, non-interactive) reinforcement learning setting, where the goal is to find an action-value function with the smallest Bellman error among a countable set of candidate functions. We propose a complexity regularization-based model selection algorithm, BERMIN, and prove that it enjoys an oracle-like property: the estimator's error differs from that of an oracle, who selects the candidate with the minimum Bellman error, by only a constant factor and a small remainder term that vanishes at a parametric rate as the number of samples increases. As an application, we consider the problem in which the true action-value function belongs to an unknown member of a nested sequence of function spaces. We show that under some additional technical conditions BERMIN leads to a procedure whose rate of convergence, up to a constant factor, matches that of an oracle who knows which of the nested function spaces the true action-value function belongs to, i.e., the procedure achieves adaptivity.
Pages: 299 - 332
Page count: 34
Related papers
50 items total
  • [31] Reinforcement Learning with Classifier Selection for Focused Crawling
    Partalas, Ioannis
    Paliouras, Georgios
    Vlahavas, Ioannis
    ECAI 2008, PROCEEDINGS, 2008, 178 : 759 - +
  • [32] Reinforcement Learning based Dynamic Model Selection for Short-Term Load Forecasting
    Feng, Cong
    Zhang, Jie
    2019 IEEE POWER & ENERGY SOCIETY INNOVATIVE SMART GRID TECHNOLOGIES CONFERENCE (ISGT), 2019,
  • [33] Data Center Selection Based on Reinforcement Learning
    Li, Qirui
    Peng, Zhiping
    Cui, Denglong
    He, Jieguang
    Chen, Ke
    Zhou, Jing
    PROCEEDINGS OF 2019 4TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTERNET OF THINGS (CCIOT 2019), 2019, : 14 - 19
  • [34] Object tracking: Feature selection by reinforcement learning
    Deng, Jiali
    Gong, Haigang
    Liu, Minghui
    Liu, Ming
    INTERNATIONAL CONFERENCE ON COMPUTER VISION, APPLICATION, AND DESIGN (CVAD 2021), 2021, 12155
  • [35] Enhanced Federated Reinforcement Learning for Mobility-Aware Node Selection and Model Compression
    Hu, Bingxu
    Huang, Xiaoyan
    Zhang, Ke
    Wu, Fan
    Sun, Chen
    Cui, Tao
    Zhang, Yan
    IEEE CONFERENCE ON GLOBAL COMMUNICATIONS, GLOBECOM, 2023, : 158 - 163
  • [36] Autonomous Reusing Policy Selection using Spreading Activation Model in Deep Reinforcement Learning
    Takakuwa, Yusaku
    Kono, Hitoshi
    Fujii, Hiromitsu
    Wen, Wen
    Suzuki, Tsuyoshi
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (04) : 8 - 15
  • [37] EMBEDDED INCREMENTAL FEATURE SELECTION FOR REINFORCEMENT LEARNING
    Wright, Robert
    Loscalzo, Steven
    Yu, Lei
    ICAART 2011: PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 1, 2011, : 263 - 268
  • [38] Experimental demonstration of adaptive model selection based on reinforcement learning in photonic reservoir computing
    Mito, Ryohei
    Kanno, Kazutaka
    Naruse, Makoto
    Uchida, Atsushi
    IEICE NONLINEAR THEORY AND ITS APPLICATIONS, 2022, 13 (01): : 123 - 138
  • [39] Enhancing cut selection through reinforcement learning
    Shengchao Wang
    Liang Chen
    Lingfeng Niu
    Yu-Hong Dai
    Science China (Mathematics), 2024, 67 (06) : 1377 - 1394
  • [40] Experience selection in deep reinforcement learning for control
    De Bruin, Tim
    Kober, Jens
    Tuyls, Karl
    Babuška, Robert
    Journal of Machine Learning Research, 2018, 19 : 1 - 56