Model selection in reinforcement learning

Cited by: 28
Authors:
Farahmand, Amir-massoud [1]
Szepesvari, Csaba [1]
Affiliations:
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada
Keywords:
Reinforcement learning; Model selection; Complexity regularization; Adaptivity; Offline learning; Off-policy learning; Finite-sample bounds; POLICY ITERATION; PREDICTION;
DOI:
10.1007/s10994-011-5254-7
CLC classification number:
TP18 [Artificial intelligence theory];
Subject classification codes:
081104 ; 0812 ; 0835 ; 1405 ;
Abstract:
We consider the problem of model selection in the batch (offline, non-interactive) reinforcement learning setting, where the goal is to find an action-value function with the smallest Bellman error among a countable set of candidate functions. We propose a complexity regularization-based model selection algorithm, BERMIN, and prove that it enjoys an oracle-like property: the estimator's error differs from that of an oracle, who selects the candidate with the minimum Bellman error, by only a constant factor and a small remainder term that vanishes at a parametric rate as the number of samples increases. As an application, we consider the problem in which the true action-value function belongs to an unknown member of a nested sequence of function spaces. We show that under some additional technical conditions BERMIN leads to a procedure whose rate of convergence, up to a constant factor, matches that of an oracle who knows which of the nested function spaces the true action-value function belongs to, i.e., the procedure achieves adaptivity.
Pages: 299 - 332
Page count: 34
Related papers
50 items total
  • [31] Reinforcement Learning with Classifier Selection for Focused Crawling
    Partalas, Ioannis
    Paliouras, Georgios
    Vlahavas, Ioannis
    ECAI 2008, PROCEEDINGS, 2008, 178 : 759 - +
  • [32] Reinforcement Learning based Dynamic Model Selection for Short-Term Load Forecasting
    Feng, Cong
    Zhang, Jie
    2019 IEEE POWER & ENERGY SOCIETY INNOVATIVE SMART GRID TECHNOLOGIES CONFERENCE (ISGT), 2019,
  • [33] Data Center Selection Based on Reinforcement Learning
    Li, Qirui
    Peng, Zhiping
    Cui, Denglong
    He, Jieguang
    Chen, Ke
    Zhou, Jing
    PROCEEDINGS OF 2019 4TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTERNET OF THINGS (CCIOT 2019), 2019, : 14 - 19
  • [34] Object tracking: Feature selection by reinforcement learning
    Deng, Jiali
    Gong, Haigang
    Liu, Minghui
    Liu, Ming
    INTERNATIONAL CONFERENCE ON COMPUTER VISION, APPLICATION, AND DESIGN (CVAD 2021), 2021, 12155
  • [35] Enhanced Federated Reinforcement Learning for Mobility-Aware Node Selection and Model Compression
    Hu, Bingxu
    Huang, Xiaoyan
    Zhang, Ke
    Wu, Fan
    Sun, Chen
    Cui, Tao
    Zhang, Yan
    IEEE CONFERENCE ON GLOBAL COMMUNICATIONS, GLOBECOM, 2023, : 158 - 163
  • [36] Autonomous Reusing Policy Selection using Spreading Activation Model in Deep Reinforcement Learning
    Takakuwa, Yusaku
    Kono, Hitoshi
    Fujii, Hiromitsu
    Wen, Wen
    Suzuki, Tsuyoshi
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (04) : 8 - 15
  • [37] EMBEDDED INCREMENTAL FEATURE SELECTION FOR REINFORCEMENT LEARNING
    Wright, Robert
    Loscalzo, Steven
    Yu, Lei
    ICAART 2011: PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 1, 2011, : 263 - 268
  • [38] Experimental demonstration of adaptive model selection based on reinforcement learning in photonic reservoir computing
    Mito, Ryohei
    Kanno, Kazutaka
    Naruse, Makoto
    Uchida, Atsushi
    IEICE NONLINEAR THEORY AND ITS APPLICATIONS, 2022, 13 (01): : 123 - 138
  • [39] Enhancing cut selection through reinforcement learning
    Shengchao Wang
    Liang Chen
    Lingfeng Niu
    Yu-Hong Dai
    Science China (Mathematics), 2024, 67 (06) : 1377 - 1394
  • [40] Experience selection in deep reinforcement learning for control
    De Bruin, Tim
    Kober, Jens
    Tuyls, Karl
    Babuška, Robert
    Journal of Machine Learning Research, 2018, 19 : 1 - 56