Hyperparameter Tuning in Offline Reinforcement Learning

Cited by: 0
Authors
Tittaferrante, Andrew [1 ]
Yassine, Abdulsalam [2 ]
Affiliations
[1] Lakehead Univ, Elect & Comp Engn, Thunder Bay, ON, Canada
[2] Lakehead Univ, Software Engn, Thunder Bay, ON, Canada
Keywords
Deep Learning; Reinforcement Learning; Offline Reinforcement Learning;
DOI
10.1109/ICMLA55696.2022.00101
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this work, we propose a reliable hyperparameter tuning scheme for offline reinforcement learning. We demonstrate the scheme on the simplest antmaze environment from D4RL, the standard offline benchmark dataset. The usual approach to policy evaluation in offline reinforcement learning relies on online evaluation, i.e., cherry-picking the best performance on the test environment. To mitigate this cherry-picking, we propose an ad hoc online evaluation metric, which we name "median-median-return". This metric enables more reliable reporting of results because it represents the expected performance of the learned policy: it takes the median online evaluation performance across both epochs and training runs. To demonstrate the scheme, we employ the recent state-of-the-art algorithm IQL and perform a thorough hyperparameter search guided by the proposed metric. The tuned architectures enjoy notably stronger cherry-picked performance, and the best models surpass the reported state-of-the-art performance on average.
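For illustration, the sketch below shows one plausible way to compute such a metric in Python. The (runs x epochs) layout of the return array, the aggregation order (median over epochs within each run, then median across runs), and the name median_median_return are our assumptions for this sketch, not details confirmed by the paper.

    import numpy as np

    def median_median_return(returns: np.ndarray) -> float:
        """Median-median-return, per the abstract (sketch).

        Assumes `returns` has shape (n_runs, n_epochs), where
        returns[i, j] is the online evaluation return of training
        run i at epoch j. The aggregation order (epochs first,
        then runs) is an assumption of this sketch.
        """
        per_run_median = np.median(returns, axis=1)  # median over epochs, per run
        return float(np.median(per_run_median))      # median across runs

    # Usage example: 5 training runs, each evaluated at 100 epochs.
    rng = np.random.default_rng(0)
    returns = rng.normal(loc=0.5, scale=0.2, size=(5, 100))
    print(median_median_return(returns))

Compared with reporting the single best (run, epoch) pair, a median-of-medians is insensitive to a few lucky evaluations, which is what makes it a more conservative basis for hyperparameter selection.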
Pages: 585-590
Page count: 6
Related Papers
50 records in total
  • [1] Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning
    Zhang, Siyuan
    Jiang, Nan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
  • [2] Automated Hyperparameter Tuning in Reinforcement Learning for Quadrupedal Robot Locomotion
    Kim, Myeongseop
    Kim, Jung-Su
    Park, Jae-Han
    ELECTRONICS, 2024, 13 (01)
  • [3] Online weighted Q-ensembles for reduced hyperparameter tuning in reinforcement learning
    Garcia, R.
    Caarls, W.
    SOFT COMPUTING, 2024, 28 (13-14) : 8549 - 8559
  • [4] Meta-reinforcement learning for the tuning of PI controllers: An offline approach
    McClement, Daniel G.
    Lawrence, Nathan P.
    Backstroem, Johan U.
    Loewen, Philip D.
    Forbes, Michael G.
    Gopaluni, R. Bhushan
    JOURNAL OF PROCESS CONTROL, 2022, 118 : 139 - 152
  • [5] Automatic Hyperparameter Tuning in Deep Convolutional Neural Networks Using Asynchronous Reinforcement Learning
    Neary, Patrick L.
    2018 IEEE INTERNATIONAL CONFERENCE ON COGNITIVE COMPUTING (ICCC), 2018, : 73 - 77
  • [6] Hyperparameter Tuning of an Off-Policy Reinforcement Learning Algorithm for H∞ Tracking Control
    Farahmandi, Alireza
    Reitz, Brian
    Debord, Mark
    Philbrick, Douglas
    Estabridis, Katia
    Hewer, Gary
    LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 211, 2023, 211
  • [7] Batch Reinforcement Learning with Hyperparameter Gradients
    Lee, Byung-Jun
    Lee, Jongmin
    Vrancx, Peter
    Kim, Dongho
    Kim, Kee-Eung
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [8] Offline Reinforcement Learning with Pseudometric Learning
    Dadashi, Robert
    Rezaeifar, Shideh
    Vieillard, Nino
    Hussenot, Leonard
    Pietquin, Olivier
    Geist, Matthieu
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [9] Benchmarking Offline Reinforcement Learning
    Tittaferrante, Andrew
    Yassine, Abdulsalam
    2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA, 2022, : 259 - 263
  • [10] Federated Offline Reinforcement Learning
    Zhou, Doudou
    Zhang, Yufeng
    Sonabend-W, Aaron
    Wang, Zhaoran
    Lu, Junwei
    Cai, Tianxi
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2024,