Hyperparameter Tuning in Offline Reinforcement Learning

Cited by: 0
Authors
Tittaferrante, Andrew [1 ]
Yassine, Abdulsalam [2 ]
Affiliations
[1] Lakehead Univ, Elect & Comp Engn, Thunder Bay, ON, Canada
[2] Lakehead Univ, Software Engn, Thunder Bay, ON, Canada
Keywords
Deep Learning; Reinforcement Learning; Offline Reinforcement Learning;
DOI
10.1109/ICMLA55696.2022.00101
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In this work, we propose a reliable hyperparameter tuning scheme for offline reinforcement learning. We demonstrate the scheme on the simplest antmaze environment from D4RL, the standard offline benchmark suite. The usual approach to policy evaluation in offline reinforcement learning relies on online evaluation, i.e., cherry-picking the best performance on the test environment. To mitigate this cherry-picking, we propose an ad hoc online evaluation metric, which we name "median-median-return". This metric enables more reliable reporting of results because it represents the expected performance of the learned policy: it takes the median online evaluation performance across both epochs and training runs. To demonstrate our scheme, we employ the recent state-of-the-art algorithm IQL (Implicit Q-Learning) and perform a thorough hyperparameter search based on the proposed metric. The tuned architectures enjoy notably stronger cherry-picked performance, and the best models surpass the reported state-of-the-art performance on average.
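The abstract describes the metric as a median taken across both epochs and training runs. A minimal sketch of one plausible reading of that computation, assuming evaluation returns are logged as a runs-by-epochs grid and that the median is taken first within each run and then across runs (the exact order of aggregation is an assumption, not stated in the abstract):

```python
from statistics import median

def median_median_return(returns_per_run):
    """Collapse a [runs][epochs] grid of online evaluation returns to a scalar.

    Step 1: median return across evaluation epochs within each training run.
    Step 2: median of those per-run medians across runs.
    """
    per_run_medians = [median(run) for run in returns_per_run]
    return median(per_run_medians)

# Hypothetical returns: 3 training runs x 3 evaluation epochs
returns = [
    [0.2, 0.5, 0.4],  # run 1 -> per-run median 0.4
    [0.1, 0.3, 0.2],  # run 2 -> per-run median 0.2
    [0.7, 0.6, 0.9],  # run 3 -> per-run median 0.7
]
print(median_median_return(returns))  # -> 0.4
```

Because both aggregation steps use the median rather than the max, a single lucky epoch or run cannot dominate the reported score, which is the cherry-picking effect the metric is designed to suppress.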
Pages: 585-590 (6 pages)