Hyperparameter Tuning in Offline Reinforcement Learning

Cited by: 0
Authors
Tittaferrante, Andrew [1 ]
Yassine, Abdulsalam [2 ]
Affiliations
[1] Lakehead Univ, Elect & Comp Engn, Thunder Bay, ON, Canada
[2] Lakehead Univ, Software Engn, Thunder Bay, ON, Canada
Keywords
Deep Learning; Reinforcement Learning; Offline Reinforcement Learning;
DOI
10.1109/ICMLA55696.2022.00101
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In this work, we propose a reliable hyperparameter tuning scheme for offline reinforcement learning. We demonstrate the scheme on the simplest antmaze environment from D4RL, the standard offline benchmark suite. The usual approach to policy evaluation in offline reinforcement learning relies on online evaluation, i.e., cherry-picking the best performance on the test environment. To mitigate this cherry-picking, we propose an ad hoc online evaluation metric, which we name "median-median-return". This metric enables more reliable reporting of results because it represents the expected performance of the learned policy: it takes the median online evaluation performance across both epochs and training runs. To demonstrate our scheme, we employ the recent state-of-the-art algorithm IQL (Implicit Q-Learning) and perform a thorough hyperparameter search based on the proposed metric. The tuned architectures enjoy notably stronger cherry-picked performance, and the best models surpass the reported state-of-the-art performance on average.
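The abstract describes the metric as a median taken across both epochs and training runs. A minimal sketch of one plausible reading of that computation, assuming evaluation returns are logged as a runs-by-epochs grid and that the median is taken first within each run and then across runs (the exact order of aggregation is an assumption, not stated in the abstract):

```python
from statistics import median

def median_median_return(returns_per_run):
    """Collapse a [runs][epochs] grid of online evaluation returns to a scalar.

    Step 1: median return across evaluation epochs within each training run.
    Step 2: median of those per-run medians across runs.
    """
    per_run_medians = [median(run) for run in returns_per_run]
    return median(per_run_medians)

# Hypothetical returns: 3 training runs x 3 evaluation epochs
returns = [
    [0.2, 0.5, 0.4],  # run 1 -> per-run median 0.4
    [0.1, 0.3, 0.2],  # run 2 -> per-run median 0.2
    [0.7, 0.6, 0.9],  # run 3 -> per-run median 0.7
]
print(median_median_return(returns))  # -> 0.4
```

Because both aggregation steps use the median rather than the max, a single lucky epoch or run cannot dominate the reported score, which is the cherry-picking effect the metric is designed to suppress.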
Pages: 585-590 (6 pages)