Hyperparameter Tuning in Offline Reinforcement Learning

Cited by: 0
Authors
Tittaferrante, Andrew [1 ]
Yassine, Abdulsalam [2 ]
Affiliations
[1] Lakehead Univ, Elect & Comp Engn, Thunder Bay, ON, Canada
[2] Lakehead Univ, Software Engn, Thunder Bay, ON, Canada
Keywords
Deep Learning; Reinforcement Learning; Offline Reinforcement Learning;
DOI
10.1109/ICMLA55696.2022.00101
CLC classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this work, we propose a reliable hyperparameter tuning scheme for offline reinforcement learning. We demonstrate the proposed scheme on the simplest AntMaze environment from D4RL, the standard offline benchmark dataset. The usual approach to policy evaluation in offline reinforcement learning relies on online evaluation, i.e., cherry-picking the best performance on the test environment. To mitigate this cherry-picking, we propose an ad-hoc online evaluation metric, which we name "median-median-return". This metric enables more reliable reporting of results because it represents the expected performance of the learned policy, taking the median online evaluation performance across both epochs and training runs. To demonstrate our scheme, we employ the recent state-of-the-art algorithm IQL and perform a thorough hyperparameter search guided by the proposed metric. The tuned architectures achieve notably stronger cherry-picked performance, and the best models surpass the reported state-of-the-art performance on average.
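The abstract describes "median-median-return" as the median online evaluation performance taken across both epochs and training runs. The sketch below illustrates one plausible reading of that definition: a median over epochs within each run, followed by a median across runs. The function name, array layout, and nesting order are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def median_median_return(returns):
    """Sketch of a median-median-return metric (assumed definition).

    `returns` is assumed to be shaped (num_runs, num_epochs), where
    returns[i, j] is the online evaluation return of training run i at epoch j.
    We take the median over epochs within each run, then the median of those
    per-run medians across runs.
    """
    returns = np.asarray(returns, dtype=float)
    per_run_median = np.median(returns, axis=1)  # median over epochs, per run
    return float(np.median(per_run_median))      # median across training runs

# Hypothetical usage: 3 training runs, each evaluated at 4 epochs.
evals = [
    [0.1, 0.4, 0.5, 0.3],
    [0.2, 0.6, 0.7, 0.5],
    [0.0, 0.3, 0.4, 0.2],
]
print(median_median_return(evals))  # summary of expected policy performance
```

Compared with reporting the single best epoch of the single best run, a nested median of this kind is insensitive to outlier epochs and outlier seeds, which is the cherry-picking the paper aims to mitigate.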
Pages: 585-590
Number of pages: 6