NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning

Cited by: 0
|
Authors
Qin, Rong-Jun [1 ,2 ]
Zhang, Xingyuan [2 ]
Gao, Songyi [2 ]
Chen, Xiong-Hui [1 ,2 ]
Li, Zewen [2 ]
Zhang, Weinan [3 ]
Yu, Yang [1 ,2 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Polixir Technol, Nanjing, Peoples R China
[3] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Offline reinforcement learning (RL) aims to learn effective policies from historical data without further environment interaction. While applying offline RL, we noticed that previous offline RL benchmarks commonly involve significant reality gaps, which we identify as overly rich and exploratory datasets, degraded behavior baselines, and missing policy validation. In many real-world situations, running an overly exploratory policy to collect diverse data is prohibited for safety reasons, so only a narrow data distribution is available. The resulting policy is regarded as effective only if it outperforms the working behavior policy, and the policy model can be deployed only after it has been well validated, not merely after training has finished. In this paper, we present a Near real-world offline RL benchmark, named NeoRL, that reflects these properties. NeoRL datasets are collected with a more conservative strategy. Moreover, NeoRL contains an offline training and offline validation pipeline before the online test, mirroring real-world deployment. We then evaluate recent state-of-the-art offline RL algorithms on NeoRL. The empirical results demonstrate that some offline RL algorithms are less competitive than behavior cloning and the deterministic behavior policy, implying that they may be less effective on real-world tasks than on previous benchmarks. We also find that current offline policy evaluation methods can hardly select the best policy. We hope this work sheds light on future research and on deploying RL in real-world systems.
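The train-validate-deploy pipeline the abstract describes can be summarized in a short sketch: train candidate policies on offline data, rank them with an offline policy evaluation (OPE) score on held-out validation data, and let only the selected policy reach the online test. The following Python sketch is illustrative only; all names in it (train_offline, ope_score, select_policy) are hypothetical placeholders invented here, not NeoRL's actual interface.

```python
# Minimal sketch of an offline-train / offline-validate / online-test
# pipeline, under the assumptions stated above. Placeholder names only;
# this is not NeoRL's API.
from typing import Callable, List, Tuple

State = Tuple[float, ...]
Policy = Callable[[State], int]
Transition = Tuple[State, int, float, State, bool]  # (s, a, r, s', done)


def train_offline(data: List[Transition], seed: int) -> Policy:
    """Placeholder for an offline RL algorithm such as BCQ or CQL."""
    return lambda state: seed % 2  # dummy policy, for illustration only


def ope_score(policy: Policy, val_data: List[Transition]) -> float:
    """Toy OPE proxy: agreement with the logged behavior actions.

    Real OPE methods (e.g., fitted Q evaluation) estimate the policy's
    expected return instead; the abstract reports that such estimates
    often fail to pick the best candidate.
    """
    agree = sum(policy(s) == a for s, a, _, _, _ in val_data)
    return agree / max(len(val_data), 1)


def select_policy(train_data: List[Transition],
                  val_data: List[Transition],
                  n_candidates: int = 5) -> Policy:
    # Offline training and offline validation only: no environment
    # interaction occurs before the single online test of the winner.
    candidates = [train_offline(train_data, s) for s in range(n_candidates)]
    return max(candidates, key=lambda p: ope_score(p, val_data))
```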
Pages: 13