NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning

Authors
Qin, Rong-Jun [1, 2]
Zhang, Xingyuan [2]
Gao, Songyi [2]
Chen, Xiong-Hui [1, 2]
Li, Zewen [2]
Zhang, Weinan [3]
Yu, Yang [1, 2]
Affiliations
[1] Nanjing University, National Key Laboratory for Novel Software Technology, Nanjing, People's Republic of China
[2] Polixir Technologies, Nanjing, People's Republic of China
[3] Shanghai Jiao Tong University, Shanghai, People's Republic of China
Funding
National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Offline reinforcement learning (RL) aims to learn effective policies from historical data without extra environment interactions. In our experience applying offline RL, we noticed that previous offline RL benchmarks commonly involve significant reality gaps, which we identify as rich and overly exploratory datasets, degraded baselines, and missing policy validation. In many real-world situations, running an overly exploratory policy to collect diverse data is prohibited for the sake of system safety, so only a narrow data distribution is available. The resulting policy is regarded as effective only if it outperforms the working behavior policy, and the policy model can be deployed only after it has been well validated, not merely after training has finished. In this paper, we present a Near real-world offline RL benchmark, named NeoRL, that reflects these properties. NeoRL datasets are collected with a more conservative strategy. Moreover, NeoRL includes an offline training and offline validation pipeline before the online test, mirroring real-world situations. We then evaluate recent state-of-the-art offline RL algorithms on NeoRL. The empirical results show that some offline RL algorithms are less competitive than behavior cloning and the deterministic behavior policy, implying that they may be less effective in real-world tasks than in previous benchmarks. We also reveal that current offline policy evaluation methods can hardly select the best policy. We hope this work will shed some light on future research and on deploying RL in real-world systems.
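The protocol described above (train offline, select a policy using only offline validation, then test it online against the working behavior policy) can be illustrated with a toy Python sketch. Everything in it is hypothetical: `Candidate`, `select_by_ope`, and `evaluate_protocol` are not part of NeoRL, the scores are invented numbers rather than benchmark results, and the offline estimate is simply taken as given instead of being computed by any particular offline policy evaluation (OPE) method.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """A trained policy as seen by this hypothetical protocol."""
    name: str
    ope_score: float      # offline value estimate, available before deployment
    online_return: float  # true return, revealed only by the online test


def select_by_ope(candidates: List[Candidate]) -> Candidate:
    # Offline validation step: only offline estimates may be used to choose.
    return max(candidates, key=lambda c: c.ope_score)


def evaluate_protocol(
    candidates: List[Candidate], behavior_return: float
) -> Tuple[str, float, bool]:
    chosen = select_by_ope(candidates)
    # Online test step: compare the chosen policy with the best candidate
    # (selection regret) and with the working behavior policy.
    best = max(candidates, key=lambda c: c.online_return)
    regret = best.online_return - chosen.online_return
    beats_behavior = chosen.online_return > behavior_return
    return chosen.name, regret, beats_behavior


# Made-up numbers for illustration only; these are not NeoRL results.
candidates = [
    Candidate("A", ope_score=10.0, online_return=5.0),
    Candidate("B", ope_score=8.0, online_return=9.0),
    Candidate("C", ope_score=6.0, online_return=7.0),
]
name, regret, beats_behavior = evaluate_protocol(candidates, behavior_return=6.0)
print(name, regret, beats_behavior)  # A 4.0 False
```

With these invented numbers, the candidate ranked highest offline (A) is not the best online (B) and also falls below the behavior policy, so it would not count as effective under the criterion stated in the abstract.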
Pages: 13