NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning

Citations: 0
Authors
Qin, Rong-Jun [1 ,2 ]
Zhang, Xingyuan [2 ]
Gao, Songyi [2 ]
Chen, Xiong-Hui [1 ,2 ]
Li, Zewen [2 ]
Zhang, Weinan [3 ]
Yu, Yang [1 ,2 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Polixir Technol, Nanjing, Peoples R China
[3] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
None
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Offline reinforcement learning (RL) aims to learn effective policies from historical data without further environment interaction. In applying offline RL, we noticed that previous offline RL benchmarks commonly involve significant reality gaps, which we identify as rich and overly exploratory datasets, degraded baselines, and missing policy validation. In many real-world settings, running an overly exploratory policy to collect diverse data is prohibited for safety reasons, so only a narrow data distribution is available. The resulting policy is regarded as effective only if it outperforms the working behavior policy, and the policy model can be deployed only after it has been well validated, not merely trained. In this paper, we present a near real-world offline RL benchmark, named NeoRL, that reflects these properties. NeoRL datasets are collected with a more conservative strategy. Moreover, NeoRL contains an offline training and offline validation pipeline before the online test, corresponding to real-world practice. We then evaluate recent state-of-the-art offline RL algorithms on NeoRL. The empirical results demonstrate that some offline RL algorithms are less competitive than behavior cloning and the deterministic behavior policy, implying that they may be less effective in real-world tasks than on previous benchmarks. We also find that current offline policy evaluation methods can hardly select the best policy. We hope this work sheds light on future research and on deploying RL in real-world systems.
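The abstract's three-stage pipeline (offline training on a conservatively collected dataset, offline validation to pick a policy, and only then an online test) can be sketched as follows. Everything below is illustrative, not NeoRL's actual API: the toy 1-D environment, the linear candidate policies, and the `offline_score` proxy (which stands in for a real off-policy evaluation method) are all assumptions for the sake of a runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: 1-D states, scalar actions; reward = -(a - a*(s))^2,
# where a*(s) = 0.5*s is unknown to the learner.
def optimal_action(s):
    return 0.5 * s

def reward(s, a):
    return -(a - optimal_action(s)) ** 2

# 1) Offline dataset from a conservative behavior policy: actions stay
#    close to the working policy, so coverage is narrow (as in NeoRL).
states = rng.uniform(-1.0, 1.0, size=500)
behavior_actions = optimal_action(states) + rng.normal(0.0, 0.1, size=500)

# 2) Offline training: behavior cloning via least squares (action ~ w*s).
w_bc = np.sum(states * behavior_actions) / np.sum(states * states)

# Candidate policies to choose among: BC plus perturbed variants,
# standing in for policies produced by different offline RL runs.
candidates = {"bc": w_bc, "bc+0.3": w_bc + 0.3, "bc-0.3": w_bc - 0.3}

# 3) Offline validation: score each candidate on held-out logged states
#    WITHOUT touching the live system. Here the known reward function is
#    a stand-in for an off-policy evaluation estimator.
val_states = rng.uniform(-1.0, 1.0, size=200)

def offline_score(w):
    return reward(val_states, w * val_states).mean()

best_name = max(candidates, key=lambda k: offline_score(candidates[k]))

# 4) Online test: only the validated policy interacts with the environment.
test_states = rng.uniform(-1.0, 1.0, size=200)
online_return = reward(test_states, candidates[best_name] * test_states).mean()
print(best_name, round(online_return, 4))
```

The design point matches the abstract: the deployment decision is made by the offline validation step (stage 3), not by whichever policy finished training last, and online interaction is reserved for the final test alone.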
Pages: 13