A DATASET PERSPECTIVE ON OFFLINE REINFORCEMENT LEARNING

Times cited: 0
Authors
Schweighofer, Kajetan [1 ,2 ]
Radler, Andreas [1 ,2 ]
Dinu, Marius-Constantin [1 ,2 ,4 ]
Hofmarcher, Markus [1 ,2 ]
Patil, Vihang [1 ,2 ]
Bitto-Nemling, Angela [1 ,2 ,3 ]
Eghbal-zadeh, Hamid [1 ,2 ,3 ]
Hochreiter, Sepp [1 ,2 ]
Affiliations
[1] Johannes Kepler Univ Linz, ELLIS Unit Linz, Inst Machine Learning, Linz, Austria
[2] Johannes Kepler Univ Linz, Inst Machine Learning, LIT AI Lab, Linz, Austria
[3] IARAI, Vienna, Austria
[4] Dynatrace Res, Linz, Austria
Funding
European Union Horizon 2020;
Keywords
CONCEPT DRIFT;
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The application of Reinforcement Learning (RL) in real-world environments can be expensive or risky due to sub-optimal policies during training. In Offline RL, this problem is avoided since interactions with an environment are prohibited. Policies are learned from a given dataset, which solely determines their performance. Despite this fact, how dataset characteristics influence Offline RL algorithms has hardly been investigated. The dataset characteristics are determined by the behavioral policy that samples the dataset. Therefore, we characterize behavioral policies as exploratory if they yield high expected information in their interaction with the Markov Decision Process (MDP), and as exploitative if they have high expected return. We implement two corresponding empirical measures for the datasets sampled by the behavioral policy in deterministic MDPs. The first empirical measure, SACo, is defined as the normalized number of unique state-action pairs and captures exploration. The second empirical measure, TQ, is defined as the normalized average trajectory return and captures exploitation. Empirical evaluations show the effectiveness of TQ and SACo. In large-scale experiments using our proposed measures, we show that the unconstrained off-policy Deep Q-Network family requires datasets with high SACo to find a good policy. Furthermore, experiments show that policy-constrained algorithms perform well on datasets with high TQ and SACo. Finally, the experiments show that purely dataset-constrained Behavioral Cloning performs competitively with the best Offline RL algorithms on datasets with high TQ.
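The abstract defines two empirical dataset measures: TQ (exploitation, via the normalized average trajectory return) and SACo (exploration, via the normalized count of unique state-action pairs). The sketch below illustrates one plausible way to compute them on a trajectory dataset; the specific normalization references (a random-policy and an expert-policy return for TQ, a reference dataset's unique state-action count for SACo) are assumptions made for illustration, not details taken from the paper.

```python
# Minimal sketch of the TQ and SACo dataset measures described in the abstract.
# The normalization references are assumptions, not the paper's exact scheme.
from typing import Hashable, List, Sequence, Tuple

Transition = Tuple[Hashable, Hashable, float]  # (state, action, reward)


def trajectory_quality(trajectories: Sequence[Sequence[Transition]],
                       random_return: float,
                       expert_return: float) -> float:
    """TQ: average trajectory return, normalized between an assumed
    random-policy return and an assumed expert-policy return."""
    avg_return = sum(sum(r for _, _, r in traj) for traj in trajectories) / len(trajectories)
    return (avg_return - random_return) / (expert_return - random_return)


def state_action_coverage(trajectories: Sequence[Sequence[Transition]],
                          reference_unique_pairs: int) -> float:
    """SACo: number of unique (state, action) pairs in the dataset,
    normalized by an assumed reference dataset's unique-pair count."""
    unique_pairs = {(s, a) for traj in trajectories for s, a, _ in traj}
    return len(unique_pairs) / reference_unique_pairs


if __name__ == "__main__":
    # Toy dataset with two short trajectories from a deterministic MDP.
    data: List[List[Transition]] = [
        [("s0", "a0", 1.0), ("s1", "a1", 0.0)],
        [("s0", "a1", 0.5), ("s2", "a0", 1.5)],
    ]
    print("TQ   =", trajectory_quality(data, random_return=0.0, expert_return=2.0))
    print("SACo =", state_action_coverage(data, reference_unique_pairs=8))
```

On the toy data above, TQ = 0.75 (average return 1.5 against an assumed expert return of 2.0) and SACo = 0.5 (4 unique state-action pairs against an assumed reference count of 8); datasets from purely exploitative behavioral policies tend toward high TQ and low SACo, while exploratory ones tend the other way.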
Pages: 48
Related papers
50 in total
  • [21] Offline Reinforcement Learning for Visual Navigation
    Shah, Dhruv
    Bhorkar, Arjun
    Leen, Hrish
    Kostrikov, Ilya
    Rhinehart, Nick
    Levine, Sergey
    CONFERENCE ON ROBOT LEARNING, VOL 205, 2022, 205 : 44 - 54
  • [22] Hyperparameter Tuning in Offline Reinforcement Learning
    Tittaferrante, Andrew
    Yassine, Abdulsalam
    2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA, 2022, : 585 - 590
  • [23] Offline reinforcement learning with task hierarchies
    Schwab, Devin
    Ray, Soumya
    MACHINE LEARNING, 2017, 106 : 1569 - 1598
  • [24] Offline Reinforcement Learning at Multiple Frequencies
    Burns, Kaylee
    Yu, Tianhe
    Finn, Chelsea
    Hausman, Karol
    CONFERENCE ON ROBOT LEARNING, VOL 205, 2022, 205 : 2041 - 2051
  • [25] Survival Instinct in Offline Reinforcement Learning
    Li, Anqi
    Misra, Dipendra
    Kolobov, Andrey
    Cheng, Ching-An
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [26] Offline Reinforcement Learning for Mobile Notifications
    Yuan, Yiping
    Muralidharan, Ajith
    Nandy, Preetam
    Cheng, Miao
    Prabhakar, Prakruthi
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 3614 - 3623
  • [27] The Impact of Dataset on Offline Reinforcement Learning Performance in UAV-Based Emergency Network Recovery Tasks
    Eo, Jeyeon
    Lee, Dongsu
    Kwon, Minhae
    IEEE COMMUNICATIONS LETTERS, 2024, 28 (05) : 1058 - 1061
  • [28] Learning to Influence Human Behavior with Offline Reinforcement Learning
    Hong, Joey
    Levine, Sergey
    Dragan, Anca
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [29] A Review of Offline Reinforcement Learning Based on Representation Learning
    Wang, X.-S.
    Wang, R.-R.
    Cheng, Y.-H.
    ZIDONGHUA XUEBAO/ACTA AUTOMATICA SINICA, 2024, 50 (06) : 1104 - 1128
  • [30] Discrete Uncertainty Quantification For Offline Reinforcement Learning
    Perez, Jose Luis
    Corrochano, Javier
    Garcia, Javier
    Majadas, Ruben
    Ibanez-Llano, Cristina
    Perez, Sergio
    Fernandez, Fernando
    JOURNAL OF ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING RESEARCH, 2023, 13 (04) : 273 - 287