Expert Initialized Hybrid Model-Based and Model-Free Reinforcement Learning

Cited by: 0
Authors
Langaa, Jeppe [1 ]
Sloth, Christoffer [1 ]
Affiliations
[1] Univ Southern Denmark, Maersk McKinney Moller Inst, Odense, Denmark
DOI
10.23919/ECC57647.2023.10178306
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Subject Classification Code
0812
Abstract
This paper presents a reinforcement learning algorithm that enables fast learning of control policies from a limited amount of training data by leveraging the strengths of both model-based and model-free algorithms. This is accomplished by initializing the reinforcement learning algorithm from expert demonstrations, which are used to learn a Gaussian process model and a policy that imitates the expert. The policy is subsequently improved by the Bi-population Covariance Matrix Adaptation Evolution Strategy (BIPOP-CMA-ES), a black-box optimizer that exploits the learned model. Finally, the policy parameters obtained from BIPOP-CMA-ES are refined by a model-free reinforcement learning algorithm. Scalable Variational Gaussian Processes are used in the model to accommodate high-dimensional state spaces and larger amounts of data; in addition, autoencoders reduce the dimensionality of the parameter space searched by BIPOP-CMA-ES. The algorithm is tested on a cart-pole system as well as on a higher-dimensional industrial peg-in-hole task, and is compared to state-of-the-art model-free and model-based algorithms. The proposed algorithm solves the peg-in-hole task faster than previous algorithms.
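To make the three-stage structure of the abstract concrete, the following is a minimal Python sketch: a policy is first fitted to synthetic expert demonstrations by least squares (standing in for the paper's Gaussian-process-based initialization), then refined against a learned forward model with BIPOP-CMA-ES via the `cma` package. The toy scalar dynamics, the linear policy, and all function names are illustrative assumptions, not the authors' implementation; the final model-free refinement stage is only indicated in a comment.

```python
# Minimal sketch of the pipeline outlined in the abstract, on a toy scalar
# system. Everything here is an illustrative assumption: the paper uses a
# Gaussian process model, an autoencoder-compressed parameter space, and a
# real peg-in-hole task, none of which are reproduced. Requires `pip install cma`.
import numpy as np
import cma

rng = np.random.default_rng(0)

def model(x, u):
    # Stand-in for the learned (Gaussian process) forward model.
    return 0.9 * x + 0.5 * u

def expert(x):
    # Hypothetical expert controller, used only to generate demonstrations.
    return -1.2 * x

# Stage 1: collect expert demonstrations and fit an imitating policy.
xs, us, x = [], [], 1.0
for _ in range(200):
    u = expert(x) + 0.05 * rng.normal()
    xs.append(x)
    us.append(u)
    x = model(x, u) + 0.01 * rng.normal()
X, U = np.array(xs), np.array(us)
gain0 = float(np.sum(X * U) / np.sum(X * X))   # least-squares policy gain

# Stage 2: model-based refinement with BIPOP-CMA-ES (pycma's `bipop` flag).
def rollout_cost(theta):
    # Black-box objective: quadratic cost of a 50-step rollout through the
    # learned model under the linear policy u = theta[0] * x + theta[1].
    cost, x = 0.0, 1.0
    for _ in range(50):
        u = theta[0] * x + theta[1]
        cost += x ** 2 + 0.1 * u ** 2
        x = model(x, u)
    return cost

result = cma.fmin(rollout_cost, [gain0, 0.0], 0.3,
                  options={'maxfevals': 3000, 'verbose': -9},
                  bipop=True, restarts=3)
theta_star = result[0]
print('initial gain:', gain0, '-> refined parameters:', theta_star)

# Stage 3 (not shown): theta_star would seed a model-free reinforcement
# learning algorithm that refines the policy on the real system.
```

The sketch mirrors only the control flow described in the abstract: imitation for initialization, black-box model-based policy search, then a hand-off to model-free learning.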
Pages: 6