Discrete-to-deep reinforcement learning methods

Cited by: 1

Authors:

Kurniawan, Budi [1]

Vamplew, Peter [1]

Papasimeon, Michael [2]

Dazeley, Richard [3]

Foale, Cameron [1]

Affiliations:

[1] Federation University, Mt Helen, Vic 3350, Australia

[2] Defence Science and Technology Group, Fishermans Bend, Vic 3207, Australia

[3] Deakin University, School of Information Technology, Geelong, Vic 3220, Australia

Source:

NEURAL COMPUTING & APPLICATIONS | 2022 / Vol. 34 / No. 03

Keywords:

Reinforcement learning; Neural network; Actor-critic; Supervised learning; DQN

DOI:

10.1007/s00521-021-06270-6

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Neural networks are effective function approximators, but hard to train in the reinforcement learning (RL) context mainly because samples are correlated. In complex problems, a neural RL approach is often able to learn a better solution than tabular RL, but generally takes longer. This paper proposes two methods, Discrete-to-Deep Supervised Policy Learning (D2D-SPL) and Discrete-to-Deep Supervised Q-value Learning (D2D-SQL), whose objective is to acquire the generalisability of a neural network at a cost nearer to that of a tabular method. Both methods combine RL and supervised learning (SL) and are based on the idea that a fast-learning tabular method can generate off-policy data to accelerate learning in neural RL. D2D-SPL uses the data to train a classifier which is then used as a controller for the RL problem. D2D-SQL uses the data to initialise a neural network which is then allowed to continue learning using another RL method. We demonstrate the viability of our algorithms with Cartpole, Lunar Lander and an aircraft manoeuvring problem, three continuous-space environments with low-dimensional state variables. Both methods learn at least 38% faster than baseline methods and yield policies that outperform them.
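The pipeline the abstract describes for D2D-SPL (a tabular learner produces data, and a supervised classifier trained on that data becomes the controller) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the one-dimensional toy environment, the bin count, the hyperparameters, the way samples are collected, and the use of a single logistic unit in place of a neural network classifier.

```python
import math
import random

N_BINS = 10        # hypothetical discretisation of the state range [-1, 1]
ACTIONS = (0, 1)   # 0 = step left, 1 = step right


def to_bin(x):
    """Map a continuous state in [-1, 1] to a table index."""
    return min(N_BINS - 1, int((x + 1.0) / 2.0 * N_BINS))


def step(x, action):
    """Toy dynamics (assumed): move 0.1 left or right; reward peaks at x = 0."""
    nxt = max(-1.0, min(1.0, x + (0.1 if action == 1 else -0.1)))
    return nxt, -abs(nxt)


def greedy(q, x):
    """Greedy action of the tabular agent for a continuous state."""
    row = q[to_bin(x)]
    return 0 if row[0] >= row[1] else 1


def train_tabular(rng, episodes=3000, horizon=20, alpha=0.1, gamma=0.9, eps=0.3):
    """Stage 1: epsilon-greedy tabular Q-learning on the discretised state."""
    q = [[0.0, 0.0] for _ in range(N_BINS)]
    for _ in range(episodes):
        x = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            a = rng.choice(ACTIONS) if rng.random() < eps else greedy(q, x)
            nxt, r = step(x, a)
            s, s2 = to_bin(x), to_bin(nxt)
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            x = nxt
    return q


def collect_dataset(q, rng, rollouts=100, horizon=20):
    """Stage 2: record (continuous state, action) pairs from greedy rollouts."""
    data = []
    for _ in range(rollouts):
        x = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            a = greedy(q, x)
            data.append((x, a))
            x, _reward = step(x, a)
    return data


def train_classifier(data, epochs=100, lr=1.0):
    """Stage 3: fit a logistic unit by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x
            gb += p - y
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def run_sketch(seed=0):
    """Run all three stages and return the classifier as a controller."""
    rng = random.Random(seed)
    q = train_tabular(rng)
    data = collect_dataset(q, rng)
    w, b = train_classifier(data)
    return lambda x: 1 if w * x + b > 0.0 else 0
```

Calling `run_sketch()` returns a function that maps a continuous state directly to an action, without consulting the table. Per the abstract, D2D-SQL differs in that the data initialise a network which then continues learning with another RL method; that stage is not sketched here.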

Pages: 1713 - 1733

Page count: 21

50 items in total

[1] Discrete-to-deep reinforcement learning methods
Budi Kurniawan
Peter Vamplew
Michael Papasimeon
Richard Dazeley
Cameron Foale
[J]. Neural Computing and Applications, 2022, 34 : 1713 - 1733
[2] Contrastive Learning Methods for Deep Reinforcement Learning
Wang, Di
Hu, Mengqi
[J]. IEEE ACCESS, 2023, 11 : 97107 - 97117
[3] Asynchronous Methods for Deep Reinforcement Learning
Mnih, Volodymyr
Badia, Adria Puigdomenech
Mirza, Mehdi
Graves, Alex
Harley, Tim
Lillicrap, Timothy P.
[J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
[4] Generalized Representation Learning Methods for Deep Reinforcement Learning
Zhu, Hanhua
[J]. PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020 : 5216 - 5217
[5] DEEP REINFORCEMENT LEARNING IN LINEAR DISCRETE ACTION SPACES
van Heeswijk, Wouter
La Poutre, Han
[J]. 2020 WINTER SIMULATION CONFERENCE (WSC), 2020 : 1063 - 1074
[6] Decomposition methods with deep corrections for reinforcement learning
Maxime Bouton
Kyle D. Julian
Alireza Nakhaei
Kikuo Fujimura
Mykel J. Kochenderfer
[J]. Autonomous Agents and Multi-Agent Systems, 2019, 33 : 330 - 352
[7] Decomposition methods with deep corrections for reinforcement learning
Bouton, Maxime
Julian, Kyle D.
Nakhaei, Alireza
Fujimura, Kikuo
Kochenderfer, Mykel J.
[J]. AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2019, 33 (03) : 330 - 352
[8] Benchmarking Deep and Non-deep Reinforcement Learning Algorithms for Discrete Environments
Duarte, Fernando F.
Lau, Nuno
Pereira, Artur
Reis, Luis P.
[J]. FOURTH IBERIAN ROBOTICS CONFERENCE: ADVANCES IN ROBOTICS, ROBOT 2019, VOL 2, 2020, 1093 : 263 - 275
[9] Survey of Deep Reinforcement Learning Methods with Evolutionary Algorithms
Lü S.
Gong X.-Y.
Zhang Z.-H.
Han S.
Zhang J.-W.
[J]. Jisuanji Xuebao/Chinese Journal of Computers, 2022, 45 (07) : 1478 - 1499
[10] Comparison of multiple reinforcement learning and deep reinforcement learning methods for the task aimed at achieving the goal
Parak R.
Matousek R.
[J]. Mendel, 2021, 27 (01) : 1 - 8
