Discrete-to-deep reinforcement learning methods

Cited by: 1
Authors
Kurniawan, Budi [1]
Vamplew, Peter [1]
Papasimeon, Michael [2]
Dazeley, Richard [3]
Foale, Cameron [1]
Affiliations
[1] Federat Univ, Mt Helen, Vic 3350, Australia
[2] Def Sci & Technol Grp, Fishermans Bend, Vic 3207, Australia
[3] Deakin Univ, Sch Informat Technol, Geelong, Vic 3220, Australia
Source
NEURAL COMPUTING & APPLICATIONS | 2022, Vol. 34, No. 3
Keywords
Reinforcement learning; Neural network; Actor-critic; Supervised learning; DQN
DOI
10.1007/s00521-021-06270-6
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Neural networks are effective function approximators, but hard to train in the reinforcement learning (RL) context mainly because samples are correlated. In complex problems, a neural RL approach is often able to learn a better solution than tabular RL, but generally takes longer. This paper proposes two methods, Discrete-to-Deep Supervised Policy Learning (D2D-SPL) and Discrete-to-Deep Supervised Q-value Learning (D2D-SQL), whose objective is to acquire the generalisability of a neural network at a cost nearer to that of a tabular method. Both methods combine RL and supervised learning (SL) and are based on the idea that a fast-learning tabular method can generate off-policy data to accelerate learning in neural RL. D2D-SPL uses the data to train a classifier which is then used as a controller for the RL problem. D2D-SQL uses the data to initialise a neural network which is then allowed to continue learning using another RL method. We demonstrate the viability of our algorithms with Cartpole, Lunar Lander and an aircraft manoeuvring problem, three continuous-space environments with low-dimensional state variables. Both methods learn at least 38% faster than baseline methods and yield policies that outperform them.
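The two-phase idea in the abstract (a fast tabular learner produces data, and a supervised learner is then trained on that data and used as the controller) can be illustrated with a minimal sketch. This is not the authors' implementation: the 1-D toy task, the bin count, the hyperparameters, the choice to label visited states with the final greedy tabular action, and the use of logistic regression in place of a neural network classifier are all assumptions made here for illustration.

```python
import math
import random

N_BINS, ACTIONS = 10, (0, 1)   # action 0 = step left, 1 = step right

def step(x, action):
    """Hypothetical toy task: move by 0.1 inside [-1, 1]; reward is -|x|."""
    x = max(-1.0, min(1.0, x + (0.1 if action == 1 else -0.1)))
    return x, -abs(x)

def to_bin(x):
    """Discretise the continuous state for the tabular learner."""
    return min(int((x + 1.0) / 2.0 * N_BINS), N_BINS - 1)

def train_tabular(rng, episodes=3000, horizon=20, alpha=0.1, gamma=0.9, eps=0.2):
    """Phase 1: epsilon-greedy tabular Q-learning; also logs visited states."""
    q = [[0.0, 0.0] for _ in range(N_BINS)]
    visited = []
    for _ in range(episodes):
        x = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            s = to_bin(x)
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = 0 if q[s][0] >= q[s][1] else 1
            visited.append(x)
            x, r = step(x, a)
            target = r + gamma * max(q[to_bin(x)])
            q[s][a] += alpha * (target - q[s][a])
    return q, visited

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_classifier(data, rng, epochs=100, lr=0.5):
    """Phase 2: logistic regression for P(action = 1 | x), trained by SGD."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            err = sigmoid(w * x + b) - label
            w -= lr * err * x
            b -= lr * err
    return w, b

def policy(params, x):
    """Classifier used directly as the controller."""
    w, b = params
    return 1 if w * x + b > 0 else 0

rng = random.Random(0)
q, visited = train_tabular(rng)

# Assumption: label a sample of visited states with the greedy tabular action.
sample = rng.sample(visited, 500)
data = [(x, 0 if q[to_bin(x)][0] >= q[to_bin(x)][1] else 1) for x in sample]
params = fit_classifier(data, rng)

# Roll out the classifier on the continuous state, with no discretisation.
x = 0.9
for _ in range(20):
    x, reward = step(x, policy(params, x))
final_x = x
```

In this sketch the classifier acts on the raw continuous state, so it is not tied to the bin boundaries the tabular learner used. A D2D-SQL-style variant would instead regress the tabular Q-values to initialise a value network and then continue training it with an RL method, as the abstract describes.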
Pages: 1713-1733
Page count: 21