Continuous Action-Space Reinforcement Learning Methods Applied to the Minimum-Time Swing-up of the Acrobot

Cited by: 7
Authors:
Nichols, Barry D. [1 ]
Affiliation:
[1] Middlesex Univ, Sch Sci & Technol, London N17 8HR, England
Keywords:
Reinforcement Learning; Continuous Action-Space; Computational Intelligence; Artificial Neural Networks; Intelligent Control; SIMPLEX-METHOD;
DOI:
10.1109/SMC.2015.364
CLC number: TP3 [Computing technology; computer technology]
Discipline code: 0812
Abstract:
Here I apply three reinforcement learning methods to the full, continuous-action swing-up acrobot control benchmark problem: two approaches from the literature, CACLA and NM-SARSA, and a novel approach which I refer to as NelderMead-SARSA. Like NM-SARSA, NelderMead-SARSA directly optimises the state-action value function for action selection, allowing continuous-action reinforcement learning without a separate policy function. However, because it uses a derivative-free method, it does not require the first or second partial derivatives of the value function. All three methods achieved swing-up times comparable to previous approaches from the literature; NelderMead-SARSA in particular performed the swing-up in a shorter time than many published approaches.
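The abstract's key idea, selecting actions by running a derivative-free Nelder-Mead search directly on the learned state-action value function, can be illustrated with a minimal sketch. The quadratic `toy_q` below is a hypothetical stand-in for a trained value-function approximator, and `nelder_mead_max_1d` is a simplified one-dimensional simplex search, not the paper's implementation:

```python
def nelder_mead_max_1d(f, x0, step=1.0, tol=1e-6, max_iter=200):
    """Maximise f over one dimension with a two-point Nelder-Mead simplex."""
    simplex = [x0, x0 + step]
    for _ in range(max_iter):
        simplex.sort(key=f, reverse=True)       # best point first
        best, worst = simplex
        if abs(best - worst) < tol:             # simplex has collapsed
            break
        xr = best + (best - worst)              # reflection
        if f(xr) > f(best):
            xe = best + 2.0 * (best - worst)    # expansion
            simplex[1] = xe if f(xe) > f(xr) else xr
        elif f(xr) > f(worst):
            simplex[1] = xr
        else:
            xc = best + 0.5 * (worst - best)    # contraction / shrink
            simplex[1] = xc if f(xc) > f(worst) else 0.5 * (best + worst)
    return max(simplex, key=f)


def toy_q(state, action):
    """Hypothetical state-action value; peaks at action = 0.3 * state."""
    return 1.0 - (action - 0.3 * state) ** 2


# Greedy action selection: search the action dimension of Q for a fixed state.
state = 2.0
greedy_action = nelder_mead_max_1d(lambda a: toy_q(state, a), x0=0.0)
print(round(greedy_action, 3))  # near 0.6
```

Because the search queries only value-function evaluations, the same selection scheme works with any Q approximator (e.g. a neural network), which is exactly the advantage the abstract claims over gradient-based action optimisation.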
Pages: 2084-2089
Page count: 6