Continuous Action-Space Reinforcement Learning Methods Applied to the Minimum-Time Swing-up of the Acrobot

Cited by: 7
Authors:
Nichols, Barry D. [1 ]
Affiliation:
[1] Middlesex Univ, Sch Sci & Technol, London N17 8HR, England
Keywords:
Reinforcement Learning; Continuous Action-Space; Computational Intelligence; Artificial Neural Networks; Intelligent Control; SIMPLEX-METHOD;
DOI:
10.1109/SMC.2015.364
CLC number: TP3 [Computing technology; computer technology]
Discipline code: 0812
Abstract:
Here I apply three reinforcement learning methods to the full, continuous-action swing-up acrobot control benchmark problem: two approaches from the literature, CACLA and NM-SARSA, and a novel approach which I refer to as NelderMead-SARSA. Like NM-SARSA, NelderMead-SARSA directly optimises the state-action value function for action selection, allowing continuous-action reinforcement learning without a separate policy function. However, because it uses a derivative-free method, it does not require the first or second partial derivatives of the value function. All three methods achieved swing-up times comparable to previous approaches from the literature; NelderMead-SARSA in particular performed the swing-up in a shorter time than many published approaches.
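The abstract's key idea, selecting actions by running a derivative-free Nelder-Mead search directly on the learned state-action value function, can be illustrated with a minimal sketch. The quadratic `toy_q` below is a hypothetical stand-in for a trained value-function approximator, and `nelder_mead_max_1d` is a simplified one-dimensional simplex search, not the paper's implementation:

```python
def nelder_mead_max_1d(f, x0, step=1.0, tol=1e-6, max_iter=200):
    """Maximise f over one dimension with a two-point Nelder-Mead simplex."""
    simplex = [x0, x0 + step]
    for _ in range(max_iter):
        simplex.sort(key=f, reverse=True)       # best point first
        best, worst = simplex
        if abs(best - worst) < tol:             # simplex has collapsed
            break
        xr = best + (best - worst)              # reflection
        if f(xr) > f(best):
            xe = best + 2.0 * (best - worst)    # expansion
            simplex[1] = xe if f(xe) > f(xr) else xr
        elif f(xr) > f(worst):
            simplex[1] = xr
        else:
            xc = best + 0.5 * (worst - best)    # contraction / shrink
            simplex[1] = xc if f(xc) > f(worst) else 0.5 * (best + worst)
    return max(simplex, key=f)


def toy_q(state, action):
    """Hypothetical state-action value; peaks at action = 0.3 * state."""
    return 1.0 - (action - 0.3 * state) ** 2


# Greedy action selection: search the action dimension of Q for a fixed state.
state = 2.0
greedy_action = nelder_mead_max_1d(lambda a: toy_q(state, a), x0=0.0)
print(round(greedy_action, 3))  # near 0.6
```

Because the search queries only value-function evaluations, the same selection scheme works with any Q approximator (e.g. a neural network), which is exactly the advantage the abstract claims over gradient-based action optimisation.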
Pages: 2084-2089
Page count: 6