Adaptive reinforcement learning-based control using proximal policy optimization and slime mould algorithm with experimental tower crane system validation

Cited by: 16
Authors
Zamfirache, Iuliu Alexandru [1]
Precup, Radu-Emil [1, 2]
Petriu, Emil M. [3]
Affiliations
[1] Politehnica University of Timisoara, Department of Automation and Applied Informatics, Bd. V. Parvan 2, Timisoara, 300223, Romania
[2] Romanian Academy – Timisoara Branch, Center for Fundamental and Advanced Technical Research, Bd. Mihai Viteazu 24, Timisoara, 300223, Romania
[3] University of Ottawa, School of Electrical Engineering and Computer Science, 800 King Edward, Ottawa, ON, K1N 6N5, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Adaptive control systems - Iterative methods - Learning algorithms - Learning systems - Molds - Navigation - Optimal systems - Optimization
DOI
10.1016/j.asoc.2024.111687
Abstract
This paper presents a novel optimal reference tracking control approach that combines a popular policy gradient Reinforcement Learning (RL) algorithm, namely Proximal Policy Optimization (PPO), with a metaheuristic, the Slime Mould Algorithm (SMA). One of the most important parameters in the PPO-based RL process is the learning rate, which strongly affects how the parameters of the actor neural network (NN) are iteratively updated. In every episode of the RL process, the updates to the weights and biases of the actor NN are scaled by the learning rate, which determines how far the learning agent steps in a direction computed from previous experiences. The classical PPO algorithm usually relies on fixed learning rate values that change rarely, if at all, during the learning process. The main drawback of fixed values is that the learning agent cannot take advantage of positive momentum by accelerating towards good learning experiences, nor can it slow down and quickly change direction after consecutive negative learning experiences. The main objective of the combination proposed in this paper is to create an adaptive SMA-based PPO approach for control systems which, instead of using fixed learning rate values, uses the SMA to compute optimal values of the learning rates in each time step of the learning process based on the progress of the learning agent. This paper investigates whether the adaptive SMA-based PPO control approach can be considered an alternative to the classical PPO version, which employs fixed values of the learning rate. The comparison uses control system performance indices gathered while performing an optimal reference tracking control task on tower crane system laboratory equipment. © 2024 The Authors
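The contrast the abstract draws between a fixed and an adaptive learning rate can be illustrated with a minimal sketch. This is not the authors' method: the quadratic `loss` stands in for the PPO objective, and `pick_learning_rate` is a hypothetical random candidate search used in place of the SMA, whose update rules the abstract does not give. All function names, bounds, and constants are assumptions made for illustration.

```python
import random


def loss(w):
    # Toy quadratic objective (assumption): stands in for the actor's loss.
    return (w - 3.0) ** 2


def grad(w):
    # Analytic gradient of the toy objective.
    return 2.0 * (w - 3.0)


def step(w, lr):
    # The learning rate scales the gradient step applied to the parameter.
    return w - lr * grad(w)


def pick_learning_rate(w, rng, n_candidates=8, lo=1e-3, hi=0.9):
    # Hypothetical stand-in for the metaheuristic (NOT the SMA): sample
    # candidate learning rates and keep the one whose trial step gives
    # the lowest loss from the current parameter value.
    candidates = [rng.uniform(lo, hi) for _ in range(n_candidates)]
    return min(candidates, key=lambda lr: loss(step(w, lr)))


def train(adaptive, n_steps=20, fixed_lr=0.01, seed=0):
    # Run gradient descent with either a fixed or a per-step learning rate.
    rng = random.Random(seed)
    w = 0.0
    for _ in range(n_steps):
        lr = pick_learning_rate(w, rng) if adaptive else fixed_lr
        w = step(w, lr)
    return loss(w)


fixed_loss = train(adaptive=False)
adaptive_loss = train(adaptive=True)
print(f"fixed lr loss:    {fixed_loss:.6f}")
print(f"adaptive lr loss: {adaptive_loss:.6f}")
```

On this toy problem the per-step search reaches a lower final loss than the small fixed rate within the same number of steps. That says nothing about the paper's experimental results, which the abstract does not report.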
Related papers
50 in total
  • [11] Reinforcement Learning-Based Adaptive Control of a Piezo-Driven Nanopositioning System
    Chen, Liheng
    Xu, Qingsong
    IEEE OPEN JOURNAL OF THE INDUSTRIAL ELECTRONICS SOCIETY, 2024, 5 : 28 - 40
  • [12] On Explainability of Reinforcement Learning-Based Machine Learning Agents Trained with Proximal Policy Optimization That Utilizes Visual Sensor Data
    Hachaj, Tomasz
    Piekarczyk, Marcin
    APPLIED SCIENCES-BASEL, 2025, 15 (02):
  • [13] PPO-ABR: Proximal Policy Optimization based Deep Reinforcement Learning for Adaptive BitRate streaming
    Naresh, Mandan
    Saxena, Paresh
    Gupta, Manik
    2023 INTERNATIONAL WIRELESS COMMUNICATIONS AND MOBILE COMPUTING, IWCMC, 2023, : 199 - 204
  • [14] Optimizing parameters in swarm intelligence using reinforcement learning: An application of Proximal Policy Optimization to the iSOMA algorithm
    Klein, Lukas
    Zelinka, Ivan
    Seidl, David
    SWARM AND EVOLUTIONARY COMPUTATION, 2024, 85
  • [15] Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm
    Jayant, Ashish Kumar
    Bhatnagar, Shalabh
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [16] Reinforcement Learning-Based 3-D Sliding Mode Interception Guidance via Proximal Policy Optimization
    Guo J.
    Li M.
    Guo Z.
    She Z.
    IEEE Journal on Miniaturization for Air and Space Systems, 2023, 4 (04) : 423 - 430
  • [17] An Adaptive Online Parameter Control Algorithm for Particle Swarm Optimization Based on Reinforcement Learning
    Liu, Yaxian
    Lu, Hui
    Cheng, Shi
    Shi, Yuhui
    2019 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2019, : 815 - 822
  • [18] An Off-Policy Reinforcement Learning-Based Adaptive Optimization Method for Dynamic Resource Allocation Problem
    He, Baiyang
    Meng, Ying
    Tang, Lixin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 36 (02) : 1 - 15
  • [19] Path Following Control for Unmanned Surface Vehicles: A Reinforcement Learning-Based Method With Experimental Validation
    Wang, Yuanda
    Cao, Jingyu
    Sun, Jia
    Zou, Xuesong
    Sun, Changyin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 12 (18237-18250) : 1 - 14
  • [20] Design parameter modelling of solar power tower system using adaptive neuro-fuzzy inference system optimized with a combination of genetic algorithm and teaching learning-based optimization algorithm
    Khosravi, A.
    Malekan, M.
    Pabon, J. J. G.
    Zhao, X.
    Assad, M. E. H.
    JOURNAL OF CLEANER PRODUCTION, 2020, 244