A novel sim2real reinforcement learning algorithm for process control

Cited by: 1

Authors:

Liang, Huiping ^{[1,2]}

Xie, Junyao ^{[2]}

Huang, Biao ^{[2]}

Li, Yonggang ^{[1,3]}

Sun, Bei ^{[1,3]}

Yang, Chunhua ^{[1]}

Affiliations:

[1] Cent South Univ, Sch Automat, Changsha 410083, Peoples R China

[2] Univ Alberta, Dept Chem & Mat Engn, Edmonton, AB T6G 2V4, Canada

[3] Peng Cheng Lab, Shenzhen 518000, Peoples R China

Source:

RELIABILITY ENGINEERING & SYSTEM SAFETY | 2025 / Vol. 254

Funding:

National Natural Science Foundation of China;

Keywords:

Reinforcement learning; Process control; Model-plant mismatch; Fix-horizon return; Industrial roasting process;

DOI:

10.1016/j.ress.2024.110639

Chinese Library Classification (CLC) number:

T [Industrial Technology];

Discipline classification code:

08 ;

Abstract:

While reinforcement learning (RL) has potential in advanced process control and optimization, its direct interaction with real industrial processes can pose safety concerns. Model-based pre-training of RL may alleviate such risks. However, the intricate nature of industrial processes complicates the establishment of entirely accurate simulation models. Consequently, RL-based controllers relying on simulation models can easily suffer from model-plant mismatch. On the other hand, utilizing offline data for pre-training of RL can also mitigate safety risks. However, it requires well-represented historical datasets, which is demanding because industrial processes mostly run under a regulatory mode with basic controllers. To handle these issues, this paper proposes a novel sim2real reinforcement learning algorithm. First, a state adaptor (SA) is proposed to align simulated states with real states to mitigate the model-plant mismatch. Then, a fix-horizon return is designed to replace the traditional infinite-step return and provide genuine labels for the critic network, enhancing learning efficiency and stability. Finally, applying proximal policy optimization (PPO), the SA-PPO method is introduced to implement the proposed sim2real algorithm. Experimental results show that SA-PPO improves performance in MSE by 1.96% and in R by 21.64% on average for roasting process simulation. This verifies the effectiveness of the proposed method.
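
The abstract does not give the formula for the fix-horizon return. As a rough illustration only, the sketch below assumes it is a discounted sum of the next H observed rewards, so the critic target is built entirely from real rewards rather than from a bootstrapped infinite-horizon estimate. The function name `fixed_horizon_returns`, the parameters `horizon` and `gamma`, and the choice to skip steps without a full window are all assumptions for illustration, not the authors' implementation.

```python
from typing import List


def fixed_horizon_returns(
    rewards: List[float], horizon: int, gamma: float = 0.99
) -> List[float]:
    """Discounted sum of the next `horizon` rewards for every step t
    that has a complete window of `horizon` rewards (assumed definition)."""
    if horizon <= 0:
        raise ValueError("horizon must be a positive integer")

    returns = []
    # Only steps with a full window of `horizon` rewards get a label.
    for t in range(len(rewards) - horizon + 1):
        g = 0.0
        # Accumulate backwards: g = r_t + gamma * r_{t+1} + ... + gamma^(H-1) * r_{t+H-1}
        for k in reversed(range(horizon)):
            g = rewards[t + k] + gamma * g
        returns.append(g)
    return returns


if __name__ == "__main__":
    # Hypothetical reward trajectory, used only to exercise the function.
    print(fixed_horizon_returns([1.0, 1.0, 1.0, 1.0], horizon=2, gamma=0.5))
```

Under this assumed definition every target is a sum of rewards that were actually observed, which is one plausible reading of "genuine labels for the critic network".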

Pages: 12

Related records: 50 in total

[1] Sim2Real Transfer for Reinforcement Learning without Dynamics Randomization
Kaspar, Manuel
Osorio, Juan D. Munoz
Bock, Juergen
2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020, : 4383 - 4388
[2] Sim2Real Transfer of Reinforcement Learning for Concentric Tube Robots
Iyengar, Keshav
Sadati, S. M. Hadi
Bergeles, Christos
Spurgeon, Sarah
Stoyanov, Danail
IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8 (10) : 6147 - 6154
[3] Surrogate empowered Sim2Real transfer of deep reinforcement learning for ORC superheat control
Lin, Runze
Luo, Yangyang
Wu, Xialai
Chen, Junghui
Huang, Biao
Su, Hongye
Xie, Lei
APPLIED ENERGY, 2024, 356
[4] Adaptability Preserving Domain Decomposition for Stabilizing Sim2Real Reinforcement Learning
Gao, Haichuan
Yang, Zhile
Su, Xin
Tan, Tian
Chen, Feng
2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020, : 4403 - 4410
[5] DeepRacer: Autonomous Racing Platform for Experimentation with Sim2Real Reinforcement Learning
Balaji, Bharathan
Mallya, Sunil
Genc, Sahika
Gupta, Saurabh
Dirac, Leo
Khare, Vineet
Roy, Gourav
Sun, Tao
Tao, Yunzhe
Townsend, Brian
Calleja, Eddie
Muralidhara, Sunil
Karuppasamy, Dhanasekar
2020 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2020, : 2746 - 2754
[6] Transition Control of a Double-Inverted Pendulum System Using Sim2Real Reinforcement Learning
Lee, Taegun
Ju, Doyoon
Lee, Young Sam
MACHINES, 2025, 13 (03)
[7] Real2Sim or Sim2Real: Robotics Visual Insertion Using Deep Reinforcement Learning and Real2Sim Policy Adaptation
Chen, Yiwen
Li, Xue
Guo, Sheng
Ng, Xian Yao
Ang, Marcelo H.
INTELLIGENT AUTONOMOUS SYSTEMS 17, IAS-17, 2023, 577 : 617 - 629
[8] Sim2Real Manipulation on Unknown Objects with Tactile-based Reinforcement Learning
Su, Entong
Jia, Chengzhe
Qin, Yuzhe
Zhou, Wenxuan
Macaluso, Annabella
Huang, Binghao
Wang, Xiaolong
2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2024), 2024, : 9234 - 9241
[9] The Role of Time Delay in Sim2real Transfer of Reinforcement Learning for Unmanned Aerial Vehicles
ElOcla, Norhan Mohsen
Chehadeh, Mohamad
Boiko, Igor
Swei, Sean
Zweiri, Yahya
2023 21ST INTERNATIONAL CONFERENCE ON ADVANCED ROBOTICS, ICAR, 2023, : 514 - 519
[10] A Dual Decision-Making Continuous Reinforcement Learning Method Based on Sim2Real
Xiao, Wenwen
Wang, Xinzhi
Luo, Xiangfeng
Xie, Shaorong
INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2024, 34 (03) : 467 - 488
