Reinforcement Learning With Model-Based Assistance for Shape Control in Sendzimir Rolling Mills

Cited by: 3
Authors
Park, Jonghyuk [1 ]
Kim, Beomsu [1 ]
Han, Soohee [1 ]
Affiliations
[1] Pohang Univ Sci & Technol, Dept Convergence IT Engn, Pohang 37673, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Actor-critic policy gradient; cold rolling mill; partially observable Markov decision process (MDP); reinforcement learning; Sendzimir rolling mill (ZRM); CONTROL-SYSTEMS; IMPROVEMENT;
DOI
10.1109/TCST.2022.3227502
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
As one of the most popular tandem cold rolling mills, the Sendzimir rolling mill (ZRM) aims to produce a flat steel strip shape by properly allocating the rolling pressure. To improve the performance of the ZRM, it is attractive to adopt deep reinforcement learning (DRL), which has proven powerful on problems that are hard to solve with conventional methods. However, directly applying DRL techniques may be impractical because of a serious singularity, partial observability, and safety issues inherent in mill systems. In this brief, we propose an effective hybridization approach that integrates a model-based assistant into model-free DRL to resolve these practical issues. For the model-based assistant, a model-based optimization problem is first constructed and solved for the static part of the mill model. The resulting static model-based coarse assistant, or controller, is then improved by the proposed reinforcement learning, which accounts for the remaining dynamic part of the mill model. The singularity is resolved by the model-based approach, and partial observability is addressed by a long short-term memory (LSTM) state estimator in the proposed method. In simulation, the proposed method successfully learns a high-performing policy for the ZRM, achieving a higher reward than pure model-free DRL, and is observed to safely improve the shape controller of the mill system. The demonstration results strongly support the applicability of DRL to other cold multiroll mills, such as four-high, six-high, and cluster mills.
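The hybrid scheme summarized in the abstract, a static model-based coarse controller whose output is corrected by a learned residual policy, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the linear influence matrix `G`, the Tikhonov `damping` term (standing in for the paper's singularity handling), and the stubbed `residual_policy` are all assumptions for the example.

```python
import numpy as np

# Hypothetical linear static shape model: shape deviation y = G @ u, where u
# is the vector of actuator adjustments and G is an assumed influence matrix.
rng = np.random.default_rng(0)
G = rng.normal(size=(8, 4))  # 8 shape-measurement zones, 4 actuators (illustrative)

def model_based_assistant(y_error, G, damping=1e-2):
    """Coarse static controller: regularized least-squares pressure allocation.

    The damping term keeps the normal equations well conditioned, a simple
    stand-in for the singularity issue the paper's model-based part resolves.
    """
    A = G.T @ G + damping * np.eye(G.shape[1])
    return np.linalg.solve(A, G.T @ y_error)

def hybrid_action(y_error, residual_policy):
    """Model-based coarse action plus a learned DRL correction."""
    return model_based_assistant(y_error, G) + residual_policy(y_error)

# Stub standing in for the trained actor-critic residual policy.
zero_policy = lambda y: np.zeros(G.shape[1])

y_error = rng.normal(size=8)          # current strip-shape deviation
u = hybrid_action(y_error, zero_policy)
residual_after = np.linalg.norm(y_error - G @ u)
```

In the paper's scheme, the zero stub would be replaced by an LSTM-backed actor-critic policy trained against the dynamic part of the mill model, so the learned correction only has to refine an already-safe coarse action.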
Pages: 1867 - 1874
Page count: 8
Related Papers
50 total
  • [21] Advances in model-based reinforcement learning for Adaptive Optics control
    Nousiainen, Jalo
    Engler, Byron
    Kasper, Markus
    Helin, Tapio
    Heritier, Cedric T.
    Rajani, Chang
    ADAPTIVE OPTICS SYSTEMS VIII, 2022, 12185
  • [22] Adaptive optics control using model-based reinforcement learning
    Nousiainen, Jalo
    Rajani, Chang
    Kasper, Markus
    Helin, Tapio
    OPTICS EXPRESS, 2021, 29 (10) : 15327 - 15344
  • [23] Learning to Shape by Grinding: Cutting-Surface-Aware Model-Based Reinforcement Learning
    Hachimine, Takumi
    Morimoto, Jun
    Matsubara, Takamitsu
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8 (10) : 6235 - 6242
  • [24] SHAPE CONTROL IN SENDZIMIR MILLS USING BOTH CROWN AND INTERMEDIATE ROLL ACTUATORS
    RINGWOOD, JV
    GRIMBLE, MJ
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1990, 35 (04) : 453 - 459
  • [25] Robust shape control in a Sendzimir cold-rolling steel mill
    Bates, DG
    Ringwood, JV
    Holohan, AM
    CONTROL ENGINEERING PRACTICE, 1997, 5 (12) : 1647 - 1652
  • [26] Model-based Reinforcement Learning: A Survey
    Moerland, Thomas M.
    Broekens, Joost
    Plaat, Aske
    Jonker, Catholijn M.
    FOUNDATIONS AND TRENDS IN MACHINE LEARNING, 2023, 16 (01): : 1 - 118
  • [27] A survey on model-based reinforcement learning
    Luo, Fan-Ming
    Xu, Tian
    Lai, Hang
    Chen, Xiong-Hui
    Zhang, Weinan
    Yu, Yang
    SCIENCE CHINA INFORMATION SCIENCES, 2024, 67 (02) : 59 - 84
  • [28] Nonparametric model-based reinforcement learning
    Atkeson, CG
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 10, 1998, 10 : 1008 - 1014
  • [29] The ubiquity of model-based reinforcement learning
    Doll, Bradley B.
    Simon, Dylan A.
    Daw, Nathaniel D.
    CURRENT OPINION IN NEUROBIOLOGY, 2012, 22 (06) : 1075 - 1081
  • [30] Multiple model-based reinforcement learning
    Doya, K
    Samejima, K
    Katagiri, K
    Kawato, M
    NEURAL COMPUTATION, 2002, 14 (06) : 1347 - 1369