Reinforcement Learning based on MPC/MHE for Unmodeled and Partially Observable Dynamics

Cited by: 11
Authors
Esfahani, Hossein Nejatbakhsh [1]
Kordabad, Arash Bahari [1]
Gros, Sebastien [1]
Affiliations
[1] Norwegian University of Science and Technology (NTNU), Department of Engineering Cybernetics, Trondheim, Norway
DOI
10.23919/ACC50511.2021.9483399
CLC classification (Chinese Library Classification)
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
This paper proposes an observer-based framework for solving Partially Observable Markov Decision Processes (POMDPs) when an accurate model is not available. We first propose a Moving Horizon Estimation-Model Predictive Control (MHE-MPC) scheme to provide a policy for the POMDP problem, in which the full state of the real process is neither measured nor necessarily known. We parameterize both the MPC and MHE formulations, introducing adjustable parameters that serve to tune the policy. To tackle the unmodeled and partially observable dynamics, we leverage Reinforcement Learning (RL) to tune the parameters of the MPC and MHE schemes jointly, taking the closed-loop performance of the policy as the objective rather than model fitting or the MHE performance. Illustrative examples show that the proposed approach can effectively increase the closed-loop control performance of systems formulated as POMDPs.
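To make the abstract's pipeline concrete, the sketch below shows one way the idea could look in code: a least-squares MHE estimates the state of a partially observed linear plant from a window of measurements, an unconstrained finite-horizon MPC (backward Riccati recursion) acts on that estimate, and the shared model parameter theta is adjusted using only the closed-loop episode cost. The plant, horizons, cost weights, and the finite-difference tuner are all illustrative assumptions, not the authors' implementation; the paper itself uses an RL update on constrained MPC/MHE formulations.

```python
# Minimal sketch (assumptions throughout, not the authors' implementation):
# a parameterized MHE-MPC policy on a toy partially observed linear plant,
# tuned from closed-loop cost alone. The paper uses an RL update for theta;
# here a finite-difference surrogate stands in for it.
import numpy as np

# "Real" plant, unknown to the controller; only the first state is measured.
A_true = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])
Q, R = np.diag([1.0, 0.1]), 0.01            # stage-cost weights (assumed)

def A_model(theta):
    # Controller's model: theta replaces the plant entries it does not know.
    return np.array([[1.0, theta[0]], [0.0, theta[1]]])

def mhe_estimate(theta, y_win, u_win):
    # Least-squares MHE: y_win holds N measurements, u_win the N-1 inputs
    # applied between them. Solve for the window's initial state, then roll
    # it forward to "now" under the parameterized model.
    A, N = A_model(theta), len(y_win)
    Phi, b = [], []
    x_u, Ak = np.zeros((2, 1)), np.eye(2)   # input-driven part, A^k powers
    for k in range(N):
        Phi.append(C @ Ak)
        b.append(y_win[k] - (C @ x_u).item())
        if k < N - 1:
            x_u = A @ x_u + B * u_win[k]
            Ak = A @ Ak
    x0, *_ = np.linalg.lstsq(np.vstack(Phi), np.array(b), rcond=None)
    x = x0.reshape(2, 1)
    for u in u_win:
        x = A @ x + B * u
    return x

def mpc_gain(theta, horizon=20):
    # Unconstrained finite-horizon MPC on the model = backward Riccati sweep.
    A, P = A_model(theta), Q.copy()
    for _ in range(horizon):
        K = np.linalg.solve(np.atleast_2d(R + B.T @ P @ B), B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K                                 # policy: u = -K @ x_hat

def episode_cost(theta, T=100, N=8, seed=0):
    # Closed-loop rollout on the true plant; this cost is the tuning signal.
    rng = np.random.default_rng(seed)        # common noise across evaluations
    K, x, J = mpc_gain(theta), np.array([[1.0], [0.5]]), 0.0
    y_win, u_win = [], []
    for _ in range(T):
        y_win.append((C @ x).item() + 0.01 * rng.standard_normal())
        if len(y_win) > N:
            y_win.pop(0); u_win.pop(0)
        x_hat = mhe_estimate(theta, y_win, u_win)
        u = -(K @ x_hat).item()
        J += (x.T @ Q @ x).item() + R * u**2
        x = A_true @ x + B * u + 0.01 * rng.standard_normal((2, 1))
        u_win.append(u)
    return J

# Performance-oriented tuning: step theta against the closed-loop cost, not
# against a model-fitting objective (finite differences as an RL stand-in).
theta = np.array([0.05, 0.8])                # deliberately mismatched model
for it in range(20):
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta); e[i] = 1e-3
        g[i] = (episode_cost(theta + e) - episode_cost(theta - e)) / 2e-3
    theta -= 0.01 * g / (np.linalg.norm(g) + 1e-9)   # normalized step (ad hoc)
    print(f"iter {it:2d}  J = {episode_cost(theta):9.3f}  theta = {theta}")
```

In the paper's actual setting, the adjustable parameters also enter the MPC/MHE cost and constraint terms, and the update comes from an RL algorithm evaluated on the same closed-loop data; the sketch only mirrors the separation into estimator, controller, and performance-driven tuner.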
Pages: 2121-2126
Page count: 6