Reinforcement Learning based on MPC/MHE for Unmodeled and Partially Observable Dynamics

Cited by: 11
Authors:
Esfahani, Hossein Nejatbakhsh [1]
Kordabad, Arash Bahari [1]
Gros, Sebastien [1]
Affiliations:
[1] Norwegian Univ Sci & Technol NTNU, Dept Engn Cybernet, Trondheim, Norway
DOI: 10.23919/ACC50511.2021.9483399
Chinese Library Classification: TP [Automation Technology; Computer Technology]
Subject Classification Code: 0812
Abstract
This paper proposes an observer-based framework for solving Partially Observable Markov Decision Processes (POMDPs) when an accurate model is not available. We first propose to use a Moving Horizon Estimation-Model Predictive Control (MHE-MPC) scheme to provide a policy for the POMDP problem, where the full state of the real process is neither measured nor necessarily known. We parameterize both the MPC and MHE formulations, with certain adjustable parameters serving to tune the policy. To tackle the unmodeled and partially observable dynamics, we leverage Reinforcement Learning (RL) to tune the parameters of the MPC and MHE schemes jointly, with the closed-loop performance of the policy as the goal rather than model fitting or MHE performance. Illustrations show that the proposed approach can effectively increase the closed-loop control performance of systems formulated as POMDPs.
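The idea in the abstract can be illustrated with a minimal toy sketch. This is not the paper's actual formulation: the MHE is replaced by a one-step filter with an adjustable gain `alpha`, the MPC by a linear feedback law with adjustable gain `k` and offset `b`, and the RL update by a finite-difference gradient on the episodic closed-loop cost. All names, dynamics, and parameter values below are hypothetical; the point is only that the estimator and controller parameters are tuned jointly against closed-loop performance, not against model fit.

```python
import numpy as np

# Toy plant with an unmodeled constant bias and noisy partial observations:
#   x+ = 0.9 x + u + 0.2,   y = x + noise
A_TRUE, BIAS = 0.9, 0.2
A_NOM = 0.8  # deliberately mismatched nominal model used by the estimator

def rollout(theta, rng, T=60):
    """Closed-loop episode cost for theta = (alpha, k, b)."""
    alpha, k, b = theta
    x, x_hat, u_prev, cost = 1.0, 1.0, 0.0, 0.0
    for _ in range(T):
        y = x + 0.05 * rng.standard_normal()      # noisy measurement
        pred = A_NOM * x_hat + u_prev             # nominal one-step prediction
        x_hat = alpha * y + (1.0 - alpha) * pred  # filter (MHE stand-in)
        u = -k * x_hat - b                        # feedback law (MPC stand-in)
        cost += x**2 + 0.1 * u**2                 # closed-loop stage cost
        x = A_TRUE * x + u + BIAS                 # true, partially modeled plant
        u_prev = u
    return cost

def tune(theta, iters=200, lr=2e-3, eps=1e-2, seed=0):
    """Tune estimator and controller parameters jointly by
    finite-difference descent on the closed-loop cost."""
    theta = np.array(theta, dtype=float)  # copy so the caller's array is untouched
    for i in range(iters):
        grad = np.zeros_like(theta)
        for j in range(len(theta)):
            e = np.zeros_like(theta)
            e[j] = eps
            # common random numbers: same seed for both perturbations
            jp = rollout(theta + e, np.random.default_rng(seed + i))
            jm = rollout(theta - e, np.random.default_rng(seed + i))
            grad[j] = (jp - jm) / (2.0 * eps)
        theta -= lr * grad
    return theta

theta0 = np.array([0.5, 0.5, 0.0])
j_before = rollout(theta0, np.random.default_rng(1))
theta_star = tune(theta0)
j_after = rollout(theta_star, np.random.default_rng(1))
```

Note the design choice mirrored from the abstract: the gradient is taken through the closed-loop cost of the whole estimator-plus-controller loop, so the offset `b` learns to compensate the unmodeled bias even though neither the filter nor the nominal model represents it. The paper itself differentiates parameterized MPC/MHE optimization problems and uses an RL update rather than naive finite differences.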
Pages: 2121-2126 (6 pages)
Related Papers (50 records)
  • [31] Collaborative Partially-Observable Reinforcement Learning Using Wireless Communications
    Ko, Eisaku
    Chen, Kwang-Cheng
    Lien, Shao-Yu
    IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2021), 2021,
  • [32] Sample-Efficient Reinforcement Learning of Partially Observable Markov Games
    Liu, Qinghua
    Szepesvari, Csaba
    Jin, Chi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [33] Fuzzy Reinforcement Learning Control for Decentralized Partially Observable Markov Decision Processes
    Sharma, Rajneesh
    Spaan, Matthijs T. J.
    IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ 2011), 2011, : 1422 - 1429
  • [34] Deep Reinforcement Learning for Partially Observable Data Poisoning Attack in Crowdsensing Systems
    Li, Mohan
    Sun, Yanbin
    Lu, Hui
    Maharjan, Sabita
    Tian, Zhihong
    IEEE INTERNET OF THINGS JOURNAL, 2020, 7 (07): : 6266 - 6278
  • [35] A Deep Hierarchical Reinforcement Learning Algorithm in Partially Observable Markov Decision Processes
    Le, Tuyen P.
    Ngo Anh Vien
    Chung, Taechoong
    IEEE ACCESS, 2018, 6 : 49089 - 49102
  • [36] A Model of External Memory for Navigation in Partially Observable Visual Reinforcement Learning Tasks
    Smith, Robert J.
    Heywood, Malcolm I.
    GENETIC PROGRAMMING, EUROGP 2019, 2019, 11451 : 162 - 177
  • [37] EPPTA: Efficient partially observable reinforcement learning agent for penetration testing applications
    Li, Zegang
    Zhang, Qian
    Yang, Guangwen
    ENGINEERING REPORTS, 2023,
  • [38] A reinforcement learning scheme for a partially-observable multi-agent game
    Ishii, S
    Fujita, H
    Mitsutake, M
    Yamazaki, T
    Matsuda, J
    Matsuno, Y
    MACHINE LEARNING, 2005, 59 (1-2) : 31 - 54
  • [39] Reinforcement Learning-Based Autonomous Navigation and Obstacle Avoidance for USVs under Partially Observable Conditions
    Yan, Nan
    Huang, Subin
    Kong, Chao
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2021, 2021
  • [40] Reinforcement learning for cooperative actions in a partially observable multi-agent system
    Taniguchi, Yuki
    Mori, Takeshi
    Ishii, Shin
    ARTIFICIAL NEURAL NETWORKS - ICANN 2007, PT 1, PROCEEDINGS, 2007, 4668 : 229 - +