Bridging the gap between QP-based and MPC-based Reinforcement Learning

Cited by: 1
Authors
Sawant, Shambhuraj [1]
Gros, Sebastien [1]
Affiliation
[1] Norwegian Univ Sci & Technol NTNU, Dept Engn Cybernet, Trondheim, Norway
Source
IFAC PAPERSONLINE | 2022 / Vol. 55 / Issue 15
Keywords
Quadratic Programming; Reinforcement Learning; Model Predictive Control
DOI
10.1016/j.ifacol.2022.07.600
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Reinforcement learning (RL) methods typically use Deep Neural Networks (DNNs) to approximate the value functions and policies underlying a Markov Decision Process. Unfortunately, DNN-based RL suffers from a lack of explainability of the resulting policy. In this paper, we instead approximate the policy and value functions using an optimization problem that takes the form of a Quadratic Program (QP). We propose simple tools to promote structure in the QP, pushing it to resemble a linear Model Predictive Control (MPC) scheme. A generic unstructured QP offers high flexibility for learning, while a QP with the structure of an MPC scheme promotes the explainability of the resulting policy and additionally provides ways to analyze it. The tools we propose allow the trade-off between flexibility and explainability to be adjusted continuously during learning. We illustrate the proposed method and the resulting structure on a point-mass task. Copyright (c) 2022 The Authors. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Pages: 7-12
Page count: 6
Related papers
50 in total
  • [1] Bridging the Gap Between Value and Policy Based Reinforcement Learning
    Nachum, Ofir
    Norouzi, Mohammad
    Xu, Kelvin
    Schuurmans, Dale
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [2] MPC-based Reinforcement Learning for Economic Problems with Application to Battery Storage
    Kordabad, Arash Bahari
    Cai, Wenqi
    Gros, Sebastien
    [J]. 2021 EUROPEAN CONTROL CONFERENCE (ECC), 2021, : 2573 - 2578
  • [3] MPC-based Reinforcement Learning for a Simplified Freight Mission of Autonomous Surface Vehicles
    Cai, Wenqi
    Kordabad, Arash B.
    Esfahani, Hossein N.
    Lekkas, Anastasios M.
    Gros, Sebastien
    [J]. 2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021, : 2990 - 2995
  • [4] Multi-agent Battery Storage Management using MPC-based Reinforcement Learning
    Kordabad, Arash Bahari
    Cai, Wenqi
    Gros, Sebastien
    [J]. 5TH IEEE CONFERENCE ON CONTROL TECHNOLOGY AND APPLICATIONS (IEEE CCTA 2021), 2021, : 57 - 62
  • [5] Optimal Management of the Peak Power Penalty for Smart Grids Using MPC-based Reinforcement Learning
    Cai, Wenqi
    Esfahani, Hossein N.
    Kordabad, Arash B.
    Gros, Sebastien
    [J]. 2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021, : 6365 - 6370
  • [6] Bias Correction in Reinforcement Learning via the Deterministic Policy Gradient Method for MPC-Based Policies
    Gros, Sebastien
    Zanon, Mario
    [J]. 2021 AMERICAN CONTROL CONFERENCE (ACC), 2021, : 2543 - 2548
  • [7] Bridging the Gap Between Imitation Learning and Inverse Reinforcement Learning
    Piot, Bilal
    Geist, Matthieu
    Pietquin, Olivier
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2017, 28 (08) : 1814 - 1826
  • [8] An inexact QP-based method for nonlinear complementarity problems
    Kanzow, C
    [J]. NUMERISCHE MATHEMATIK, 1998, 80 (04) : 557 - 577
  • [9] An Approach to QP-based Thrust Allocation considering Inflow
    Koschorrek, Philipp
    Kosch, Martin
    [J]. IFAC PAPERSONLINE, 2021, 54 (16) : 126 - 131
  • [10] Neural Flocking: MPC-based Supervised Learning of Flocking Controllers
    Mehmood, Usama
    Roy, Shouvik
    Grosu, Radu
    Smolka, Scott A.
    Stoller, Scott D.
    Tiwari, Ashish
    [J]. FOUNDATIONS OF SOFTWARE SCIENCE AND COMPUTATION STRUCTURES, FOSSACS 2020, 2020, 12077 : 1 - 16