Bridging the gap between QP-based and MPC-based Reinforcement Learning

Cited by: 1

Authors:

Sawant, Shambhuraj [1]

Gros, Sebastien [1]

Affiliations:

[1] Norwegian Univ Sci & Technol NTNU, Dept Engn Cybernet, Trondheim, Norway

Source:

IFAC PAPERSONLINE | 2022 / Vol. 55 / Issue 15

Keywords:

Quadratic Programming; Reinforcement Learning; Model Predictive Control;

DOI:

10.1016/j.ifacol.2022.07.600

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Reinforcement learning (RL) methods typically use Deep Neural Networks (DNNs) to approximate the value functions and policies underlying a Markov Decision Process. Unfortunately, DNN-based RL suffers from a lack of explainability of the resulting policy. In this paper, we instead approximate the policy and value functions using an optimization problem, taking the form of Quadratic Programs (QPs). We propose simple tools to promote structures in the QP, pushing it to resemble a linear Model Predictive Control (MPC) scheme. A generic unstructured QP offers high flexibility for learning, while a QP having the structure of an MPC scheme promotes the explainability of the resulting policy and additionally provides ways for its analysis. The tools we propose allow for continuously adjusting the trade-off between the former and the latter during learning. We illustrate the workings of our proposed method with the resulting structure using a point-mass task. Copyright (c) 2022 The Authors. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
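
The abstract does not state the QP formulation, so the following is only a minimal sketch of the general idea that a policy can be defined as the solution of a QP, and that a linear MPC scheme is one structured way to generate the QP data. It assumes an unconstrained, condensed textbook MPC formulation with quadratic stage costs. The function names (`condense_linear_mpc`, `qp_policy`), the point-mass model, and all numerical values are hypothetical and are not taken from the paper.

```python
import numpy as np


def condense_linear_mpc(A, B, Q, R, N):
    """Return (H, F) such that the horizon-N MPC cost equals
    0.5 * U @ H @ U + x0 @ F @ U + (terms independent of U)."""
    n, m = B.shape
    # Stacked predictions: X = Phi @ x0 + Gamma @ U, with X = [x_1, ..., x_N].
    Phi = np.vstack([np.linalg.matrix_power(A, k + 1) for k in range(N)])
    Gamma = np.zeros((N * n, N * m))
    for i in range(N):
        for j in range(i + 1):
            Gamma[i * n:(i + 1) * n, j * m:(j + 1) * m] = (
                np.linalg.matrix_power(A, i - j) @ B
            )
    Q_bar = np.kron(np.eye(N), Q)  # block-diagonal state weights
    R_bar = np.kron(np.eye(N), R)  # block-diagonal input weights
    H = Gamma.T @ Q_bar @ Gamma + R_bar
    F = Phi.T @ Q_bar @ Gamma
    return H, F


def qp_policy(H, F, x0, m):
    """Solve the unconstrained QP in U and return the first input (the action)."""
    U = np.linalg.solve(H, -F.T @ x0)
    return U[:m]


# Hypothetical point-mass model: state = (position, velocity), input = force.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.eye(2)
R = 0.1 * np.eye(1)

H, F = condense_linear_mpc(A, B, Q, R, N=10)
action = qp_policy(H, F, x0=np.array([1.0, 0.0]), m=1)
```

In this sketch the entries of `H` and `F` are fully determined by the model and cost matrices. A generic unstructured QP would instead treat `H` and `F` as free parameters, which is the flexibility-versus-structure trade-off the abstract refers to.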


Pages: 7 - 12

Number of pages: 6

50 items in total

[1] Bridging the Gap Between Value and Policy Based Reinforcement Learning
Nachum, Ofir
Norouzi, Mohammad
Xu, Kelvin
Schuurmans, Dale
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
[2] MPC-based Reinforcement Learning for Economic Problems with Application to Battery Storage
Kordabad, Arash Bahari
Cai, Wenqi
Gros, Sebastien
[J]. 2021 EUROPEAN CONTROL CONFERENCE (ECC), 2021, : 2573 - 2578
[3] MPC-based Reinforcement Learning for a Simplified Freight Mission of Autonomous Surface Vehicles
Cai, Wenqi
Kordabad, Arash B.
Esfahani, Hossein N.
Lekkas, Anastasios M.
Gros, Sebastien
[J]. 2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021, : 2990 - 2995
[4] Multi-agent Battery Storage Management using MPC-based Reinforcement Learning
Kordabad, Arash Bahari
Cai, Wenqi
Gros, Sebastien
[J]. 5TH IEEE CONFERENCE ON CONTROL TECHNOLOGY AND APPLICATIONS (IEEE CCTA 2021), 2021, : 57 - 62
[5] Optimal Management of the Peak Power Penalty for Smart Grids Using MPC-based Reinforcement Learning
Cai, Wenqi
Esfahani, Hossein N.
Kordabad, Arash B.
Gros, Sebastien
[J]. 2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021, : 6365 - 6370
[6] Bias Correction in Reinforcement Learning via the Deterministic Policy Gradient Method for MPC-Based Policies
Gros, Sebastien
Zanon, Mario
[J]. 2021 AMERICAN CONTROL CONFERENCE (ACC), 2021, : 2543 - 2548
[7] Bridging the Gap Between Imitation Learning and Inverse Reinforcement Learning
Piot, Bilal
Geist, Matthieu
Pietquin, Olivier
[J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2017, 28 (08) : 1814 - 1826
[8] An inexact QP-based method for nonlinear complementarity problems
Kanzow, C
[J]. NUMERISCHE MATHEMATIK, 1998, 80 (04) : 557 - 577
[9] An Approach to QP-based Thrust Allocation considering Inflow
Koschorrek, Philipp
Kosch, Martin
[J]. IFAC PAPERSONLINE, 2021, 54 (16) : 126 - 131
[10] Neural Flocking: MPC-based Supervised Learning of Flocking Controllers
Mehmood, Usama
Roy, Shouvik
Grosu, Radu
Smolka, Scott A.
Stoller, Scott D.
Tiwari, Ashish
[J]. FOUNDATIONS OF SOFTWARE SCIENCE AND COMPUTATION STRUCTURES, FOSSACS 2020, 2020, 12077 : 1 - 16
