Fitted Q-Iteration via Max-Plus-Linear Approximation

Cited by: 0
Authors
Liu, Yichen [1]
Kolarijani, Mohamad Amin Sharifi [1]
Affiliations
[1] Delft Univ Technol, Delft Ctr Syst & Control, NL-2628 CD Delft, Netherlands
Keywords
Approximation algorithms; Convergence; Vectors; Standards; Optimal control; Complexity theory; Algebra; Real-time systems; Neural networks; Medical services; Reinforcement learning; Stochastic optimal control; Computational methods
DOI
10.1109/LCSYS.2024.3520060
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this letter, we consider max-plus-linear approximators for the Q-function in offline reinforcement learning of discounted Markov decision processes. In particular, we use these approximators to propose novel fitted Q-iteration (FQI) algorithms with provable convergence. Exploiting the compatibility of the Bellman operator with max-plus operations, we show that the max-plus-linear regression within each iteration of the proposed FQI algorithm reduces to simple max-plus matrix-vector multiplications. We also consider a variational implementation of the proposed algorithm, which yields a per-iteration complexity that is independent of the number of samples.
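As a rough illustration of the "max-plus matrix-vector multiplications" mentioned in the abstract, the sketch below shows the two basic operations in plain Python. It is not the authors' algorithm. The function names `maxplus_matvec` and `greatest_subsolution`, and the small matrix `A`, are assumptions made for this example only. The second function uses the standard residuation result from max-plus algebra: the largest x satisfying A ⊗ x ≤ b has entries x_j = min_i (b_i − A_ij).

```python
def maxplus_matvec(A, x):
    """Max-plus product: (A (x) x)_i = max_j (A[i][j] + x[j])."""
    return [max(a_ij + x_j for a_ij, x_j in zip(row, x)) for row in A]


def greatest_subsolution(A, b):
    """Largest x with A (x) x <= b elementwise: x_j = min_i (b[i] - A[i][j]).

    This is the min-plus "residual" of A applied to b, a standard
    building block for regression in the max-plus setting.
    """
    n_cols = len(A[0])
    return [min(b_i - row[j] for row, b_i in zip(A, b)) for j in range(n_cols)]


# Hypothetical data: 3 samples (rows) and 2 max-plus basis weights (columns).
A = [[0.0, 2.0],
     [1.0, -1.0],
     [3.0, 0.0]]
x = [1.0, 4.0]

y = maxplus_matvec(A, x)             # [6.0, 3.0, 4.0]
x_hat = greatest_subsolution(A, y)   # [1.0, 4.0]
```

In this toy case the targets `y` were generated from `x`, so the residuation step recovers `x` exactly. With noisy targets it would return the largest weight vector whose max-plus prediction stays below every target.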
Pages: 3201-3206
Number of pages: 6
Related papers
50 in total
  • [41] An improved predictive control model for stochastic max-plus-linear systems
    Qu, Jingguo
    Zhang, Zilong
    Zhang, Huiqi
    CHAOS SOLITONS & FRACTALS, 2019, 128 : 210 - 218
  • [42] Linear Fitted-Q Iteration with Multiple Reward Functions
    Lizotte, Daniel J.
    Bowling, Michael
    Murphy, Susan A.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2012, 13 : 3253 - 3295
  • [43] Linear fitted-q iteration with multiple reward functions
    Lizotte, Daniel J.
    Bowling, Michael
    Murphy, Susan A.
    Journal of Machine Learning Research, 2012, 13 : 3253 - 3295
  • [44] Model predictive control for perturbed max-plus-linear systems: a stochastic approach
    Van den Boom, TJJ
    De Schutter, B
    INTERNATIONAL JOURNAL OF CONTROL, 2004, 77 (03) : 302 - 309
  • [45] Modeling and control of switching max-plus-linear systems with random and deterministic switching
    Ton J. J. van den Boom
    Bart De Schutter
    Discrete Event Dynamic Systems, 2012, 22 : 293 - 332
  • [46] Complexity reduction in MPC for stochastic max-plus-linear systems by variability expansion
    van den Boom, TJJ
    De Schutter, B
    Heidergott, B
    PROCEEDINGS OF THE 41ST IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-4, 2002, : 3567 - 3572
  • [47] Feedback properties of model predictive control for max-plus-linear systems
    Masuda, Shiro
    Goto, Hiroyuki
    2007 IEEE INTERNATIONAL CONFERENCE ON NETWORKING, SENSING, AND CONTROL, VOLS 1 AND 2, 2007, : 553 - +
  • [48] MPC for max-plus-linear systems: Closed-loop behavior and tuning
    van den Boom, T
    De Schutter, B
    PROCEEDINGS OF THE 2001 AMERICAN CONTROL CONFERENCE, VOLS 1-6, 2001, : 325 - 330
  • [49] Worst-case optimal control of uncertain max-plus-linear systems
    Necoara, Ion
    Kerrigan, Eric C.
    De Schutter, Bart
    van den Boom, Ton J. J.
    PROCEEDINGS OF THE 45TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-14, 2006, : 6055 - +
  • [50] Modeling and control of switching max-plus-linear systems with random and deterministic switching
    van den Boom, Ton J. J.
    De Schutter, Bart
    DISCRETE EVENT DYNAMIC SYSTEMS-THEORY AND APPLICATIONS, 2012, 22 (03) : 293 - 332