Framework for solving time-delayed Markov Decision Processes

Cited by: 1
Authors
Sawaya, Yorgo [1 ,2 ]
Issa, George [1 ,2 ]
Marzen, Sarah E. [1 ,2 ]
Affiliations
[1] Pitzer Scripps Coll, W M Keck Sci Dept, Claremont, CA 91711 USA
[2] Claremont McKenna Coll, Claremont, CA 91711 USA
Source
PHYSICAL REVIEW RESEARCH | 2023 / Vol. 5 / No. 03
Keywords
Bellman equations - Engineer systems - Engineered systems - Markov Decision Processes - Process framework - Reinforcement learning algorithms - Reinforcement learnings - Theoretical framework - Time delayed - Time-delays;
DOI
10.1103/PhysRevResearch.5.033034
Chinese Library Classification (CLC)
O4 [Physics]
Discipline classification code
0702
Abstract
Reinforcement learning has revolutionized our understanding of evolved systems and our ability to engineer systems based on a theoretical framework for understanding how to maximize expected reward. However, time delays between observation and action are estimated to be roughly 150 ms for humans, and this should affect reinforcement learning algorithms. We reformulate the Markov Decision Process framework to include time delays in action, first deriving a new Bellman equation in a way that unifies previous attempts and then implementing the corresponding SARSA-like algorithm. The main ramification, potentially useful for both evolved and engineered systems, is that, when the size of the state space is smaller than that of the action space, the modified reinforcement learning algorithms will prefer to operate on sequences of states rather than just the present state, with the length of the sequence equal to 1 plus the time delay.
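To make the idea of acting on a sequence of states concrete, here is a minimal Python sketch. It is not the authors' algorithm or their Bellman equation: it is a plain tabular SARSA update whose table is keyed on a window of the last `delay + 1` states. The ring environment, the reward at state 0, the function name `delayed_sarsa`, and all hyperparameters are assumptions made purely for illustration.

```python
import random
from collections import defaultdict, deque


def delayed_sarsa(n_states=5, delay=2, episodes=200, steps=50,
                  alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular SARSA keyed on the last (delay + 1) states (illustrative only).

    Hypothetical toy setting: states lie on a ring, an action moves the
    agent one step left or right, and each chosen action only takes
    effect `delay` time steps after it was selected.
    """
    rng = random.Random(seed)
    actions = (-1, +1)
    q = defaultdict(float)  # maps (state_window, action) -> value estimate

    def choose(window):
        # Epsilon-greedy choice over the window-indexed value table.
        if rng.random() < epsilon:
            return rng.choice(actions)
        return max(actions, key=lambda a: q[(window, a)])

    for _ in range(episodes):
        state = rng.randrange(n_states)
        # Window of the most recent delay + 1 states, padded with the start.
        window = deque([state] * (delay + 1), maxlen=delay + 1)
        # Actions already "in flight" when the episode starts (assumed random).
        pending = deque(rng.choice(actions) for _ in range(delay))
        key = tuple(window)
        action = choose(key)
        for _ in range(steps):
            pending.append(action)
            applied = pending.popleft()  # the action chosen `delay` steps ago
            state = (state + applied) % n_states
            reward = 1.0 if state == 0 else 0.0  # assumed reward: visit state 0
            window.append(state)
            next_key = tuple(window)
            next_action = choose(next_key)
            # Standard SARSA update, applied to the window instead of one state.
            target = reward + gamma * q[(next_key, next_action)]
            q[(key, action)] += alpha * (target - q[(key, action)])
            key, action = next_key, next_action
    return q
```

Calling `delayed_sarsa(delay=2)` returns a table whose keys pair a 3-state window with an action; setting `delay=0` reduces the sketch to ordinary single-state SARSA.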
Pages: 8
Related papers
  • [1] Delayed Nondeterminism in Continuous-Time Markov Decision Processes
    Neuhaeusser, Martin R.
    Stoelinga, Marielle
    Katoen, Joost-Pieter
    [J]. FOUNDATIONS OF SOFTWARE SCIENCE AND COMPUTATIONAL STRUCTURES, PROCEEDINGS, 2009, 5504 : 364 - +
  • [2] Solving optimal control problems of the time-delayed systems by a neural network framework
    Nazemi, Alireza
    Fayyazi, Ensieh
    Mortezaee, Marzieh
    [J]. CONNECTION SCIENCE, 2019, 31 (04) : 342 - 372
  • [3] Solving concurrent Markov decision processes
    Weld, M
    Weld, DS
    [J]. PROCEEDING OF THE NINETEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE SIXTEENTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2004, : 716 - 722
  • [4] Solving hybrid Markov decision processes
    Reyes, Alberto
    Sucar, L. Enrique
    Morales, Eduardo F.
    Ibarguengoytia, Pablo H.
    [J]. MICAI 2006: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 4293 : 227 - +
  • [5] A framework for the control of time-delayed telerobotic systems
    Tarn, TJ
    Brady, K
    [J]. ROBOT CONTROL 1997, VOLS 1 AND 2, 1998, : 599 - 604
  • [6] Efficient Model Solving for Markov Decision Processes
    Sapio, Adrian
    Bhattacharyya, Shuvra S.
    Wolf, Marilyn
    [J]. 2020 IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS (ISCC), 2020, : 607 - 611
  • [7] Mathematical analysis of the Wiener processes with time-delayed feedback
    Kobayashi, Miki U.
    Takehara, Kohta
    Ando, Hiroyasu
    Yamada, Michio
    [J]. AIP ADVANCES, 2024, 14 (09)
  • [8] Modeling and estimation of dynamics of time-delayed plants/processes
    Ghorai, Prasenjit
    Majhi, Somanath
    Eskandarian, Azim
    Pandey, Saurabh
    Kasi, Venkata Ramana
    [J]. INTERNATIONAL JOURNAL OF DYNAMICS AND CONTROL, 2023, 11 (01) : 183 - 193
  • [9] H∞ control of time-delayed LPV repetitive processes
    Qi, Ji
    Li, Yan-Hui
    [J]. Kongzhi yu Juece/Control and Decision, 2012, 27 (01) : 93 - 98