Framework for solving time-delayed Markov Decision Processes

Times cited: 1

Authors:

Sawaya, Yorgo [1,2]

Issa, George [1,2]

Marzen, Sarah E. [1,2]

Affiliations:

[1] Pitzer Scripps Coll, W M Keck Sci Dept, Claremont, CA 91711 USA

[2] Claremont McKenna Coll, Claremont, CA 91711 USA

Source:

PHYSICAL REVIEW RESEARCH | 2023 / Vol. 5 / Issue 03

Keywords:

Bellman equations - Engineer systems - Engineered systems - Markov Decision Processes - Process framework - Reinforcement learning algorithms - Reinforcement learnings - Theoretical framework - Time delayed - Time-delays;

DOI:

10.1103/PhysRevResearch.5.033034

Chinese Library Classification (CLC) number:

O4 [Physics];

Discipline classification code:

0702;

Abstract:

Reinforcement learning, a theoretical framework for understanding how to maximize expected reward, has revolutionized both our understanding of evolved systems and our ability to engineer systems. However, the time delay between observation and action is estimated to be roughly 150 ms in humans, and this delay should affect reinforcement learning algorithms. We reformulate the Markov Decision Process framework to include time delays in action, first deriving a new Bellman equation in a way that unifies previous attempts and then implementing the corresponding SARSA-like algorithm. The main ramification, potentially useful for both evolved and engineered systems, is that when the state space is smaller than the action space, the modified reinforcement learning algorithms prefer to operate on sequences of states rather than on the present state alone, with the length of the sequence equal to 1 plus the time delay.
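The abstract does not spell out the delayed Bellman equation or the SARSA-like algorithm, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It assumes that an action chosen at step t takes effect at step t + d, and that the alternative to a state sequence is the common augmentation of the current state with the d actions already chosen but not yet applied. Under that assumption, a window of d + 1 states takes |S|^(d+1) possible values while the pending-action encoding takes |S||A|^d, a ratio of (|S|/|A|)^d; for d ≥ 1 the window is therefore the smaller table exactly when |S| < |A|, which is one way to read the abstract's claim. The names `table_sizes`, `DelayedRing`, `epsilon_greedy`, and `sarsa_on_windows`, and the toy ring environment itself, are hypothetical.

```python
import random
from collections import defaultdict, deque


def table_sizes(n_states: int, n_actions: int, delay: int) -> tuple[int, int]:
    """Number of distinct augmented states under two encodings (an assumption, not
    taken from the paper): a window of delay+1 states, or the current state plus
    the `delay` actions already chosen but not yet applied."""
    window = n_states ** (delay + 1)
    pending = n_states * n_actions ** delay
    return window, pending


class DelayedRing:
    """Hypothetical toy MDP: states 0..n-1 on a ring; action 1 moves +1, action 0 stays.
    The action chosen at step t is applied only at step t + delay."""

    def __init__(self, n_states: int, delay: int):
        self.n = n_states
        self.delay = delay

    def reset(self) -> int:
        self.state = 0
        self.pending = deque([0] * self.delay)  # queued actions, initially "stay"
        return self.state

    def step(self, action: int) -> tuple[int, float]:
        self.pending.append(action)
        applied = self.pending.popleft()  # the action chosen `delay` steps ago
        self.state = (self.state + applied) % self.n
        reward = 1.0 if self.state == self.n - 1 else 0.0
        return self.state, reward


def epsilon_greedy(q, key, n_actions, eps, rng):
    if rng.random() < eps:
        return rng.randrange(n_actions)
    values = [q[(key, a)] for a in range(n_actions)]
    return values.index(max(values))  # ties go to the lowest action index


def sarsa_on_windows(env, n_actions, delay, episodes=200, steps=50,
                     alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular SARSA whose 'state' is the tuple of the last delay+1 observed states."""
    rng = random.Random(seed)
    q = defaultdict(float)  # maps (window, action) -> estimated return
    for _ in range(episodes):
        s = env.reset()
        window = deque([s] * (delay + 1), maxlen=delay + 1)  # pad with the start state
        key = tuple(window)
        a = epsilon_greedy(q, key, n_actions, eps, rng)
        for _ in range(steps):  # continuing task, truncated after `steps` steps
            s_next, r = env.step(a)
            window.append(s_next)  # the oldest state drops out automatically
            key_next = tuple(window)
            a_next = epsilon_greedy(q, key_next, n_actions, eps, rng)
            # On-policy SARSA target: bootstrap from the action actually chosen next.
            td_error = r + gamma * q[(key_next, a_next)] - q[(key, a)]
            q[(key, a)] += alpha * td_error
            key, a = key_next, a_next
    return q
```

For example, `table_sizes(2, 4, 2)` returns `(8, 32)`: with two states, four actions, and a delay of two steps, the window table is four times smaller. Keying the Q-table on `tuple(window)` is what makes this agent operate on sequences of states; switching to the pending-action encoding would only change how `key` is built.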


Pages: 8
