Controller Synthesis for Reward Collecting Markov Processes in Continuous Space

Cited by: 3
Authors
Soudjani, Sadegh Esmaeil Zadeh [1 ]
Majumdar, Rupak [1 ]
Affiliations
[1] Max Planck Inst Software Syst, Kaiserslautern, Germany
Keywords
Reward collecting Markov processes; formal controller synthesis; continuous-space stochastic systems; decision processes; verification
DOI
10.1145/3049797.3049827
CLC number
TP [automation and computer technology]
Subject classification code
0812
Abstract
We propose and analyze a generic mathematical model for optimizing rewards in continuous-space, dynamic environments, called Reward Collecting Markov Processes. Our model is motivated by request-serving applications in robotics, where the objective is to control a dynamical system to respond to stochastically generated environment requests while minimizing wait times. Our model departs from usual discounted reward Markov decision processes in that the reward function is not determined by the current state and action. Instead, a background process generates rewards whose values depend on the number of steps between generation and collection. For example, a reward is declared whenever there is a new request for a robot, and the robot gets a higher reward the sooner it is able to serve the request. A policy in this setting is a sequence of control actions which determines a (random) trajectory over the continuous state space. The reward achieved by the trajectory is the cumulative sum of all rewards obtained along the way in the finite horizon case, and the long run average of all rewards in the infinite horizon case. We study both the finite horizon and infinite horizon problems of maximizing the expected (respectively, the long run average expected) collected reward. First, we characterize these problems as solutions to dynamic programs over an augmented hybrid space, which yields history-dependent optimal policies. Second, we provide a computational method for these problems which abstracts the continuous-space problem into a discrete-space reward collecting Markov decision process. Under assumptions of Lipschitz continuity of the Markov process and uniform bounds on the discounting, we show that we can bound the error in computing optimal solutions on the finite-state approximation. Finally, we provide a fixed point characterization of the optimal expected collected reward in the infinite horizon case, and show how the fixed point can be obtained by value iteration.
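To make the dynamic-programming scheme sketched in the abstract concrete, the following Python snippet runs a finite-horizon value iteration on a toy finite-state abstraction. Everything specific here is an illustrative assumption, not the paper's construction: a 1-D grid of cells, requests arriving at a single fixed cell REQ with probability P_NEW per step, only the oldest pending request tracked (with a capped age), and a reward of GAMMA**k for serving a request that has waited k steps. The augmented state (cell, age) plays the role of the paper's augmented hybrid space, with the continuous dynamics replaced by deterministic grid moves.

```python
import numpy as np

# Minimal value-iteration sketch for a finite-state abstraction of a
# reward collecting Markov process. All constants and the toy dynamics
# are assumptions for illustration.

N_CELLS = 5            # abstract grid cells 0..4 (assumed)
REQ = 4                # cell where requests arrive (assumed)
MAX_AGE = 10           # cap on tracked request age; age 0 = no request
P_NEW = 0.3            # per-step request arrival probability (assumed)
GAMMA = 0.9            # reward decay per waiting step
ACTIONS = (-1, 0, +1)  # move left, stay, move right

def backup(V, cell, age):
    """One Bellman backup over the augmented state (cell, age)."""
    best = -np.inf
    for a in ACTIONS:
        nxt = min(max(cell + a, 0), N_CELLS - 1)
        # Immediate reward: serving a request that has waited `age`
        # steps collects GAMMA**age (decays with waiting time).
        r = GAMMA ** age if (age > 0 and nxt == REQ) else 0.0
        if age > 0 and nxt != REQ:
            # Request still pending; its age grows (capped at MAX_AGE).
            cont = V[nxt, min(age + 1, MAX_AGE)]
        else:
            # No pending request (or it was just served): a new one
            # may arrive on the next step with probability P_NEW.
            cont = P_NEW * V[nxt, 1] + (1 - P_NEW) * V[nxt, 0]
        # Finite-horizon objective: undiscounted cumulative sum of
        # collected rewards (the decay sits in the rewards themselves).
        best = max(best, r + cont)
    return best

# H backups compute the H-step optimal expected collected reward.
H = 50
V = np.zeros((N_CELLS, MAX_AGE + 1))
for _ in range(H):
    V = np.array([[backup(V, c, k) for k in range(MAX_AGE + 1)]
                  for c in range(N_CELLS)])

print("Optimal expected collected reward from cell 0, no pending "
      "request, over", H, "steps:", V[0, 0])
```

Note that the time-discounting of rewards happens through their age at collection, so the backup sums continuation values undiscounted, matching the finite-horizon objective in the abstract; the infinite-horizon long-run-average problem would instead be handled by the fixed point characterization the paper develops.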
Pages: 45-54 (10 pages)