Controller Synthesis for Reward Collecting Markov Processes in Continuous Space

Cited by: 3
Authors
Soudjani, Sadegh Esmaeil Zadeh [1 ]
Majumdar, Rupak [1 ]
Affiliations
[1] Max Planck Institute for Software Systems (MPI-SWS), Kaiserslautern, Germany
Keywords
Reward collecting Markov processes; formal controller synthesis; continuous-space stochastic systems; decision processes; verification
DOI
10.1145/3049797.3049827
CLC Classification Number
TP [Automation technology, computer technology]
Discipline Classification Code
0812
Abstract
We propose and analyze a generic mathematical model for optimizing rewards in continuous-space, dynamic environments, called Reward Collecting Markov Processes. Our model is motivated by request-serving applications in robotics, where the objective is to control a dynamical system so that it responds to stochastically generated environment requests while minimizing wait times. Our model departs from the usual discounted reward Markov decision processes in that the reward function is not determined by the current state and action alone. Instead, a background process generates rewards whose values depend on the number of steps between generation and collection. For example, a reward is declared whenever there is a new request for a robot, and the robot receives a higher reward the sooner it is able to serve the request. A policy in this setting is a sequence of control actions which determines a (random) trajectory over the continuous state space. The reward achieved by the trajectory is the cumulative sum of all rewards obtained along the way in the finite horizon case, and the long run average of all rewards in the infinite horizon case. We study both the finite horizon and infinite horizon problems of maximizing the expected (respectively, the long run average expected) collected reward. First, we characterize these problems as solutions to dynamic programs over an augmented hybrid space, which gives history-dependent optimal policies. Second, we provide a computational method for these problems which abstracts the continuous-space problem into a discrete-space reward collecting Markov decision process. Under assumptions of Lipschitz continuity of the Markov process and uniform bounds on the discounting, we show that the error incurred by computing optimal solutions on the finite-state approximation can be bounded. Finally, we provide a fixed point characterization of the optimal expected collected reward in the infinite horizon case, and show how the fixed point can be obtained by value iteration.
Pages: 45-54
Page count: 10
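
The abstract above outlines a dynamic program over an augmented state space, a finite-state abstraction, and value iteration for computing optimal collected rewards. As a rough illustration only (the paper's construction is over a hybrid space and is not reproduced here), the Python sketch below runs finite-horizon value iteration on a toy discrete reward collecting MDP with a single request site, where a collected reward decays geometrically in the number of steps the request has been waiting. All names (P, request_state, R0, decay, p_gen, max_age, horizon), as well as the single-request and capped-age simplifications, are illustrative assumptions rather than the paper's notation.

def value_iteration(states, actions, P, request_state,
                    R0=1.0, decay=0.9, p_gen=0.2, max_age=10, horizon=50):
    """Finite-horizon value iteration on the augmented space (s, age):
    age = None means no pending request; age = k means a request has been
    waiting k steps. Collecting a request of age k yields R0 * decay**k.
    P[s][a] is a dict mapping successor states to transition probabilities."""
    ages = [None] + list(range(max_age + 1))
    V = {(s, age): 0.0 for s in states for age in ages}  # V_0 = 0

    for _ in range(horizon):
        V_new = {}
        for s in states:
            for age in ages:
                best = float("-inf")
                for act in actions:
                    q = 0.0
                    for s2, prob in P[s][act].items():
                        if age is not None and s2 == request_state:
                            # Collect the pending request; its value has
                            # decayed with the time it spent waiting.
                            rew = R0 * decay ** age
                            next_age = None
                        else:
                            rew = 0.0
                            next_age = None if age is None else min(age + 1, max_age)
                        if next_age is None:
                            # A fresh request appears with probability p_gen.
                            q += prob * (rew + p_gen * V[(s2, 0)]
                                         + (1 - p_gen) * V[(s2, None)])
                        else:
                            q += prob * (rew + V[(s2, next_age)])
                    best = max(best, q)
                V_new[(s, age)] = best
        V = V_new
    return V

A toy two-state instance, again purely illustrative:

states = ["home", "site"]
actions = ["stay", "go"]
P = {"home": {"stay": {"home": 1.0}, "go": {"site": 0.8, "home": 0.2}},
     "site": {"stay": {"site": 1.0}, "go": {"home": 0.8, "site": 0.2}}}
V = value_iteration(states, actions, P, request_state="site")
print(V[("home", None)])  # expected collected reward from "home", no pending request

Augmenting the state with the pending request's age is what makes the delay-dependent reward Markov again; the history-dependent optimal policies mentioned in the abstract arise for the same reason.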