Distributed optimization of Markov reward processes

Cited by: 0
Authors
Campos-Nanez, Enrique [1]
Affiliation
[1] George Washington Univ, Dept Engn Management & Syst Engn, Washington, DC 20052 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [automation technology, computer technology];
Subject classification code
0812;
Abstract
Dynamic programming provides perhaps the most natural way to model many control problems, but suffers from the fact that existing solution procedures do not scale gracefully with the size of the problem. In this work, we present a gradient-based policy search technique that exploits the fact that in many applications the state space and control actions are naturally distributed. After presenting our modeling assumptions, we introduce a technique in which a set of distributed agents compute an estimate of the partial derivative of a system-wide objective with respect to the parameters under their control and use it in a gradient-based policy search procedure. We illustrate the algorithm with an application to energy-efficient coverage in energy harvesting sensor networks. The resulting algorithm can be implemented using only local information available to the sensors, and is therefore fully scalable. Our numerical results are encouraging and allow us to conjecture the usefulness of our approach.
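The abstract describes agents that each estimate the partial derivative of a system-wide objective with respect to the parameters under their control and then take local gradient steps. The sketch below illustrates that general idea only; it is not the paper's algorithm. It uses a stateless toy in which each sensor holds a single parameter (the logit of its probability of sensing), a shared coverage-minus-energy reward is broadcast as a scalar, and each agent forms a likelihood-ratio estimate of its own partial derivative. The reward function, policy parameterization, and all constants are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's method): distributed
# policy-gradient search where each agent updates only its own parameter
# using its local action and a shared scalar reward signal.
import numpy as np

rng = np.random.default_rng(0)
N = 10                  # number of sensor agents
theta = np.zeros(N)     # each agent's local parameter: logit of its "sense" probability
alpha = 0.05            # gradient step size
episodes, horizon = 300, 50

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(episodes):
    grad = np.zeros(N)                            # per-agent gradient estimates
    for _ in range(horizon):
        p = sigmoid(theta)                        # each agent's sensing probability
        a = (rng.random(N) < p).astype(float)     # local on/off decisions
        # System-wide reward: diminishing coverage benefit minus energy cost.
        reward = np.sqrt(a.sum()) - 0.3 * a.sum()
        # Score-function term for a Bernoulli(p) action with logit parameter is (a - p);
        # weighting it by the shared reward gives each agent a local gradient estimate.
        grad += (a - p) * reward
    theta += alpha * grad / horizon               # each agent steps independently

print("learned sensing probabilities:", np.round(sigmoid(theta), 2))
```

This toy omits the Markov state dynamics and energy-harvesting model treated in the paper; there, the gradient estimates would have to account for the effect of actions on future states, and the reward information available to each sensor is local rather than a globally broadcast scalar.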
Pages: 3921-3926
Number of pages: 6