Distributed optimization of Markov reward processes

Author
Campos-Nane, Enrique [1]
Affiliation
[1] George Washington Univ, Dept Engn Management & Syst Engn, Washington, DC 20052 USA
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Dynamic programming provides perhaps the most natural way to model many control problems, but existing solution procedures do not scale gracefully with the size of the problem. In this work, we present a gradient-based policy search technique that exploits the fact that, in many applications, the state space and control actions are naturally distributed. After presenting our modeling assumptions, we introduce a technique in which a set of distributed agents each compute an estimate of the partial derivative of a system-wide objective with respect to the parameters under their control, and use it in a gradient-based policy search procedure. We illustrate the algorithm with an application to energy-efficient coverage in energy harvesting sensor networks. The resulting algorithm can be implemented using only information available locally to the sensors, and is therefore fully scalable. Our numerical results are encouraging and suggest that the approach is useful.
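The abstract describes agents that each estimate the partial derivative of a shared objective with respect to their own parameter and feed that estimate into a gradient-based policy search. The sketch below illustrates that idea on a deliberately simple, hypothetical coverage model; it is not the paper's model or estimator, neither of which the abstract specifies. Assumptions: each sensor is an independent two-state (awake/asleep) Markov chain whose wake-up probability `p[i]` is the agent's parameter; `Q_SLEEP`, `ENERGY_COST`, the region layout, and all function names are invented for illustration. A central finite difference on the locally relevant terms stands in for the derivative estimate.

```python
from math import prod

# HYPOTHETICAL toy model (an assumption, not the paper's formulation):
# each sensor i is a two-state Markov chain (awake/asleep). It wakes up with
# probability p[i], the parameter it controls, and falls asleep with a fixed
# probability Q_SLEEP, so its long-run awake fraction is p / (p + Q_SLEEP).
Q_SLEEP = 0.3
ENERGY_COST = 0.8  # assumed cost per unit of long-run awake fraction


def awake_prob(p):
    """Stationary probability that a sensor with wake-up probability p is awake."""
    return p / (p + Q_SLEEP)


def coverage(region, p):
    """Probability that at least one sensor of the region is awake (sensors independent)."""
    return 1.0 - prod(1.0 - awake_prob(p[j]) for j in region)


def total_objective(p, regions):
    """System-wide objective: expected covered regions minus energy cost."""
    return sum(coverage(r, p) for r in regions) - ENERGY_COST * sum(awake_prob(x) for x in p)


def local_objective(i, p, regions):
    """Only the terms of total_objective that depend on p[i].

    Evaluating it needs p[i] plus the parameters of sensors sharing a region
    with i, i.e. information available from i's neighbours.
    """
    own_regions = [r for r in regions if i in r]
    return sum(coverage(r, p) for r in own_regions) - ENERGY_COST * awake_prob(p[i])


def local_partial(i, p, regions, h=1e-5):
    """Central finite-difference estimate of d(total_objective)/d p[i]."""
    up, down = list(p), list(p)
    up[i] += h
    down[i] -= h
    return (local_objective(i, up, regions) - local_objective(i, down, regions)) / (2 * h)


def policy_search(p, regions, step=0.5, iters=200, lo=0.01, hi=0.99):
    """Synchronous projected gradient ascent; each agent updates only its own p[i]."""
    p = list(p)
    for _ in range(iters):
        # Every agent reads the same snapshot of p, then all update at once.
        grads = [local_partial(i, p, regions) for i in range(len(p))]
        p = [min(hi, max(lo, pi + step * g)) for pi, g in zip(p, grads)]
    return p


# Usage: three sensors on a line; each region lists the sensors able to cover it.
regions = [(0,), (0, 1), (1, 2), (2,)]
p0 = [0.1, 0.1, 0.1]
p_final = policy_search(p0, regions)
print(p_final, total_objective(p0, regions), total_objective(p_final, regions))
```

Because every term of the objective that involves `p[i]` also appears in `local_objective`, the local finite difference estimates the full partial derivative without any global state. With these assumed numbers, the redundant middle sensor is driven to its lower bound (mostly asleep), while the two edge sensors, the only ones covering the end regions, are driven to the upper bound, and the system-wide objective increases.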
Pages: 3921-3926
Page count: 6