Distributed optimization of Markov reward processes

Times cited: 0

Author:

Campos-Nane, Enrique [1]

Affiliation:

[1] George Washington Univ, Dept Engn Management & Syst Engn, Washington, DC 20052 USA

Source:

PROCEEDINGS OF THE 46TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-14 | 2007

DOI:

Not available

Chinese Library Classification (CLC) code:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

Dynamic programming provides perhaps the most natural way to model many control problems, but suffers from the fact that existing solution procedures do not scale gracefully with the size of the problem. In this work, we present a gradient-based policy search technique that exploits the fact that in many applications the state space and control actions are naturally distributed. After presenting our modeling assumptions, we introduce a technique in which a set of distributed agents compute an estimate of the partial derivative of a system-wide objective with respect to the parameters under their control and use it in a gradient-based policy search procedure. We illustrate the algorithm with an application to energy-efficient coverage in energy harvesting sensor networks. The resulting algorithm can be implemented using only local information available to the sensors, and is therefore fully scalable. Our numerical results are encouraging and suggest that the approach is useful.
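The abstract does not specify how each agent estimates its partial derivative, so the following is only a minimal sketch of the general idea under stated assumptions. The 3-state, 2-action model (`P`, `R`), the agent names, and the sigmoid policy parameterization are hypothetical and not taken from the paper. Each agent owns the policy parameters of a disjoint subset of states, estimates the partial derivatives of the long-run average reward J(θ) with respect to those parameters, and takes a gradient step on them. Central finite differences stand in for the paper's estimator. Unlike the algorithm described above, every evaluation of J here uses the full model rather than only local information.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical 3-state, 2-action example (illustrative only, not from the paper).
# P[a][s][t]: probability of moving from state s to state t under action a.
P = [
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],  # action 0 ("idle")
    [[0.5, 0.5, 0.0], [0.3, 0.4, 0.3], [0.4, 0.1, 0.5]],  # action 1 ("active")
]
# R[a][s]: one-step reward for taking action a in state s.
R = [
    [0.2, 0.0, 0.1],
    [0.0, 1.0, 0.5],
]
# Hypothetical partition: each agent owns the parameters of a disjoint set of states.
AGENTS: Dict[str, List[int]] = {"agent_a": [0, 1], "agent_b": [2]}


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def stationary(theta: Sequence[float], iters: int = 5000) -> List[float]:
    """Stationary distribution of the chain induced by the randomized policy (power iteration)."""
    n = len(theta)
    pi = [1.0 / n] * n
    for _ in range(iters):
        new = [0.0] * n
        for s in range(n):
            p1 = sigmoid(theta[s])  # probability of choosing action 1 in state s
            for t in range(n):
                new[t] += pi[s] * ((1 - p1) * P[0][s][t] + p1 * P[1][s][t])
        if max(abs(a - b) for a, b in zip(new, pi)) < 1e-13:
            return new
        pi = new
    return pi


def average_reward(theta: Sequence[float]) -> float:
    """J(theta): long-run average reward, i.e. expected one-step reward under the stationary distribution."""
    pi = stationary(theta)
    return sum(
        pi[s] * ((1 - sigmoid(theta[s])) * R[0][s] + sigmoid(theta[s]) * R[1][s])
        for s in range(len(theta))
    )


def local_partials(theta: Sequence[float], owned: List[int], h: float = 1e-5) -> Dict[int, float]:
    """Central finite-difference estimates of dJ/dtheta_s, only for the states this agent owns."""
    grads = {}
    for s in owned:
        plus, minus = list(theta), list(theta)
        plus[s] += h
        minus[s] -= h
        grads[s] = (average_reward(plus) - average_reward(minus)) / (2 * h)
    return grads


def policy_search(theta: Sequence[float], step: float = 0.5, rounds: int = 100) -> List[float]:
    """Synchronous rounds: every agent estimates its partials at the same theta, then all update."""
    theta = list(theta)
    for _ in range(rounds):
        updates: Dict[int, float] = {}
        for owned in AGENTS.values():
            updates.update(local_partials(theta, owned))
        for s, g in updates.items():
            theta[s] += step * g  # gradient ascent on the agent's own coordinates
    return theta


if __name__ == "__main__":
    theta0 = [0.0, 0.0, 0.0]
    theta = policy_search(theta0)
    print(f"J before: {average_reward(theta0):.4f}, after: {average_reward(theta):.4f}")
```

Because each agent only ever writes to its own coordinates, the update has the block structure the abstract describes. The remaining centralized piece in this sketch is the evaluation of J itself, which is exactly the part the paper replaces with estimates built from local information.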

Pages: 3921-3926

Number of pages: 6
