Online Markov Decision Processes With Kullback-Leibler Control Cost

Cited by: 34

Authors:

Guan, Peng [1]

Raginsky, Maxim [2,3]

Willett, Rebecca M. [4]

Affiliations:

[1] Duke Univ, Dept Elect & Comp Engn, Durham, NC 27708 USA

[2] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA

[3] Univ Illinois, Coordinated Sci Lab, Urbana, IL 61801 USA

[4] Univ Wisconsin, Dept Elect & Comp Engn, Madison, WI 53796 USA

Source:

IEEE TRANSACTIONS ON AUTOMATIC CONTROL | 2014 / Vol. 59 / No. 06

Funding:

U.S. National Science Foundation (NSF);

Keywords:

Markov decision processes; online learning; stochastic control; RISK-SENSITIVE CONTROL; STOCHASTIC UNCERTAIN SYSTEMS;

DOI:

10.1109/TAC.2014.2301558

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

This paper considers an online (real-time) control problem that involves an agent performing a discrete-time random walk over a finite state space. The agent's action at each time step is to specify the probability distribution for the next state given the current state. Following the setup of Todorov, the state-action cost at each time step is a sum of a state cost and a control cost given by the Kullback-Leibler (KL) divergence between the agent's next-state distribution and that determined by some fixed passive dynamics. The online aspect of the problem is due to the fact that the state cost functions are generated by a dynamic environment, and the agent learns the current state cost only after selecting an action. An explicit construction of a computationally efficient strategy with small regret (i.e., expected difference between its actual total cost and the smallest cost attainable using noncausal knowledge of the state costs) under mild regularity conditions is presented, along with a demonstration of the performance of the proposed strategy on a simulated target tracking problem. A number of new results on Markov decision processes with KL control cost are also obtained.
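The cost structure described in the abstract can be illustrated with a short sketch. It assumes a finite state space, a row-stochastic passive-dynamics matrix `P`, and a state-cost vector `f` (hypothetical example values, not taken from the paper). The code computes the per-step cost f(x) + D(u(·|x) || P(·|x)) and checks the standard Gibbs variational identity: for a one-step problem with next-state cost g, the minimizer of E_u[g] + D(u || p) is u*(y) ∝ p(y) exp(-g(y)), with minimum value -log Σ_y p(y) exp(-g(y)). The helper names `kl_divergence`, `state_action_cost`, and `gibbs_tilt` are illustrative only; this is not the online strategy proposed in the paper.

```python
import numpy as np


def kl_divergence(u, p):
    """KL divergence D(u || p) between distributions on a finite set.

    Assumes u is absolutely continuous w.r.t. p (u > 0 only where p > 0).
    Terms with u(y) = 0 contribute 0, by the convention 0 log 0 = 0.
    """
    u = np.asarray(u, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = u > 0
    return float(np.sum(u[mask] * np.log(u[mask] / p[mask])))


def state_action_cost(f, x, u_row, P):
    """Per-step cost at state x: state cost f[x] plus KL(u(.|x) || P(.|x))."""
    return float(f[x]) + kl_divergence(u_row, P[x])


def gibbs_tilt(p_row, g):
    """Minimizer of E_u[g] + D(u || p_row) over next-state distributions u.

    Returns u*(y) proportional to p_row(y) * exp(-g(y)). Shifting g by its
    minimum before exponentiating leaves u* unchanged but avoids underflow.
    """
    w = p_row * np.exp(-(g - g.min()))
    return w / w.sum()


# Hypothetical 3-state example: passive dynamics P (rows sum to 1), state cost f.
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
f = np.array([1.0, 0.0, 2.0])

x = 1  # current state; the next state y incurs cost f[y] (one-step problem)
u_star = gibbs_tilt(P[x], f)                        # exponentially tilted P(.|x)
J_star = f @ u_star + kl_divergence(u_star, P[x])   # objective at the tilted choice
J_passive = f @ P[x] + kl_divergence(P[x], P[x])    # objective when following P(.|x)
log_partition = -np.log(P[x] @ np.exp(-f))          # closed-form minimum value
```

Any other next-state distribution u, for example the passive row `P[x]` itself, gives a larger one-step objective, because the gap J(u) - J(u*) equals D(u || u*).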


Pages: 1423 - 1438

Number of pages: 16

Related records: 50 in total

[1] Online Markov Decision Processes with Kullback-Leibler Control Cost
Guan, Peng
Raginsky, Maxim
Willett, Rebecca
[J]. 2012 AMERICAN CONTROL CONFERENCE (ACC), 2012, : 1388 - 1393
[2] ORDINARY DIFFERENTIAL EQUATION METHODS FOR MARKOV DECISION PROCESSES AND APPLICATION TO KULLBACK-LEIBLER CONTROL COST
Busic, Ana
Meyn, Sean
[J]. SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2018, 56 (01) : 343 - 366
[3] Fundamental Performance Limitations with Kullback-Leibler Control Cost
Sun, Yu
Mehta, Prashant G.
[J]. 49TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2010, : 7063 - 7068
[4] A decision cognizant Kullback-Leibler divergence
Ponti, Moacir
Kittler, Josef
Riva, Mateus
de Campos, Teofilo
Zor, Cemre
[J]. PATTERN RECOGNITION, 2017, 61 : 470 - 478
[5] A Generalized Framework For Kullback-Leibler Markov Aggregation
Amjad, Rana Ali
Blochl, Clemens
Geiger, Bernhard C.
[J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2020, 65 (07) : 3068 - 3075
[6] Kullback-Leibler Control in Boolean Control Networks
Toyoda, Mitsuru
Wu, Yuhu
[J]. IEEE TRANSACTIONS ON CYBERNETICS, 2024, 54 (08) : 4429 - 4442
[7] The Kullback-Leibler divergence rate between Markov sources
Rached, Z
Alajaji, F
Campbell, LL
[J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2004, 50 (05) : 917 - 921
[8] Variational Kullback-Leibler divergence for hidden Markov models
Hershey, John R.
Olsen, Peder A.
Rennie, Steven J.
[J]. 2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007, : 323 - 328
[9] THE KULLBACK-LEIBLER DISTANCE
KULLBACK, S
[J]. AMERICAN STATISTICIAN, 1987, 41 (04) : 340 - 340
[10] The Kullback-Leibler autodependogram
Bagnato, L.
De Capitani, L.
Punzo, A.
[J]. JOURNAL OF APPLIED STATISTICS, 2016, 43 (14) : 2574 - 2594
