Online Markov Decision Processes With Kullback-Leibler Control Cost

Cited by: 34
Authors
Guan, Peng [1 ]
Raginsky, Maxim [2 ,3 ]
Willett, Rebecca M. [4 ]
Affiliations
[1] Duke Univ, Dept Elect & Comp Engn, Durham, NC 27708 USA
[2] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA
[3] Univ Illinois, Coordinated Sci Lab, Urbana, IL 61801 USA
[4] Univ Wisconsin, Dept Elect & Comp Engn, Madison, WI 53796 USA
Funding
National Science Foundation (USA);
Keywords
Markov decision processes; online learning; stochastic control; RISK-SENSITIVE CONTROL; STOCHASTIC UNCERTAIN SYSTEMS;
DOI
10.1109/TAC.2014.2301558
Chinese Library Classification (CLC)
TP [automation technology; computer technology];
Discipline code
0812;
Abstract
This paper considers an online (real-time) control problem that involves an agent performing a discrete-time random walk over a finite state space. The agent's action at each time step is to specify the probability distribution for the next state given the current state. Following the setup of Todorov, the state-action cost at each time step is a sum of a state cost and a control cost given by the Kullback-Leibler (KL) divergence between the agent's next-state distribution and that determined by some fixed passive dynamics. The online aspect of the problem is due to the fact that the state cost functions are generated by a dynamic environment, and the agent learns the current state cost only after selecting an action. An explicit construction of a computationally efficient strategy with small regret (i.e., expected difference between its actual total cost and the smallest cost attainable using noncausal knowledge of the state costs) under mild regularity conditions is presented, along with a demonstration of the performance of the proposed strategy on a simulated target tracking problem. A number of new results on Markov decision processes with KL control cost are also obtained.
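The per-step cost described in the abstract (a state cost plus a KL control cost relative to fixed passive dynamics) can be sketched numerically. The snippet below is an illustrative sketch only, not code from the paper: the distributions, the value function `v`, and the function names are hypothetical, and the final lines show Todorov's well-known form of the optimally controlled transition, which reweights the passive dynamics by an exponentiated value function.

```python
import numpy as np

def kl_divergence(u, p):
    """KL(u || p) for discrete distributions over a finite state space.
    Assumes u is absolutely continuous w.r.t. p (u > 0 implies p > 0)."""
    mask = u > 0
    return float(np.sum(u[mask] * np.log(u[mask] / p[mask])))

def step_cost(state_cost, u, p):
    """State-action cost: state cost q(x) plus the KL control cost
    between the agent's next-state distribution u(.|x) and the
    passive dynamics p(.|x)."""
    return state_cost + kl_divergence(u, p)

# Hypothetical passive-dynamics row p(.|x) and a chosen next-state
# distribution u(.|x) over a 4-state space.
p = np.array([0.25, 0.25, 0.25, 0.25])
u = np.array([0.40, 0.30, 0.20, 0.10])
q = 1.0  # state cost q(x); in the online setting it is revealed only
         # after the agent commits to u

cost = step_cost(q, u, p)  # q(x) + KL(u || p) >= q(x)

# Todorov-style optimally controlled transition for a hypothetical
# value function v over next states: u*(x'|x) is proportional to
# p(x'|x) * exp(-v(x')), normalized to a probability distribution.
v = np.array([0.0, 0.5, 1.0, 1.5])
u_star = p * np.exp(-v)
u_star /= u_star.sum()
```

Choosing u = p makes the KL term vanish, so the agent pays only the state cost; any deviation from the passive dynamics adds a strictly positive control cost.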
Pages: 1423-1438
Page count: 16