Entropy Maximization for Partially Observable Markov Decision Processes

Cited by: 2
Authors
Savas, Yagiz [1 ]
Hibbard, Michael [1 ]
Wu, Bo [1 ]
Tanaka, Takashi [1 ]
Topcu, Ufuk [1 ]
Affiliation
[1] Univ Texas Austin, Oden Inst Computat Engn & Sci, Dept Aerosp Engn & Engn Mech, Austin, TX 78712 USA
Funding
U.S. National Science Foundation;
Keywords
Autonomous systems; entropy; stochastic processes;
DOI
10.1109/TAC.2022.3183564
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject classification code
0812 ;
Abstract
We study the problem of synthesizing a controller that maximizes the entropy of a partially observable Markov decision process (POMDP) subject to a constraint on the expected total reward. Such a controller minimizes the predictability of an agent's trajectories to an outside observer while guaranteeing the completion of a task expressed by a reward function. Focusing on finite-state controllers (FSCs) with deterministic memory transitions, we show that the maximum entropy of a POMDP is lower bounded by the maximum entropy of the parametric Markov chain (pMC) induced by such FSCs. This relationship allows us to recast the entropy maximization problem as a so-called parameter synthesis problem for the induced pMC. We then present an algorithm to synthesize an FSC that locally maximizes the entropy of a POMDP over FSCs with the same number of memory states. In a numerical example, we highlight the benefit of using an entropy-maximizing FSC compared with an FSC that simply finds a feasible policy for accomplishing a task.
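As a rough illustration of the entropy objective (this is not code from the paper), the entropy of the path distribution of an absorbing Markov chain — such as the chain an FSC induces on a POMDP — can be computed from the standard identity H(paths) = Σ_s ξ(s)·H(P(s,·)), where ξ(s) is the expected number of visits to transient state s. A minimal NumPy sketch, with all names chosen here for illustration:

```python
import numpy as np

def path_entropy(P, init, transient):
    """Entropy (in bits) of the path distribution of an absorbing Markov chain.

    Uses the identity H(paths) = sum_s xi(s) * H(P(s, .)),
    where xi(s) is the expected number of visits to transient state s.
    P: (n, n) row-stochastic transition matrix.
    init: (n,) initial state distribution.
    transient: indices of the transient (non-absorbing) states.
    """
    Q = P[np.ix_(transient, transient)]  # transient-to-transient block
    # Expected visit counts: xi^T = init^T (I - Q)^{-1}, solved as a linear system.
    xi = np.linalg.solve((np.eye(len(transient)) - Q).T, init[transient])
    # Local entropy of each transient state's outgoing distribution.
    rows = P[transient]
    with np.errstate(divide="ignore", invalid="ignore"):
        local = -np.nansum(np.where(rows > 0, rows * np.log2(rows), 0.0), axis=1)
    return float(xi @ local)
```

In the paper's setting the FSC parameters would enter `P`, and an optimizer would maximize this quantity subject to the expected-reward constraint; the sketch only shows the entropy evaluation itself.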
Pages: 6948-6955
Number of pages: 8