Entropy Maximization for Partially Observable Markov Decision Processes

Cited by: 2
Authors
Savas, Yagiz [1 ]
Hibbard, Michael [1 ]
Wu, Bo [1 ]
Tanaka, Takashi [1 ]
Topcu, Ufuk [1 ]
Affiliations
[1] University of Texas at Austin, Oden Institute for Computational Engineering and Sciences, Department of Aerospace Engineering and Engineering Mechanics, Austin, TX 78712, USA
Funding
US National Science Foundation
Keywords
Autonomous systems; entropy; stochastic processes
DOI
10.1109/TAC.2022.3183564
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
We study the problem of synthesizing a controller that maximizes the entropy of a partially observable Markov decision process (POMDP) subject to a constraint on the expected total reward. Such a controller minimizes the predictability of an agent's trajectories to an outside observer while guaranteeing the completion of a task expressed by a reward function. Focusing on finite-state controllers (FSCs) with deterministic memory transitions, we show that the maximum entropy of a POMDP is lower bounded by the maximum entropy of the parametric Markov chain (pMC) induced by such FSCs. This relationship allows us to recast the entropy maximization problem as a so-called parameter synthesis problem for the induced pMC. We then present an algorithm to synthesize an FSC that locally maximizes the entropy of a POMDP over FSCs with the same number of memory states. In a numerical example, we highlight the benefit of using an entropy-maximizing FSC compared with an FSC that simply finds a feasible policy for accomplishing a task.
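As a rough illustration of the objective described in the abstract (a sketch of the quantity being maximized, not the paper's synthesis algorithm), the entropy of the finite Markov chain induced by a fixed controller can be computed as the sum, over transient states, of the expected number of visits times the state's one-step transition entropy. The function name `mc_entropy` and the assumption that the induced chain is absorbing with known transition probabilities are ours, not taken from the paper.

```python
import numpy as np

def mc_entropy(Q, P_full, init):
    """Entropy of an absorbing Markov chain.

    Q      : (n, n) transition probabilities among the n transient states
    P_full : (n, m) full transition rows of the transient states
             (m >= n; extra columns lead to absorbing states)
    init   : (n,) initial distribution over transient states
    """
    n = Q.shape[0]
    # Expected visit counts: xi^T = init^T (I - Q)^{-1},
    # i.e. solve (I - Q)^T xi = init
    xi = np.linalg.solve((np.eye(n) - Q).T, init)
    # One-step (local) entropy of each transient state, with 0 * log 0 := 0
    with np.errstate(divide="ignore", invalid="ignore"):
        logs = np.where(P_full > 0, np.log(P_full), 0.0)
    local = -(P_full * logs).sum(axis=1)
    return float(xi @ local)

# Toy chain: state 0 -> {1, 2} w.p. 1/2 each; state 1 -> 2; state 2 absorbing.
Q = np.array([[0.0, 0.5],
              [0.0, 0.0]])
P_full = np.array([[0.0, 0.5, 0.5],
                   [0.0, 0.0, 1.0]])
init = np.array([1.0, 0.0])
h = mc_entropy(Q, P_full, init)  # log(2): the only randomness is state 0's coin flip
```

In the paper's setting the entries of `Q`/`P_full` would be functions of the FSC parameters, turning `mc_entropy` into the objective of a parameter synthesis problem rather than a fixed number.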
Pages: 6948-6955 (8 pages)
Related Papers (10 of 50 shown)
  • [1] Transition Entropy in Partially Observable Markov Decision Processes
    Melo, Francisco S.; Ribeiro, Isabel
    Intelligent Autonomous Systems 9, 2006: 282-+
  • [2] Anderson acceleration for partially observable Markov decision processes: A maximum entropy approach
    Park, Mingyu; Shin, Jaeuk; Yang, Insoon
    Automatica, 2024, 163
  • [3] Partially Observable Markov Decision Processes and Robotics
    Kurniawati, Hanna
    Annual Review of Control, Robotics, and Autonomous Systems, 2022, 5: 253-277
  • [4] Quantum partially observable Markov decision processes
    Barry, Jennifer; Barry, Daniel T.; Aaronson, Scott
    Physical Review A, 2014, 90 (03)
  • [5] A tutorial on partially observable Markov decision processes
    Littman, Michael L.
    Journal of Mathematical Psychology, 2009, 53 (03): 119-125
  • [6] Partially observable Markov decision processes with partially observable random discount factors
    Martinez-Garcia, E. Everardo; Minjarez-Sosa, J. Adolfo; Vega-Amaya, Oscar
    Kybernetika, 2022, 58 (06): 960-983
  • [7] Entropy Maximization for Constrained Markov Decision Processes
    Savas, Yagiz; Ornik, Melkior; Cubuktepe, Murat; Topcu, Ufuk
    2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2018: 911-918
  • [8] Active learning in partially observable Markov decision processes
    Jaulmes, R.; Pineau, J.; Precup, D.
    Machine Learning: ECML 2005, Proceedings, 2005, 3720: 601-608
  • [9] Structural Estimation of Partially Observable Markov Decision Processes
    Chang, Yanling; Garcia, Alfredo; Wang, Zhide; Sun, Lu
    IEEE Transactions on Automatic Control, 2023, 68 (08): 5135-5141
  • [10] Nonapproximability results for partially observable Markov decision processes
    Lusena, C.; Goldsmith, J.; Mundhenk, M.
    Journal of Artificial Intelligence Research, 2001, 14: 83-113