Entropy Maximization for Partially Observable Markov Decision Processes

Cited by: 2
Authors
Savas, Yagiz [1 ]
Hibbard, Michael [1 ]
Wu, Bo [1 ]
Tanaka, Takashi [1 ]
Topcu, Ufuk [1 ]
Affiliations
[1] University of Texas at Austin, Oden Institute for Computational Engineering and Sciences, Department of Aerospace Engineering and Engineering Mechanics, Austin, TX 78712, USA
Funding
US National Science Foundation
Keywords
Autonomous systems; entropy; stochastic processes
DOI
10.1109/TAC.2022.3183564
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
We study the problem of synthesizing a controller that maximizes the entropy of a partially observable Markov decision process (POMDP) subject to a constraint on the expected total reward. Such a controller minimizes the predictability of an agent's trajectories to an outside observer while guaranteeing the completion of a task expressed by a reward function. Focusing on finite-state controllers (FSCs) with deterministic memory transitions, we show that the maximum entropy of a POMDP is lower bounded by the maximum entropy of the parametric Markov chain (pMC) induced by such FSCs. This relationship allows us to recast the entropy maximization problem as a so-called parameter synthesis problem for the induced pMC. We then present an algorithm to synthesize an FSC that locally maximizes the entropy of a POMDP over FSCs with the same number of memory states. In a numerical example, we highlight the benefit of using an entropy-maximizing FSC compared with an FSC that simply finds a feasible policy for accomplishing a task.
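As a rough illustration of the objective described in the abstract (a sketch of the quantity being maximized, not the paper's synthesis algorithm), the entropy of the finite Markov chain induced by a fixed controller can be computed as the sum, over transient states, of the expected number of visits times the state's one-step transition entropy. The function name `mc_entropy` and the assumption that the induced chain is absorbing with known transition probabilities are ours, not taken from the paper.

```python
import numpy as np

def mc_entropy(Q, P_full, init):
    """Entropy of an absorbing Markov chain.

    Q      : (n, n) transition probabilities among the n transient states
    P_full : (n, m) full transition rows of the transient states
             (m >= n; extra columns lead to absorbing states)
    init   : (n,) initial distribution over transient states
    """
    n = Q.shape[0]
    # Expected visit counts: xi^T = init^T (I - Q)^{-1},
    # i.e. solve (I - Q)^T xi = init
    xi = np.linalg.solve((np.eye(n) - Q).T, init)
    # One-step (local) entropy of each transient state, with 0 * log 0 := 0
    with np.errstate(divide="ignore", invalid="ignore"):
        logs = np.where(P_full > 0, np.log(P_full), 0.0)
    local = -(P_full * logs).sum(axis=1)
    return float(xi @ local)

# Toy chain: state 0 -> {1, 2} w.p. 1/2 each; state 1 -> 2; state 2 absorbing.
Q = np.array([[0.0, 0.5],
              [0.0, 0.0]])
P_full = np.array([[0.0, 0.5, 0.5],
                   [0.0, 0.0, 1.0]])
init = np.array([1.0, 0.0])
h = mc_entropy(Q, P_full, init)  # log(2): the only randomness is state 0's coin flip
```

In the paper's setting the entries of `Q`/`P_full` would be functions of the FSC parameters, turning `mc_entropy` into the objective of a parameter synthesis problem rather than a fixed number.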
Pages: 6948-6955 (8 pages)
Related Papers (10 of 50 shown)
  • [1] Transition Entropy in Partially Observable Markov Decision Processes
    Melo, Francisco S.; Ribeiro, Isabel
    Intelligent Autonomous Systems 9, 2006: 282-+
  • [2] Anderson acceleration for partially observable Markov decision processes: A maximum entropy approach
    Park, Mingyu; Shin, Jaeuk; Yang, Insoon
    Automatica, 2024, 163
  • [3] Partially Observable Markov Decision Processes and Robotics
    Kurniawati, Hanna
    Annual Review of Control, Robotics, and Autonomous Systems, 2022, 5: 253-277
  • [4] Quantum partially observable Markov decision processes
    Barry, Jennifer; Barry, Daniel T.; Aaronson, Scott
    Physical Review A, 2014, 90 (03)
  • [5] A tutorial on partially observable Markov decision processes
    Littman, Michael L.
    Journal of Mathematical Psychology, 2009, 53 (03): 119-125
  • [6] Partially observable Markov decision processes with partially observable random discount factors
    Martinez-Garcia, E. Everardo; Minjarez-Sosa, J. Adolfo; Vega-Amaya, Oscar
    Kybernetika, 2022, 58 (06): 960-983
  • [7] Entropy Maximization for Constrained Markov Decision Processes
    Savas, Yagiz; Ornik, Melkior; Cubuktepe, Murat; Topcu, Ufuk
    2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2018: 911-918
  • [8] Active learning in partially observable Markov decision processes
    Jaulmes, R.; Pineau, J.; Precup, D.
    Machine Learning: ECML 2005, Proceedings, 2005, 3720: 601-608
  • [9] Structural Estimation of Partially Observable Markov Decision Processes
    Chang, Yanling; Garcia, Alfredo; Wang, Zhide; Sun, Lu
    IEEE Transactions on Automatic Control, 2023, 68 (08): 5135-5141
  • [10] Nonapproximability results for partially observable Markov decision processes
    Lusena, C.; Goldsmith, J.; Mundhenk, M.
    Journal of Artificial Intelligence Research, 2001, 14: 83-113