A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes

Cited: 0
Authors
Ross, Stephane [1]
Pineau, Joelle [2]
Chaib-draa, Brahim [3]
Kreitmann, Pierre [4]
Affiliations
[1] Carnegie Mellon Univ, Inst Robot, Pittsburgh, PA 15213 USA
[2] McGill Univ, Sch Comp Sci, Montreal, PQ H3A 2A7, Canada
[3] Univ Laval, Comp Sci & Software Engn Dept, Quebec City, PQ G1K 7P4, Canada
[4] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Funding
Natural Sciences and Engineering Research Council of Canada; US National Institutes of Health;
Keywords
reinforcement learning; Bayesian inference; partially observable Markov decision processes;
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812
Abstract
Bayesian learning methods have recently been shown to provide an elegant solution to the exploration-exploitation trade-off in reinforcement learning. However, most investigations of Bayesian reinforcement learning to date have focused on standard Markov Decision Processes (MDPs). The primary focus of this paper is to extend these ideas to partially observable domains by introducing the Bayes-Adaptive Partially Observable Markov Decision Process (BAPOMDP). This new framework can be used to simultaneously (1) learn a model of the POMDP domain through interaction with the environment, (2) track the state of the system under partial observability, and (3) plan (near-)optimal sequences of actions. An important contribution of this paper is to provide theoretical results showing how the model can be finitely approximated while preserving good learning performance. We present approximate algorithms for belief tracking and planning in this model, as well as empirical results that illustrate how the model estimate and the agent's return improve as a function of experience.
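To make the belief-tracking idea concrete, the sketch below performs one exact Bayes update over BAPOMDP-style hyper-states, where each hyper-state pairs a hidden state with Dirichlet counts over the transition and observation models, and expected probabilities under the counts weight each branch. This is a minimal illustration under assumed conventions (a toy two-state domain, dictionary-based beliefs, uniform Dirichlet priors), not the authors' implementation or experimental setup.

```python
# Minimal sketch of exact belief tracking over BAPOMDP hyper-states
# (state, transition counts phi, observation counts psi).
# The tiny 2-state domain and all names here are illustrative assumptions.
from collections import defaultdict

S = [0, 1]   # hidden states
A = [0]      # a single action, for brevity
Z = [0, 1]   # observations

def expected_T(phi, s, a, s2):
    """Expected transition probability under Dirichlet counts phi[(s, a, s2)]."""
    row = sum(phi[(s, a, x)] for x in S)
    return phi[(s, a, s2)] / row

def expected_O(psi, s2, a, z):
    """Expected observation probability under Dirichlet counts psi[(s2, a, z)]."""
    row = sum(psi[(s2, a, x)] for x in Z)
    return psi[(s2, a, z)] / row

def belief_update(belief, a, z):
    """One exact Bayes update over hyper-states (s, phi, psi).

    Count tables are stored as sorted tuples of items so hyper-states
    are hashable dictionary keys.
    """
    new_belief = defaultdict(float)
    for (s, phi_t, psi_t), p in belief.items():
        phi, psi = dict(phi_t), dict(psi_t)
        for s2 in S:
            w = p * expected_T(phi, s, a, s2) * expected_O(psi, s2, a, z)
            if w == 0.0:
                continue
            phi2 = dict(phi); phi2[(s, a, s2)] += 1   # record the transition
            psi2 = dict(psi); psi2[(s2, a, z)] += 1   # record the observation
            new_belief[(s2, tuple(sorted(phi2.items())),
                        tuple(sorted(psi2.items())))] += w
    total = sum(new_belief.values())
    return {h: w / total for h, w in new_belief.items()}

# Uniform Dirichlet priors (all counts = 1) and a uniform initial state belief.
phi0 = tuple(sorted({(s, a, s2): 1 for s in S for a in A for s2 in S}.items()))
psi0 = tuple(sorted({(s2, a, z): 1 for s2 in S for a in A for z in Z}.items()))
belief = {(s, phi0, psi0): 1.0 / len(S) for s in S}

belief = belief_update(belief, a=0, z=1)   # after acting and observing
print(len(belief), "hyper-states in the belief support")
```

Note how a single update multiplies the belief support: each (state, counts) pair branches over every possible successor state, since each successor implies different posterior counts. This combinatorial growth is precisely what motivates the finite approximation and the approximate belief-tracking and planning algorithms described in the abstract.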
Pages: 1729-1770
Page count: 42