A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes

Cited by: 0
Authors
Ross, Stephane [1 ]
Pineau, Joelle [2 ]
Chaib-draa, Brahim [3 ]
Kreitmann, Pierre [4 ]
Affiliations
[1] Carnegie Mellon Univ, Inst Robot, Pittsburgh, PA 15213 USA
[2] McGill Univ, Sch Comp Sci, Montreal, PQ H3A 2A7, Canada
[3] Univ Laval, Comp Sci & Software Engn Dept, Quebec City, PQ G1K 7P4, Canada
[4] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Funding
Natural Sciences and Engineering Research Council of Canada; National Institutes of Health (USA);
Keywords
reinforcement learning; Bayesian inference; partially observable Markov decision processes;
DOI
Not available
Chinese Library Classification
TP [Automation Technology; Computer Technology];
Discipline Code
0812;
Abstract
Bayesian learning methods have recently been shown to provide an elegant solution to the exploration-exploitation trade-off in reinforcement learning. However, most investigations of Bayesian reinforcement learning to date have focused on standard Markov Decision Processes (MDPs). The primary focus of this paper is to extend these ideas to partially observable domains by introducing the Bayes-Adaptive Partially Observable Markov Decision Process (BAPOMDP). This new framework can be used to simultaneously (1) learn a model of the POMDP domain through interaction with the environment, (2) track the state of the system under partial observability, and (3) plan (near-)optimal sequences of actions. An important contribution of this paper is to provide theoretical results showing how the model can be finitely approximated while preserving good learning performance. We present approximate algorithms for belief tracking and planning in this model, as well as empirical results that illustrate how the model estimate and the agent's return improve as a function of experience.
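The belief tracking mentioned in the abstract builds on the standard POMDP belief update, which maintains a probability distribution over hidden states given actions and observations. The sketch below illustrates that standard update only, not the paper's BAPOMDP algorithm (which additionally maintains a posterior over the model parameters); the transition and observation matrices here are hypothetical toy values.

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """Standard POMDP belief update: b'(s') ∝ Z[a, s', o] * Σ_s T[a, s, s'] * b(s).

    b : (S,)      prior belief over states
    T : (A, S, S) transition probabilities T[a, s, s']
    Z : (A, S, O) observation probabilities Z[a, s', o]
    """
    pred = T[a].T @ b             # predicted state distribution after action a
    new_b = Z[a][:, o] * pred     # weight by likelihood of observing o
    return new_b / new_b.sum()    # normalize to a probability distribution

# Toy 2-state, 2-action, 2-observation model (hypothetical numbers).
T = np.array([[[0.9, 0.1], [0.2, 0.8]],    # action 0: mostly stay put
              [[0.5, 0.5], [0.5, 0.5]]])   # action 1: uninformative
Z = np.array([[[0.8, 0.2], [0.3, 0.7]],    # action 0: noisy state sensor
              [[0.5, 0.5], [0.5, 0.5]]])   # action 1: uninformative

b0 = np.array([0.5, 0.5])                  # uniform prior
b1 = belief_update(b0, a=0, o=0, T=T, Z=Z) # observing o=0 shifts mass to state 0
```

In the BAPOMDP, the unknown entries of `T` and `Z` are themselves random variables, so the belief is tracked jointly over the hidden state and the model parameters, which is what makes exact tracking intractable and motivates the paper's finite approximation results.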
Pages: 1729-1770 (42 pages)
Related Papers (50 total)
  • [21] Transition Entropy in Partially Observable Markov Decision Processes
    Melo, Francisco S.
    Ribeiro, Isabel
    INTELLIGENT AUTONOMOUS SYSTEMS 9, 2006: 282+
  • [22] Partially observable Markov decision processes with reward information
    Cao, XR
    Guo, XP
    2004 43RD IEEE CONFERENCE ON DECISION AND CONTROL (CDC), VOLS 1-5, 2004, : 4393 - 4398
  • [23] Partially Observable Markov Decision Processes in Robotics: A Survey
    Lauri, Mikko
    Hsu, David
    Pajarinen, Joni
    IEEE TRANSACTIONS ON ROBOTICS, 2023, 39 (01) : 21 - 40
  • [24] A primer on partially observable Markov decision processes (POMDPs)
    Chades, Iadine
    Pascal, Luz V.
    Nicol, Sam
    Fletcher, Cameron S.
    Ferrer-Mestres, Jonathan
    METHODS IN ECOLOGY AND EVOLUTION, 2021, 12 (11): : 2058 - 2072
  • [25] Minimal Disclosure in Partially Observable Markov Decision Processes
    Bertrand, Nathalie
    Genest, Blaise
    IARCS ANNUAL CONFERENCE ON FOUNDATIONS OF SOFTWARE TECHNOLOGY AND THEORETICAL COMPUTER SCIENCE (FSTTCS 2011), 2011, 13 : 411 - 422
  • [26] Partially observable Markov decision processes with imprecise parameters
    Itoh, Hideaki
    Nakamura, Kiyohiko
    ARTIFICIAL INTELLIGENCE, 2007, 171 (8-9) : 453 - 490
  • [27] Nonapproximability results for partially observable Markov decision processes
    Lusena, Cristopher
    Goldsmith, Judy
    Mundhenk, Martin
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH (Morgan Kaufmann Publishers), 2001, 14
  • [28] A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes
    Shi, Chengchun
    Uehara, Masatoshi
    Huang, Jiawei
    Jiang, Nan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [29] Anderson acceleration for partially observable Markov decision processes: A maximum entropy approach
    Park, Mingyu
    Shin, Jaeuk
    Yang, Insoon
    AUTOMATICA, 2024, 163
  • [30] Computing a Mechanism for a Bayesian and Partially Observable Markov Approach
    Clempner, Julio B.
    Poznyak, Alexander S.
    INTERNATIONAL JOURNAL OF APPLIED MATHEMATICS AND COMPUTER SCIENCE, 2023, 33 (03) : 463 - 478