A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes

Cited by: 0
Authors
Ross, Stephane [1 ]
Pineau, Joelle [2 ]
Chaib-draa, Brahim [3 ]
Kreitmann, Pierre [4 ]
Affiliations
[1] Carnegie Mellon Univ, Inst Robot, Pittsburgh, PA 15213 USA
[2] McGill Univ, Sch Comp Sci, Montreal, PQ H3A 2A7, Canada
[3] Univ Laval, Comp Sci & Software Engn Dept, Quebec City, PQ G1K 7P4, Canada
[4] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC); US National Institutes of Health (NIH)
Keywords
reinforcement learning; Bayesian inference; partially observable Markov decision processes;
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Discipline Classification Code
0812
Abstract
Bayesian learning methods have recently been shown to provide an elegant solution to the exploration-exploitation trade-off in reinforcement learning. However, most investigations of Bayesian reinforcement learning to date have focused on standard Markov Decision Processes (MDPs). The primary focus of this paper is to extend these ideas to partially observable domains by introducing the Bayes-Adaptive Partially Observable Markov Decision Process (BAPOMDP). This new framework can be used to simultaneously (1) learn a model of the POMDP domain through interaction with the environment, (2) track the state of the system under partial observability, and (3) plan (near-)optimal sequences of actions. An important contribution of this paper is to provide theoretical results showing how the model can be finitely approximated while preserving good learning performance. We present approximate algorithms for belief tracking and planning in this model, as well as empirical results that illustrate how the model estimate and the agent's return improve as a function of experience.
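To make capabilities (1) and (2) above concrete, here is a minimal Python sketch of joint model learning and belief tracking in the BAPOMDP spirit. It assumes small discrete state, action, and observation spaces; the names (Particle, belief_update) and the particle-filter-style update are illustrative assumptions, not the paper's exact algorithm, which additionally covers planning and finite-approximation guarantees.

import numpy as np

class Particle:
    # One belief hypothesis: a candidate hidden state plus Dirichlet
    # counts over the unknown transition (phi) and observation (psi)
    # distributions of the POMDP.
    def __init__(self, state, trans_counts, obs_counts, weight=1.0):
        self.state = state            # hidden state index
        self.trans = trans_counts     # array of shape (S, A, S)
        self.obs = obs_counts         # array of shape (S, A, Z)
        self.weight = weight

def belief_update(particles, action, observation, rng):
    # Monte Carlo belief update: sample a successor state from each
    # particle's posterior-mean transition model, reweight by the
    # likelihood of the received observation, and record the evidence
    # by incrementing the corresponding Dirichlet counts.
    updated = []
    for p in particles:
        t = p.trans[p.state, action]              # counts over next states
        s_next = int(rng.choice(len(t), p=t / t.sum()))
        o = p.obs[s_next, action]                 # counts over observations
        weight = p.weight * (o[observation] / o.sum())
        trans = p.trans.copy()
        obs = p.obs.copy()
        trans[p.state, action, s_next] += 1.0
        obs[s_next, action, observation] += 1.0
        updated.append(Particle(s_next, trans, obs, weight))
    total = sum(p.weight for p in updated)
    for p in updated:                             # normalize weights
        p.weight /= total
    return updated

For a domain with S states, A actions, and Z observations, an uninformed initial belief could be a list of particles Particle(s0, np.ones((S, A, S)), np.ones((S, A, Z))) for each candidate initial state s0. Resampling, omitted here for brevity, is needed in practice to keep the particle weights from degenerating over long interaction histories.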
Pages: 1729-1770
Page count: 42