Active learning in partially observable Markov decision processes

Cited: 0
Authors: Jaulmes, R [1]; Pineau, J [1]; Precup, D [1]
Affiliation: [1] McGill Univ, Sch Comp Sci, Montreal, PQ H3A 2A7, Canada
DOI: not available
Chinese Library Classification: TP18 [Artificial intelligence theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
This paper examines the problem of finding an optimal policy for a Partially Observable Markov Decision Process (POMDP) when the model is not known or is only poorly specified. We propose two approaches to this problem. The first relies on a model of the uncertainty that is added directly into the POMDP planning problem. This has theoretical guarantees, but is impractical when many of the parameters are uncertain. The second, called MEDUSA, incrementally improves the POMDP model using selected queries, while still optimizing reward. Results show good performance of the algorithm even in large problems: the most useful parameters of the model are learned quickly and the agent still accumulates high reward throughout the process.
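The query-based model refinement the abstract describes can be illustrated with a small sketch. The snippet below is not the paper's MEDUSA algorithm, only a hedged toy version of the underlying idea: keep Dirichlet counts over an uncertain POMDP parameter (here a single observation-accuracy probability for a two-state problem), act in the environment, and query an oracle for the hidden state only while the model is still uncertain. All names and thresholds are illustrative assumptions.

```python
import random

random.seed(0)

TRUE_OBS_ACCURACY = 0.85  # hidden ground truth: P(observation matches state)
counts = [1.0, 1.0]       # Dirichlet (Beta) counts: [correct obs, incorrect obs]

def estimated_accuracy():
    # Posterior mean of the Beta distribution over the observation parameter.
    return counts[0] / (counts[0] + counts[1])

def observe(state):
    # Environment emits a noisy observation of the hidden two-valued state.
    return state if random.random() < TRUE_OBS_ACCURACY else 1 - state

MAX_PSEUDOCOUNT = 200.0   # illustrative stand-in for a query-selection rule
for t in range(2000):
    state = random.randint(0, 1)
    obs = observe(state)
    # Query the oracle only while total pseudo-counts are low, i.e. while
    # the model parameter is still poorly known (MEDUSA queries selectively).
    if counts[0] + counts[1] < MAX_PSEUDOCOUNT:
        true_state = state          # oracle reveals the hidden state
        if obs == true_state:
            counts[0] += 1
        else:
            counts[1] += 1

print(f"estimated observation accuracy: {estimated_accuracy():.2f}")
```

After roughly 200 queries the posterior mean settles near the true accuracy of 0.85; the full algorithm would additionally maintain counts for every uncertain transition and observation parameter and replan against sampled models while learning.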
Pages: 601–608 (8 pages)
Related papers (50 in total)
  • [1] Active Chemical Sensing With Partially Observable Markov Decision Processes
    Gosangi, Rakesh
    Gutierrez-Osuna, Ricardo
    [J]. OLFACTION AND ELECTRONIC NOSE, PROCEEDINGS, 2009, 1137 : 562 - 565
  • [2] Learning deterministic policies in partially observable Markov decision processes
    Miyazaki, K
    Kobayashi, S
    [J]. INTELLIGENT AUTONOMOUS SYSTEMS: IAS-5, 1998, : 250 - 257
  • [3] Learning factored representations for partially observable Markov decision processes
    Sallans, B
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 12, 2000, 12 : 1050 - 1056
  • [4] Partially Observable Markov Decision Processes and Robotics
    Kurniawati, Hanna
    [J]. ANNUAL REVIEW OF CONTROL ROBOTICS AND AUTONOMOUS SYSTEMS, 2022, 5 : 253 - 277
  • [5] Quantum partially observable Markov decision processes
    Barry, Jennifer
    Barry, Daniel T.
    Aaronson, Scott
    [J]. PHYSICAL REVIEW A, 2014, 90 (03):
  • [6] A tutorial on partially observable Markov decision processes
    Littman, Michael L.
    [J]. JOURNAL OF MATHEMATICAL PSYCHOLOGY, 2009, 53 (03) : 119 - 125
  • [7] A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes
    Ross, Stephane
    Pineau, Joelle
    Chaib-draa, Brahim
    Kreitmann, Pierre
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2011, 12 : 1729 - 1770
  • [8] Recursive learning automata for control of partially observable Markov decision processes
    Chang, Hyeong Soo
    Fu, Michael C.
    Marcus, Steven I.
    [J]. 2005 44TH IEEE CONFERENCE ON DECISION AND CONTROL & EUROPEAN CONTROL CONFERENCE, VOLS 1-8, 2005, : 6091 - 6096
  • [9] Policy Reuse for Learning and Planning in Partially Observable Markov Decision Processes
    Wu, Bo
    Feng, Yanpeng
    [J]. 2017 4TH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CONTROL ENGINEERING (ICISCE), 2017, : 549 - 552
  • [10] PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES WITH PARTIALLY OBSERVABLE RANDOM DISCOUNT FACTORS
    Martinez-Garcia, E. Everardo
    Minjarez-Sosa, J. Adolfo
    Vega-Amaya, Oscar
    [J]. KYBERNETIKA, 2022, 58 (06) : 960 - 983