Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs

Cited by: 16
Authors
Doshi-Velez, Finale [1]
Pineau, Joelle [2]
Roy, Nicholas [1]
Affiliations
[1] MIT, Cambridge, MA 02139 USA
[2] McGill Univ, Montreal, PQ, Canada
Keywords
Partially observable Markov decision process; Reinforcement learning; Bayesian methods; Hidden Markov models
DOI
10.1016/j.artint.2012.04.006
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Acting in domains where an agent must plan several steps ahead to achieve a goal can be a challenging task, especially if the agent's sensors provide only noisy or partial information. In this setting, Partially Observable Markov Decision Processes (POMDPs) provide a planning framework that optimally trades between actions that contribute to the agent's knowledge and actions that increase the agent's immediate reward. However, the task of specifying the POMDP's parameters is often onerous. In particular, setting the immediate rewards to achieve a desired balance between information-gathering and acting is often not intuitive. In this work, we propose an approximation based on minimizing the immediate Bayes risk for choosing actions when transition, observation, and reward models are uncertain. The Bayes-risk criterion avoids the computational intractability of solving a POMDP with a multi-dimensional continuous state space; we show it performs well in a variety of problems. We use policy queries, in which we ask an expert for the correct action, to infer the consequences of a potential pitfall without experiencing its effects. More importantly for human-robot interaction settings, policy queries allow the agent to learn the reward model without the reward values ever being specified. (C) 2012 Elsevier B.V. All rights reserved.
Pages: 115-132
Number of pages: 18