Policy Reuse for Learning and Planning in Partially Observable Markov Decision Processes

Cited by: 5
Authors
Wu, Bo [1 ]
Feng, Yanpeng [1 ]
Affiliations
[1] Shenzhen Polytech, Educ Technol & Informat Ctr, Shenzhen, Peoples R China
Keywords
partially observable Markov decision processes; reinforcement learning; policy reuse;
DOI
10.1109/ICISCE.2017.120
Chinese Library Classification: TP [Automation Technology, Computer Technology]
Discipline Code: 0812
Abstract
Learning and planning in partially observable Markov decision processes (POMDPs) is computationally intractable for real-time systems. To address this problem, this paper proposes a belief policy reuse (BPR) method that avoids repeated computation. First, a policy reuse evaluation mechanism based on Kullback-Leibler divergence is presented as a similarity metric between the current belief and the beliefs stored in a belief-policy library. If the current belief is similar to any past belief, the corresponding policy in the library is reused. Otherwise, BPR uses a Monte-Carlo particle method to explore a new policy, and stores the new policy together with its belief in the belief-policy library so it can be reused in the future. Experimental results show that the proposed approach effectively improves learning efficiency in large-scale POMDPs.
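The reuse mechanism described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class and method names (`BeliefPolicyLibrary`, `lookup`, `store`), the KL threshold value, and the placeholder policy objects are all assumptions introduced for the example; the actual BPR paper couples this lookup with a Monte-Carlo particle planner that is omitted here.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete belief distributions over states."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

class BeliefPolicyLibrary:
    """Hypothetical belief-policy library: stores (belief, policy) pairs and
    reuses a stored policy when the current belief is within `threshold`
    KL divergence of a stored belief; otherwise the caller would plan a
    new policy (e.g. via Monte-Carlo particle search) and store it."""

    def __init__(self, threshold=0.05):
        self.threshold = threshold
        self.entries = []  # list of (belief, policy) pairs

    def lookup(self, belief):
        # Find the stored belief closest to the current one.
        best_policy, best_div = None, float("inf")
        for stored_belief, policy in self.entries:
            d = kl_divergence(belief, stored_belief)
            if d < best_div:
                best_policy, best_div = policy, d
        # Reuse only if the closest stored belief is similar enough.
        return best_policy if best_div <= self.threshold else None

    def store(self, belief, policy):
        self.entries.append((np.asarray(belief, dtype=float), policy))

# Usage: a belief close to a stored one triggers reuse; a distant one
# returns None, signalling that a new policy must be planned and stored.
lib = BeliefPolicyLibrary(threshold=0.05)
lib.store([0.7, 0.2, 0.1], "policy_A")
reused = lib.lookup([0.68, 0.22, 0.10])  # similar belief -> "policy_A"
novel = lib.lookup([0.1, 0.1, 0.8])      # dissimilar belief -> None
```

The asymmetry of KL divergence means the order of arguments matters; which direction (current vs. stored belief) the paper uses is not stated in the abstract, so the choice here is arbitrary.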
Pages: 549-552 (4 pages)
Related Papers (50 total)
  • [1] A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes
    Ross, Stephane
    Pineau, Joelle
    Chaib-draa, Brahim
    Kreitmann, Pierre
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2011, 12 : 1729 - 1770
  • [2] Active learning in partially observable Markov decision processes
    Jaulmes, R
    Pineau, J
    Precup, D
    [J]. MACHINE LEARNING: ECML 2005, PROCEEDINGS, 2005, 3720 : 601 - 608
  • [3] Learning deterministic policies in partially observable Markov decision processes
    Miyazaki, K
    Kobayashi, S
    [J]. INTELLIGENT AUTONOMOUS SYSTEMS: IAS-5, 1998, : 250 - 257
  • [4] Learning factored representations for partially observable Markov decision processes
    Sallans, B
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 12, 2000, 12 : 1050 - 1056
  • [5] Partially Observable Markov Decision Processes and Robotics
    Kurniawati, Hanna
    [J]. ANNUAL REVIEW OF CONTROL ROBOTICS AND AUTONOMOUS SYSTEMS, 2022, 5 : 253 - 277
  • [6] A tutorial on partially observable Markov decision processes
    Littman, Michael L.
    [J]. JOURNAL OF MATHEMATICAL PSYCHOLOGY, 2009, 53 (03) : 119 - 125
  • [7] Quantum partially observable Markov decision processes
    Barry, Jennifer
    Barry, Daniel T.
    Aaronson, Scott
    [J]. PHYSICAL REVIEW A, 2014, 90 (03):
  • [8] Recursive learning automata for control of partially observable Markov decision processes
    Chang, Hyeong Soo
    Fu, Michael C.
    Marcus, Steven I.
    [J]. 2005 44TH IEEE CONFERENCE ON DECISION AND CONTROL & EUROPEAN CONTROL CONFERENCE, VOLS 1-8, 2005, : 6091 - 6096
  • [9] Planning treatment of ischemic heart disease with partially observable Markov decision processes
    Hauskrecht, M
    Fraser, H
    [J]. ARTIFICIAL INTELLIGENCE IN MEDICINE, 2000, 18 (03) : 221 - 244
  • [10] PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES WITH PARTIALLY OBSERVABLE RANDOM DISCOUNT FACTORS
    Martinez-Garcia, E. Everardo
    Minjarez-Sosa, J. Adolfo
    Vega-Amaya, Oscar
    [J]. KYBERNETIKA, 2022, 58 (06) : 960 - 983