Policy Reuse for Learning and Planning in Partially Observable Markov Decision Processes

Cited by: 5
Authors
Wu, Bo [1 ]
Feng, Yanpeng [1 ]
Affiliations
[1] Shenzhen Polytech, Educ Technol & Informat Ctr, Shenzhen, Peoples R China
Keywords
partially observable Markov decision processes; reinforcement learning; policy reuse;
DOI
10.1109/ICISCE.2017.120
Chinese Library Classification: TP [Automation Technology, Computer Technology]
Discipline Code: 0812
Abstract
Learning and planning in partially observable Markov decision processes (POMDPs) is computationally intractable for real-time systems. To address this problem, this paper proposes a belief policy reuse (BPR) method that avoids repeated computation. First, a policy reuse evaluation mechanism based on Kullback-Leibler divergence is presented as a similarity metric between the current belief and the beliefs stored in a belief-policy library. If the current belief is similar to any past belief, the corresponding policy in the library is reused. Otherwise, BPR uses a Monte-Carlo particle method to explore a new policy, and stores the new policy together with its belief in the belief-policy library so it can be reused in the future. Experimental results show that the proposed approach effectively improves learning efficiency in large-scale POMDPs.
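The reuse mechanism described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class and method names (`BeliefPolicyLibrary`, `lookup`, `store`), the KL threshold value, and the placeholder policy objects are all assumptions introduced for the example; the actual BPR paper couples this lookup with a Monte-Carlo particle planner that is omitted here.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete belief distributions over states."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

class BeliefPolicyLibrary:
    """Hypothetical belief-policy library: stores (belief, policy) pairs and
    reuses a stored policy when the current belief is within `threshold`
    KL divergence of a stored belief; otherwise the caller would plan a
    new policy (e.g. via Monte-Carlo particle search) and store it."""

    def __init__(self, threshold=0.05):
        self.threshold = threshold
        self.entries = []  # list of (belief, policy) pairs

    def lookup(self, belief):
        # Find the stored belief closest to the current one.
        best_policy, best_div = None, float("inf")
        for stored_belief, policy in self.entries:
            d = kl_divergence(belief, stored_belief)
            if d < best_div:
                best_policy, best_div = policy, d
        # Reuse only if the closest stored belief is similar enough.
        return best_policy if best_div <= self.threshold else None

    def store(self, belief, policy):
        self.entries.append((np.asarray(belief, dtype=float), policy))

# Usage: a belief close to a stored one triggers reuse; a distant one
# returns None, signalling that a new policy must be planned and stored.
lib = BeliefPolicyLibrary(threshold=0.05)
lib.store([0.7, 0.2, 0.1], "policy_A")
reused = lib.lookup([0.68, 0.22, 0.10])  # similar belief -> "policy_A"
novel = lib.lookup([0.1, 0.1, 0.8])      # dissimilar belief -> None
```

The asymmetry of KL divergence means the order of arguments matters; which direction (current vs. stored belief) the paper uses is not stated in the abstract, so the choice here is arbitrary.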
Pages: 549-552 (4 pages)
Related Papers (50 total)
  • [1] A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes
    Ross, Stephane
    Pineau, Joelle
    Chaib-draa, Brahim
    Kreitmann, Pierre
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2011, 12 : 1729 - 1770
  • [2] Active learning in partially observable Markov decision processes
    Jaulmes, R
    Pineau, J
    Precup, D
    [J]. MACHINE LEARNING: ECML 2005, PROCEEDINGS, 2005, 3720 : 601 - 608
  • [3] Learning deterministic policies in partially observable Markov decision processes
    Miyazaki, K
    Kobayashi, S
    [J]. INTELLIGENT AUTONOMOUS SYSTEMS: IAS-5, 1998, : 250 - 257
  • [4] Learning factored representations for partially observable Markov decision processes
    Sallans, B
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 12, 2000, 12 : 1050 - 1056
  • [5] Partially Observable Markov Decision Processes and Robotics
    Kurniawati, Hanna
    [J]. ANNUAL REVIEW OF CONTROL ROBOTICS AND AUTONOMOUS SYSTEMS, 2022, 5 : 253 - 277
  • [6] A tutorial on partially observable Markov decision processes
    Littman, Michael L.
    [J]. JOURNAL OF MATHEMATICAL PSYCHOLOGY, 2009, 53 (03) : 119 - 125
  • [7] Quantum partially observable Markov decision processes
    Barry, Jennifer
    Barry, Daniel T.
    Aaronson, Scott
    [J]. PHYSICAL REVIEW A, 2014, 90 (03):
  • [8] Recursive learning automata for control of partially observable Markov decision processes
    Chang, Hyeong Soo
    Fu, Michael C.
    Marcus, Steven I.
    [J]. 2005 44TH IEEE CONFERENCE ON DECISION AND CONTROL & EUROPEAN CONTROL CONFERENCE, VOLS 1-8, 2005, : 6091 - 6096
  • [9] Planning treatment of ischemic heart disease with partially observable Markov decision processes
    Hauskrecht, M
    Fraser, H
    [J]. ARTIFICIAL INTELLIGENCE IN MEDICINE, 2000, 18 (03) : 221 - 244
  • [10] PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES WITH PARTIALLY OBSERVABLE RANDOM DISCOUNT FACTORS
    Martinez-Garcia, E. Everardo
    Minjarez-Sosa, J. Adolfo
    Vega-Amaya, Oscar
    [J]. KYBERNETIKA, 2022, 58 (06) : 960 - 983