Mixed reinforcement learning for partially observable Markov decision process

Cited by: 0
Authors
Dung, Le Tien [1 ]
Komeda, Takashi [2 ]
Takagi, Motoki [1 ]
Affiliations
[1] Shibaura Inst Technol, Grad Sch Engn, Tokyo, Japan
[2] Shibaura Inst Technol, Fac Syst Engn, Tokyo, Japan
Keywords: (none listed)
DOI: not available
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Reinforcement learning has been widely used to solve problems with only limited feedback from the environment. Q-learning can solve fully observable Markov Decision Processes quite well. For Partially Observable Markov Decision Processes (POMDPs), a Recurrent Neural Network (RNN) can be used to approximate Q-values; however, learning time for these problems is typically very long. In this paper, Mixed Reinforcement Learning is presented to find an optimal policy for POMDPs in a shorter learning time. The method uses both a Q-value table and an RNN: the Q-value table stores Q-values for fully observable states, while the RNN approximates Q-values for hidden states. An observable degree is calculated for each state while the agent explores the environment; if the observable degree falls below a threshold, the state is treated as hidden. Experimental results on the lighting grid world problem show that the proposed method enables an agent to acquire a policy as good as the policy acquired using only an RNN, with better learning performance.
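The abstract describes the dispatch mechanism but not its exact formulas, so the following is a minimal Python sketch of the idea only. The class names `MixedQAgent` and `StubRNNQ`, the `predict`/`train_step` API, and the transition-determinism definition of the observable degree are all assumptions introduced for illustration; they are not the paper's method.

```python
import random
from collections import defaultdict


class StubRNNQ:
    """Stand-in for the paper's RNN Q-approximator (hypothetical API:
    the real model would condition on the observation/action history
    via a recurrent network trained by backpropagation through time)."""

    def __init__(self, history_len=3, lr=0.1):
        self.history_len = history_len
        self.lr = lr
        self.table = defaultdict(float)  # (recent history, action) -> Q

    def _key(self, history, action):
        return (tuple(history[-self.history_len:]), action)

    def predict(self, history, action):
        return self.table[self._key(history, action)]

    def train_step(self, history, action, target):
        key = self._key(history, action)
        self.table[key] += self.lr * (target - self.table[key])


class MixedQAgent:
    """Sketch of the abstract's dispatch idea: a Q-value table serves
    states judged fully observable; an RNN approximator serves hidden
    states. The observable-degree measure below is a placeholder."""

    def __init__(self, actions, rnn_q, threshold=0.8,
                 alpha=0.1, gamma=0.95, epsilon=0.1):
        self.actions = actions
        self.rnn_q = rnn_q
        self.threshold = threshold          # below this, treat state as hidden
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q_table = defaultdict(float)   # (obs, action) -> Q
        # (obs, action) -> {next_obs: count}, used by the placeholder measure
        self.transitions = defaultdict(lambda: defaultdict(int))

    def observable_degree(self, obs):
        # Placeholder definition (assumption): how deterministic the
        # observed transitions from obs look; aliased observations that
        # stand for several underlying states score lower.
        degrees = []
        for a in self.actions:
            counts = self.transitions[(obs, a)]
            total = sum(counts.values())
            if total:
                degrees.append(max(counts.values()) / total)
        return min(degrees) if degrees else 1.0

    def q(self, obs, action, history):
        if self.observable_degree(obs) >= self.threshold:
            return self.q_table[(obs, action)]          # fully observable
        return self.rnn_q.predict(history, action)      # hidden state

    def act(self, obs, history):
        if random.random() < self.epsilon:
            return random.choice(self.actions)          # explore
        return max(self.actions, key=lambda a: self.q(obs, a, history))

    def update(self, obs, action, reward, next_obs, history):
        self.transitions[(obs, action)][next_obs] += 1
        target = reward + self.gamma * max(
            self.q(next_obs, a, history + [obs]) for a in self.actions)
        if self.observable_degree(obs) >= self.threshold:
            key = (obs, action)
            self.q_table[key] += self.alpha * (target - self.q_table[key])
        else:
            self.rnn_q.train_step(history, action, target)
```

The stub only keys Q-values on a short observation window so the sketch stays self-contained and runnable; in the paper an actual recurrent network plays this role.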
Pages: 436+
Number of pages: 2
Related Papers (50 items)
  • [1] Reinforcement learning algorithm for partially observable Markov decision processes
    Wang, Xue-Ning
    He, Han-Gen
    Xu, Xin
    Kongzhi yu Juece/Control and Decision, 2004, 19(11): 1263-1266
  • [2] Fuzzy Reinforcement Learning Control for Decentralized Partially Observable Markov Decision Processes
    Sharma, Rajneesh
    Spaan, Matthijs T. J.
    IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ 2011), 2011: 1422-1429
  • [3] A Deep Hierarchical Reinforcement Learning Algorithm in Partially Observable Markov Decision Processes
    Le, Tuyen P.
    Ngo Anh Vien
    Chung, Taechoong
    IEEE ACCESS, 2018, 6: 49089-49102
  • [4] Provably Efficient Offline Reinforcement Learning for Partially Observable Markov Decision Processes
    Guo, Hongyi
    Cai, Qi
    Zhang, Yufeng
    Yang, Zhuoran
    Wang, Zhaoran
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [5] Robust partially observable Markov decision process
    Osogami, Takayuki
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015: 106-115
  • [6] A pulse neural network reinforcement learning algorithm for partially observable Markov decision processes
    Takita, Koichiro
    Hagiwara, Masafumi
    Systems and Computers in Japan, 2005, 36(3): 42-52
  • [7] Learning hierarchical partially observable Markov decision process models for robot navigation
    Theocharous, G
    Rohanimanesh, K
    Mahadevan, S
    2001 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, VOLS I-IV, PROCEEDINGS, 2001: 511-516
  • [8] Autonomous Thermalling as a Partially Observable Markov Decision Process
    Guilliard, Iain
    Rogahn, Richard J.
    Piavis, Jim
    Kolobov, Andrey
    ROBOTICS: SCIENCE AND SYSTEMS XIV, 2018
  • [9] Active learning in partially observable Markov decision processes
    Jaulmes, R
    Pineau, J
    Precup, D
    MACHINE LEARNING: ECML 2005, PROCEEDINGS, 2005, 3720: 601-608
  • [10] CHQ: A multi-agent reinforcement learning scheme for partially observable Markov decision processes
    Osada, H
    Fujita, S
    IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON INTELLIGENT AGENT TECHNOLOGY, PROCEEDINGS, 2004: 17-23