Mixed reinforcement learning for partially observable Markov decision process

Cited: 0
Authors
Dung, Le Tien [1]
Komeda, Takashi [2]
Takagi, Motoki [1]
Affiliations
[1] Shibaura Inst Technol, Grad Sch Engn, Tokyo, Japan
[2] Shibaura Inst Technol, Fac Syst Engn, Tokyo, Japan
Source
2007 INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN ROBOTICS AND AUTOMATION | 2007
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP [automation technology, computer technology]
Discipline classification code
0812
Abstract
Reinforcement learning has been widely used to solve problems in which the agent receives only limited feedback from the environment. Q-learning can solve fully observable Markov Decision Processes quite well. For Partially Observable Markov Decision Processes (POMDPs), a Recurrent Neural Network (RNN) can be used to approximate Q values, but the learning time for these problems is typically very long. In this paper, Mixed Reinforcement Learning is presented to find an optimal policy for POMDPs in a shorter learning time. The method uses both a Q value table and an RNN: the Q value table stores Q values for fully observable states, while the RNN approximates Q values for hidden states. An observable degree is calculated for each state as the agent explores the environment; if the observable degree is below a threshold, the state is considered hidden. Results of an experiment on the lighting grid world problem show that the proposed method enables an agent to acquire a policy as good as the one acquired using only an RNN, with better learning performance.
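To make the division of labor concrete, below is a minimal Python sketch of the mixed Q-value lookup the abstract describes, assuming a discrete observation space. The names (observable_degree, OBS_THRESHOLD, rnn_q) and the way the observable degree is estimated are illustrative assumptions, not the authors' exact formulation.

import numpy as np

OBS_THRESHOLD = 0.9   # observability threshold (hypothetical value)
N_ACTIONS = 4

q_table = {}          # tabular Q values for states judged observable
visit_count = {}      # how often each observation was visited
markov_count = {}     # visits where the observation behaved consistently

def observable_degree(obs):
    """Fraction of visits in which obs looked Markovian (illustrative)."""
    visits = visit_count.get(obs, 0)
    if visits == 0:
        return 1.0    # optimistic: treat unseen observations as observable
    return markov_count.get(obs, 0) / visits

def q_values(obs, history, rnn_q):
    """Mixed lookup: Q table for observable states, RNN for hidden ones.

    rnn_q is assumed to map an observation history to a length-N_ACTIONS
    array of Q estimates.
    """
    if observable_degree(obs) >= OBS_THRESHOLD:
        return q_table.setdefault(obs, np.zeros(N_ACTIONS))
    # Below the threshold the state is considered hidden, so the recurrent
    # network estimates Q values from the observation history instead.
    return rnn_q(history)

The design point is that tabular entries stay cheap and exact wherever the observation alone identifies the state, so the RNN only has to cover the hidden states, which is consistent with the shorter learning time the abstract reports.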
Pages: 436 / +
Number of pages: 2
Related Papers
50 records in total
  • [41] A tutorial on partially observable Markov decision processes
    Littman, Michael L.
    JOURNAL OF MATHEMATICAL PSYCHOLOGY, 2009, 53 (03) : 119 - 125
  • [42] Partially Observable Markov Decision Processes and Robotics
    Kurniawati, Hanna
    ANNUAL REVIEW OF CONTROL ROBOTICS AND AUTONOMOUS SYSTEMS, 2022, 5 : 253 - 277
  • [43] Quantum partially observable Markov decision processes
    Barry, Jennifer
    Barry, Daniel T.
    Aaronson, Scott
    PHYSICAL REVIEW A, 2014, 90 (03)
  • [44] Reinforcement Learning to Rank with Markov Decision Process
    Wei, Zeng
    Xu, Jun
    Lan, Yanyan
    Guo, Jiafeng
    Cheng, Xueqi
    SIGIR'17: PROCEEDINGS OF THE 40TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2017, : 945 - 948
  • [45] Underwater chemical plume tracing based on partially observable Markov decision process
    Jiu, Hai-Feng
    Chen, Yu
    Deng, Wei
    Pang, Shuo
    INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2019, 16 (02)
  • [46] On the Adaptive Control of a Partially Observable Binary Markov Decision Process
    Fernandez-Gaucherand, E.
    Arapostathis, A.
    Marcus, S. I.
    LECTURE NOTES IN CONTROL AND INFORMATION SCIENCES, 1989, 130 : 217 - 229
  • [48] Partially observable Markov decision process to generate policies in software defect management
    Akbarinasaji, Shirin
    Kavaklioglu, Can
    Basar, Ayse
    Neal, Adam
    JOURNAL OF SYSTEMS AND SOFTWARE, 2020, 163
  • [49] A Partially Observable Markov Decision Process Approach to Residential Home Energy Management
    Hansen, Timothy M.
    Chong, Edwin K. P.
    Suryanarayanan, Siddharth
    Maciejewski, Anthony A.
    Siegel, Howard Jay
    IEEE TRANSACTIONS ON SMART GRID, 2018, 9 (02) : 1271 - 1281
  • [50] Partially Observable Markov Decision Process for Closed-Loop Anesthesia Control
    Borera, Eddy C.
    Moore, Brett L.
    Pyeatt, Larry D.
    20TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2012), 2012, 242 : 949 - +