Mixed reinforcement learning for partially observable Markov decision process

Cited: 0
Authors
Dung, Le Tien [1]
Komeda, Takashi [2]
Takagi, Motoki [1]
Affiliations
[1] Shibaura Inst Technol, Grad Sch Engn, Tokyo, Japan
[2] Shibaura Inst Technol, Fac Syst Engn, Tokyo, Japan
Source
2007 INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN ROBOTICS AND AUTOMATION | 2007
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP [automation technology, computer technology]
Discipline classification code
0812
Abstract
Reinforcement learning has been widely used to solve problems in which the agent receives only limited feedback from the environment. Q-learning can solve fully observable Markov Decision Processes quite well. For Partially Observable Markov Decision Processes (POMDPs), a Recurrent Neural Network (RNN) can be used to approximate Q values, but the learning time for these problems is typically very long. In this paper, Mixed Reinforcement Learning is presented to find an optimal policy for POMDPs in a shorter learning time. The method uses both a Q value table and an RNN: the Q value table stores Q values for fully observable states, while the RNN approximates Q values for hidden states. An observable degree is calculated for each state as the agent explores the environment; if the observable degree is below a threshold, the state is considered hidden. Results of an experiment on the lighting grid world problem show that the proposed method enables an agent to acquire a policy as good as the one acquired using only an RNN, with better learning performance.
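To make the division of labor concrete, below is a minimal Python sketch of the mixed Q-value lookup the abstract describes, assuming a discrete observation space. The names (observable_degree, OBS_THRESHOLD, rnn_q) and the way the observable degree is estimated are illustrative assumptions, not the authors' exact formulation.

import numpy as np

OBS_THRESHOLD = 0.9   # observability threshold (hypothetical value)
N_ACTIONS = 4

q_table = {}          # tabular Q values for states judged observable
visit_count = {}      # how often each observation was visited
markov_count = {}     # visits where the observation behaved consistently

def observable_degree(obs):
    """Fraction of visits in which obs looked Markovian (illustrative)."""
    visits = visit_count.get(obs, 0)
    if visits == 0:
        return 1.0    # optimistic: treat unseen observations as observable
    return markov_count.get(obs, 0) / visits

def q_values(obs, history, rnn_q):
    """Mixed lookup: Q table for observable states, RNN for hidden ones.

    rnn_q is assumed to map an observation history to a length-N_ACTIONS
    array of Q estimates.
    """
    if observable_degree(obs) >= OBS_THRESHOLD:
        return q_table.setdefault(obs, np.zeros(N_ACTIONS))
    # Below the threshold the state is considered hidden, so the recurrent
    # network estimates Q values from the observation history instead.
    return rnn_q(history)

The design point is that tabular entries stay cheap and exact wherever the observation alone identifies the state, so the RNN only has to cover the hidden states, which is consistent with the shorter learning time the abstract reports.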
Pages: 436 / +
Number of pages: 2
Related Papers
50 records in total
  • [41] A tutorial on partially observable Markov decision processes
    Littman, Michael L.
    JOURNAL OF MATHEMATICAL PSYCHOLOGY, 2009, 53 (03) : 119 - 125
  • [42] Partially Observable Markov Decision Processes and Robotics
    Kurniawati, Hanna
    ANNUAL REVIEW OF CONTROL ROBOTICS AND AUTONOMOUS SYSTEMS, 2022, 5 : 253 - 277
  • [43] Quantum partially observable Markov decision processes
    Barry, Jennifer
    Barry, Daniel T.
    Aaronson, Scott
    PHYSICAL REVIEW A, 2014, 90 (03)
  • [44] Reinforcement Learning to Rank with Markov Decision Process
    Wei, Zeng
    Xu, Jun
    Lan, Yanyan
    Guo, Jiafeng
    Cheng, Xueqi
    SIGIR'17: PROCEEDINGS OF THE 40TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2017, : 945 - 948
  • [45] Underwater chemical plume tracing based on partially observable Markov decision process
    Jiu, Hai-Feng
    Chen, Yu
    Deng, Wei
    Pang, Shuo
    INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2019, 16 (02)
  • [46] On the Adaptive Control of a Partially Observable Binary Markov Decision Process
    Fernandez-Gaucherand, E.
    Arapostathis, A.
    Marcus, S. I.
    LECTURE NOTES IN CONTROL AND INFORMATION SCIENCES, 1989, 130 : 217 - 229
  • [48] Partially observable Markov decision process to generate policies in software defect management
    Akbarinasaji, Shirin
    Kavaklioglu, Can
    Basar, Ayse
    Neal, Adam
    JOURNAL OF SYSTEMS AND SOFTWARE, 2020, 163
  • [49] A Partially Observable Markov Decision Process Approach to Residential Home Energy Management
    Hansen, Timothy M.
    Chong, Edwin K. P.
    Suryanarayanan, Siddharth
    Maciejewski, Anthony A.
    Siegel, Howard Jay
    IEEE TRANSACTIONS ON SMART GRID, 2018, 9 (02) : 1271 - 1281
  • [50] Partially Observable Markov Decision Process for Closed-Loop Anesthesia Control
    Borera, Eddy C.
    Moore, Brett L.
    Pyeatt, Larry D.
    20TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2012), 2012, 242 : 949 - +