Mixed reinforcement learning for partially observable Markov decision process

Cited by: 0
Authors
Dung, Le Tien [1 ]
Komeda, Takashi [2 ]
Takagi, Motoki [1 ]
Affiliations
[1] Shibaura Inst Technol, Grad Sch Engn, Tokyo, Japan
[2] Shibaura Inst Technol, Fac Syst Engn, Tokyo, Japan
Keywords: (none listed)
DOI: not available
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Reinforcement learning has been widely used to solve problems with only limited feedback from the environment. Q-learning can solve fully observable Markov Decision Processes quite well. For Partially Observable Markov Decision Processes (POMDPs), a Recurrent Neural Network (RNN) can be used to approximate Q-values; however, learning time for these problems is typically very long. In this paper, Mixed Reinforcement Learning is presented to find an optimal policy for POMDPs in a shorter learning time. The method uses both a Q-value table and an RNN: the Q-value table stores Q-values for fully observable states, while the RNN approximates Q-values for hidden states. An observable degree is calculated for each state while the agent explores the environment; if the observable degree falls below a threshold, the state is treated as hidden. Experimental results on the lighting grid world problem show that the proposed method enables an agent to acquire a policy as good as the policy acquired using only an RNN, with better learning performance.
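The abstract describes the dispatch mechanism but not its exact formulas, so the following is a minimal Python sketch of the idea only. The class names `MixedQAgent` and `StubRNNQ`, the `predict`/`train_step` API, and the transition-determinism definition of the observable degree are all assumptions introduced for illustration; they are not the paper's method.

```python
import random
from collections import defaultdict


class StubRNNQ:
    """Stand-in for the paper's RNN Q-approximator (hypothetical API:
    the real model would condition on the observation/action history
    via a recurrent network trained by backpropagation through time)."""

    def __init__(self, history_len=3, lr=0.1):
        self.history_len = history_len
        self.lr = lr
        self.table = defaultdict(float)  # (recent history, action) -> Q

    def _key(self, history, action):
        return (tuple(history[-self.history_len:]), action)

    def predict(self, history, action):
        return self.table[self._key(history, action)]

    def train_step(self, history, action, target):
        key = self._key(history, action)
        self.table[key] += self.lr * (target - self.table[key])


class MixedQAgent:
    """Sketch of the abstract's dispatch idea: a Q-value table serves
    states judged fully observable; an RNN approximator serves hidden
    states. The observable-degree measure below is a placeholder."""

    def __init__(self, actions, rnn_q, threshold=0.8,
                 alpha=0.1, gamma=0.95, epsilon=0.1):
        self.actions = actions
        self.rnn_q = rnn_q
        self.threshold = threshold          # below this, treat state as hidden
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q_table = defaultdict(float)   # (obs, action) -> Q
        # (obs, action) -> {next_obs: count}, used by the placeholder measure
        self.transitions = defaultdict(lambda: defaultdict(int))

    def observable_degree(self, obs):
        # Placeholder definition (assumption): how deterministic the
        # observed transitions from obs look; aliased observations that
        # stand for several underlying states score lower.
        degrees = []
        for a in self.actions:
            counts = self.transitions[(obs, a)]
            total = sum(counts.values())
            if total:
                degrees.append(max(counts.values()) / total)
        return min(degrees) if degrees else 1.0

    def q(self, obs, action, history):
        if self.observable_degree(obs) >= self.threshold:
            return self.q_table[(obs, action)]          # fully observable
        return self.rnn_q.predict(history, action)      # hidden state

    def act(self, obs, history):
        if random.random() < self.epsilon:
            return random.choice(self.actions)          # explore
        return max(self.actions, key=lambda a: self.q(obs, a, history))

    def update(self, obs, action, reward, next_obs, history):
        self.transitions[(obs, action)][next_obs] += 1
        target = reward + self.gamma * max(
            self.q(next_obs, a, history + [obs]) for a in self.actions)
        if self.observable_degree(obs) >= self.threshold:
            key = (obs, action)
            self.q_table[key] += self.alpha * (target - self.q_table[key])
        else:
            self.rnn_q.train_step(history, action, target)
```

The stub only keys Q-values on a short observation window so the sketch stays self-contained and runnable; in the paper an actual recurrent network plays this role.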
Pages: 436+
Number of pages: 2
Related Papers (50 items)
  • [1] Reinforcement learning algorithm for partially observable Markov decision processes
    Wang, Xue-Ning
    He, Han-Gen
    Xu, Xin
    Kongzhi yu Juece/Control and Decision, 2004, 19(11): 1263-1266
  • [2] Fuzzy Reinforcement Learning Control for Decentralized Partially Observable Markov Decision Processes
    Sharma, Rajneesh
    Spaan, Matthijs T. J.
    IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ 2011), 2011: 1422-1429
  • [3] A Deep Hierarchical Reinforcement Learning Algorithm in Partially Observable Markov Decision Processes
    Le, Tuyen P.
    Ngo Anh Vien
    Chung, Taechoong
    IEEE ACCESS, 2018, 6: 49089-49102
  • [4] Provably Efficient Offline Reinforcement Learning for Partially Observable Markov Decision Processes
    Guo, Hongyi
    Cai, Qi
    Zhang, Yufeng
    Yang, Zhuoran
    Wang, Zhaoran
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [5] Robust partially observable Markov decision process
    Osogami, Takayuki
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015: 106-115
  • [6] A pulse neural network reinforcement learning algorithm for partially observable Markov decision processes
    Takita, Koichiro
    Hagiwara, Masafumi
    Systems and Computers in Japan, 2005, 36(3): 42-52
  • [7] Learning hierarchical partially observable Markov decision process models for robot navigation
    Theocharous, G
    Rohanimanesh, K
    Mahadevan, S
    2001 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, VOLS I-IV, PROCEEDINGS, 2001: 511-516
  • [8] Autonomous Thermalling as a Partially Observable Markov Decision Process
    Guilliard, Iain
    Rogahn, Richard J.
    Piavis, Jim
    Kolobov, Andrey
    ROBOTICS: SCIENCE AND SYSTEMS XIV, 2018
  • [9] Active learning in partially observable Markov decision processes
    Jaulmes, R
    Pineau, J
    Precup, D
    MACHINE LEARNING: ECML 2005, PROCEEDINGS, 2005, 3720: 601-608
  • [10] CHQ: A multi-agent reinforcement learning scheme for partially observable Markov decision processes
    Osada, H
    Fujita, S
    IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON INTELLIGENT AGENT TECHNOLOGY, PROCEEDINGS, 2004: 17-23