Reinforcement learning algorithm for partially observable Markov decision processes

Cited: 0
Authors
Wang, Xue-Ning [1 ]
He, Han-Gen [1 ]
Xu, Xin [1 ]
Affiliation
[1] Inst. of Automat., Natl. Univ. of Defence Technol., Changsha 410073, China
Source
Kongzhi yu Juece/Control and Decision | 2004 / Vol. 19 / No. 11
Keywords
Convergence of numerical methods; Decision theory; Markov processes; Optimization; State space methods
DOI
Not available
Abstract
In partially observable Markov decision processes (POMDPs), perceptual aliasing can cause the memoryless policies obtained by Sarsa learning to oscillate. A new memory-based reinforcement learning algorithm, CpnSarsa(A), is studied to solve this problem. Under a new definition of state, the agent combines the current observation with previous observations to distinguish aliased states. Applied to some typical POMDPs, the algorithm obtains optimal or near-optimal policies, and compared with previous algorithms it greatly improves the convergence rate.
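The core idea in the abstract — augmenting the agent's state with previous observations so that perceptually aliased states become distinguishable, then running Sarsa(λ)-style learning over the augmented states — can be sketched as follows. The paper's CpnSarsa(A) details are not reproduced here, so the toy corridor environment, the one-step observation memory, and all names below are illustrative assumptions, not the published algorithm.

```python
import random
from collections import defaultdict

# Hypothetical aliased corridor: underlying states 0..4, goal at 4.
# States 1 and 3 both emit observation "A", so a memoryless policy
# cannot tell them apart; pairing the current observation with the
# previous one ("memory state") disambiguates them.
OBS = {0: "start", 1: "A", 2: "B", 3: "A", 4: "goal"}

def step(s, a):                       # a: -1 = left, +1 = right
    s2 = max(0, min(4, s + a))
    r = 1.0 if s2 == 4 else -0.01     # small step cost, reward at goal
    return s2, r, s2 == 4

def sarsa_lambda_memory(episodes=500, alpha=0.2, gamma=0.95,
                        lam=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)            # key: (prev_obs, obs, action)

    def policy(m):                    # epsilon-greedy over memory state m
        if rng.random() < eps:
            return rng.choice((-1, 1))
        return max((-1, 1), key=lambda a: Q[m + (a,)])

    for _ in range(episodes):
        e = defaultdict(float)        # eligibility traces, reset per episode
        s = 0
        m = (None, OBS[s])            # memory-augmented "state"
        a = policy(m)
        done = False
        while not done:
            s2, r, done = step(s, a)
            m2 = (OBS[s], OBS[s2])
            a2 = policy(m2)
            # standard Sarsa(lambda) update over the augmented states
            delta = r + (0.0 if done else gamma * Q[m2 + (a2,)]) - Q[m + (a,)]
            e[m + (a,)] += 1.0
            for k in list(e):
                Q[k] += alpha * delta * e[k]
                e[k] *= gamma * lam
            s, m, a = s2, m2, a2
    return Q

Q = sarsa_lambda_memory()
# The aliased underlying states 1 and 3 now appear as the distinct
# memory states ("start", "A") and ("B", "A").
```

Note that the two aliased positions become separable purely through the one-observation memory; no belief-state tracking is needed, which is what keeps this family of methods cheap compared with exact POMDP solvers.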
Pages: 1263-1266
Related papers
50 records
  • [1] A Deep Hierarchical Reinforcement Learning Algorithm in Partially Observable Markov Decision Processes
    Le, Tuyen P.
    Ngo Anh Vien
    Chung, Taechoong
    IEEE ACCESS, 2018, 6: 49089-49102
  • [2] A pulse neural network reinforcement learning algorithm for partially observable Markov decision processes
    Takita, Koichiro
    Hagiwara, Masafumi
    Systems and Computers in Japan, 2005, 36 (03): 42-52
  • [3] Fuzzy Reinforcement Learning Control for Decentralized Partially Observable Markov Decision Processes
    Sharma, Rajneesh
    Spaan, Matthijs T. J.
    IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ 2011), 2011: 1422-1429
  • [4] Provably Efficient Offline Reinforcement Learning for Partially Observable Markov Decision Processes
    Guo, Hongyi
    Cai, Qi
    Zhang, Yufeng
    Yang, Zhuoran
    Wang, Zhaoran
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [5] Active learning in partially observable Markov decision processes
    Jaulmes, R
    Pineau, J
    Precup, D
    MACHINE LEARNING: ECML 2005, PROCEEDINGS, 2005, 3720: 601-608
  • [6] Mixed reinforcement learning for partially observable Markov decision process
    Dung, Le Tien
    Komeda, Takashi
    Takagi, Motoki
    2007 INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN ROBOTICS AND AUTOMATION, 2007: 436+
  • [7] CHQ: A multi-agent reinforcement learning scheme for partially observable Markov decision processes
    Osada, H
    Fujita, S
    IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON INTELLIGENT AGENT TECHNOLOGY, PROCEEDINGS, 2004: 17-23
  • [8] CHQ: A multi-agent reinforcement learning scheme for partially observable Markov decision processes
    Osada, H
    Fujita, S
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2005, E88D (05): 1004-1011
  • [9] Learning deterministic policies in partially observable Markov decision processes
    Miyazaki, K
    Kobayashi, S
    INTELLIGENT AUTONOMOUS SYSTEMS: IAS-5, 1998: 250-257
  • [10] Learning factored representations for partially observable Markov decision processes
    Sallans, B
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 12, 2000, 12: 1050-1056