Reinforcement learning algorithm for partially observable Markov decision processes

Cited by: 0

Authors:

Wang, Xue-Ning [1]

He, Han-Gen [1]

Xu, Xin [1]

Affiliation:

[1] Inst. of Automat., Natl. Univ. of Defence Technol., Changsha 410073, China

Source:

Kongzhi yu Juece/Control and Decision | 2004 / Vol. 19 / No. 11

Keywords:

Convergence of numerical methods - Decision theory - Markov processes - Optimization - State space methods

DOI:

Not available

Abstract:

In partially observable Markov decision processes (POMDPs), perceptual aliasing can cause the memoryless policies obtained by Sarsa learning to oscillate. To address this problem, a new memory-based reinforcement learning algorithm, CpnSarsa(λ), is studied. Using a new definition of state, the agent combines its current observation with previous observations to distinguish aliased states. Applied to several typical POMDPs, the algorithm obtains optimal or near-optimal policies. Compared with previous algorithms, it greatly improves the convergence rate.
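
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: treating the current observation together with the preceding ones as the learner's state. It is not the authors' algorithm. It assumes plain tabular Sarsa with replacing eligibility traces (commonly written Sarsa(λ)), a window of the last `n_obs` observations, and epsilon-greedy exploration. The `AliasedCorridor` environment, the function names, and all hyperparameter values are hypothetical and chosen for illustration.

```python
import random
from collections import defaultdict, deque


class AliasedCorridor:
    """Hypothetical 5-cell corridor. Cells 1 and 3 both emit 'hall'
    (perceptual aliasing); the goal is cell 2 in the middle."""
    OBS = {0: "L", 1: "hall", 3: "hall", 4: "R"}

    def __init__(self):
        self.episode = 0
        self.pos = 0

    def reset(self):
        # Alternate the start cell between the two ends of the corridor.
        self.pos = 0 if self.episode % 2 == 0 else 4
        self.episode += 1
        return self.OBS[self.pos]

    def step(self, action):
        move = -1 if action == "left" else 1
        self.pos = min(4, max(0, self.pos + move))
        if self.pos == 2:
            return "goal", 1.0, True
        return self.OBS[self.pos], 0.0, False


def make_state(history):
    """Combine the current observation with earlier ones into one hashable state."""
    return tuple(history)


def epsilon_greedy(Q, state, actions, epsilon, rng):
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(Q[(state, a)] for a in actions)
    return rng.choice([a for a in actions if Q[(state, a)] == best])


def train(env, actions, n_obs=2, episodes=200, alpha=0.1, gamma=0.95,
          lam=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Sarsa with replacing traces over windows of n_obs observations."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        traces = defaultdict(float)          # traces are reset every episode
        history = deque([env.reset()], maxlen=n_obs)
        s = make_state(history)
        a = epsilon_greedy(Q, s, actions, epsilon, rng)
        for _ in range(max_steps):
            obs, reward, done = env.step(a)
            if done:
                delta = reward - Q[(s, a)]
            else:
                history.append(obs)          # oldest observation drops out
                s2 = make_state(history)
                a2 = epsilon_greedy(Q, s2, actions, epsilon, rng)
                delta = reward + gamma * Q[(s2, a2)] - Q[(s, a)]
            traces[(s, a)] = 1.0             # replacing trace
            for key in list(traces):
                Q[key] += alpha * delta * traces[key]
                traces[key] *= gamma * lam
            if done:
                break
            s, a = s2, a2
    return Q


Q = train(AliasedCorridor(), ["left", "right"])
```

In this toy corridor a memoryless learner sees the same observation `"hall"` on both sides of the goal, where opposite actions are required. With a two-observation window the two situations become the distinct states `("L", "hall")` and `("R", "hall")`, so separate action values can be learned for each.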

Citation

Pages: 1263-1266
