A Modified Memory-Based Reinforcement Learning Method for Solving POMDP Problems

Cited by: 0
Authors
Lei Zheng
Siu-Yeung Cho
Institutions
[1] Nanyang Technological University,School of Computer Engineering
Source
Neural Processing Letters | 2011, Vol. 33
Keywords
Memory-based reinforcement learning; Markov decision processes; Partially observable Markov decision processes; Reinforcement learning;
DOI: not available
Abstract
Partially observable Markov decision processes (POMDPs) provide a mathematical framework for agent planning in stochastic and partially observable environments. The classic Bayesian optimal solution can be obtained by transforming the problem into a Markov decision process (MDP) over belief states. However, because the belief-state space is continuous and multi-dimensional, the problem is highly intractable. Many practical heuristic-based methods have been proposed, but most of them require a complete POMDP model of the environment, which is not always available in practice. This article introduces a modified memory-based reinforcement learning algorithm, called modified U-Tree, that is capable of learning from raw sensor experiences with minimal prior knowledge. The article describes an enhancement of the original U-Tree's state-generation process that makes the generated model more compact, and also proposes a modification of the statistical test for reward estimation, which allows the algorithm to be benchmarked against traditional model-based algorithms on a set of well-known POMDP problems.
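The belief-state transformation mentioned in the abstract is a Bayesian filter: after taking action a and receiving observation o, the belief over hidden states is re-weighted by the transition and observation probabilities and renormalized. The sketch below illustrates this update with illustrative tensor layouts (T[s][a][s2] for transitions, O[s2][a][o] for observations) and toy numbers; it is not the paper's U-Tree algorithm, only the standard belief update it builds on.

```python
def belief_update(belief, action, obs, T, O):
    """Bayesian belief update: b'(s') ∝ O(o | s', a) * sum_s T(s' | s, a) * b(s).

    belief -- list of probabilities over hidden states
    T[s][a][s2] -- transition probability, O[s2][a][o] -- observation probability
    """
    n = len(belief)
    new_belief = []
    for s2 in range(n):
        # Predicted probability of landing in s2 under the current belief.
        pred = sum(T[s][action][s2] * belief[s] for s in range(n))
        # Weight the prediction by how likely the observation is from s2.
        new_belief.append(O[s2][action][obs] * pred)
    total = sum(new_belief)
    if total == 0:
        raise ValueError("observation has zero probability under this belief/action")
    return [p / total for p in new_belief]

# Toy 2-state, 1-action, 2-observation model (numbers are illustrative only).
T = [[[0.9, 0.1]],   # from state 0, action 0
     [[0.2, 0.8]]]   # from state 1, action 0
O = [[[0.8, 0.2]],   # in state 0: P(obs 0) = 0.8
     [[0.3, 0.7]]]   # in state 1: P(obs 0) = 0.3
b = belief_update([0.5, 0.5], action=0, obs=0, T=T, O=O)
```

Tracking this belief vector exactly is what makes the transformed MDP continuous-valued, and hence why model-free approaches like U-Tree, which discretize experience instead, are attractive.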
Pages: 187 - 200
Page count: 13
Related Papers
50 entries in total
  • [31] An Efficient Method for Solving Routing Problems with Energy Constraints Using Reinforcement Learning
    Do, Haggi
    Son, Hakmo
    Kim, Jinwhan
    2024 21ST INTERNATIONAL CONFERENCE ON UBIQUITOUS ROBOTS, UR 2024, 2024, : 293 - 298
  • [32] MDRL-IR: Incentive Routing for Blockchain Scalability With Memory-Based Deep Reinforcement Learning
    Tang, Bingxin
    Liang, Junyuan
    Cai, Zhongteng
    Cai, Ting
    Zhou, Xiaocong
    Chen, Yingye
    IEEE TRANSACTIONS ON SERVICES COMPUTING, 2023, 16 (06) : 4375 - 4388
  • [33] Predictive Q-routing: A memory-based reinforcement learning approach to adaptive traffic control
    Choi, SPM
    Yeung, DY
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 8: PROCEEDINGS OF THE 1995 CONFERENCE, 1996, 8 : 945 - 951
  • [34] Memory-based deep reinforcement learning for cognitive radar target tracking waveform resource management
    Qin, Jiahao
    Zhu, Mengtao
    Pan, Zesi
    Li, Yunjie
    Li, Yan
    IET RADAR SONAR AND NAVIGATION, 2023, 17 (12): : 1822 - 1836
  • [35] Modelling personalised car-following behaviour: a memory-based deep reinforcement learning approach
    Liao, Yaping
    Yu, Guizhen
    Chen, Peng
    Zhou, Bin
    Li, Han
    TRANSPORTMETRICA A-TRANSPORT SCIENCE, 2024, 20 (01) : 36 - 36
  • [36] Multi-Agent Active Perception Based on Reinforcement Learning and POMDP
    Selimovic, Tarik
    Peti, Marijana
    Bogdan, Stjepan
    IEEE ACCESS, 2024, 12 : 48004 - 48016
  • [37] A novel dynamic spectrum allocation algorithm based on POMDP reinforcement learning
    Tang, Lun
    Chen, Qian-Bin
    Zeng, Xiao-Ping
    Beijing Youdian Daxue Xuebao/Journal of Beijing University of Posts and Telecommunications, 2009, 32 (06): : 125 - 129
  • [38] Memory-based in situ learning for unmanned vehicles
    McDowell, Patrick
    Bourgeois, Brian S.
    Sofge, Donald A.
    Iyengar, S. S.
    COMPUTER, 2006, 39 (12) : 62+
  • [39] WATER DEMAND FORECASTING BY MEMORY-BASED LEARNING
    TAMADA, T
    MARUYAMA, M
    NAKAMURA, Y
    ABE, S
    MAEDA, K
    WATER SCIENCE AND TECHNOLOGY, 1993, 28 (11-12) : 133 - 140
  • [40] Memory-based neural networks for robot learning
    Atkeson, CG
    Schaal, S
    NEUROCOMPUTING, 1995, 9 (03) : 243 - 269