A Modified Memory-Based Reinforcement Learning Method for Solving POMDP Problems

Cited by: 0
Authors
Lei Zheng
Siu-Yeung Cho
Institutions
[1] Nanyang Technological University,School of Computer Engineering
Source
Neural Processing Letters | 2011, Vol. 33
Keywords
Memory-based reinforcement learning; Markov decision processes; Partially observable Markov decision processes; Reinforcement learning;
DOI: not available
Abstract
Partially observable Markov decision processes (POMDPs) provide a mathematical framework for agent planning in stochastic and partially observable environments. The classic Bayesian optimal solution can be obtained by transforming the problem into a Markov decision process (MDP) over belief states. However, because the belief-state space is continuous and multi-dimensional, the problem is highly intractable. Many practical heuristic-based methods have been proposed, but most of them require a complete POMDP model of the environment, which is not always available in practice. This article introduces a modified memory-based reinforcement learning algorithm, called modified U-Tree, that is capable of learning from raw sensor experiences with minimal prior knowledge. The article describes an enhancement of the original U-Tree's state-generation process that makes the generated model more compact, and also proposes a modification of the statistical test for reward estimation, which allows the algorithm to be benchmarked against traditional model-based algorithms on a set of well-known POMDP problems.
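The belief-state transformation mentioned in the abstract is a Bayesian filter: after taking action a and receiving observation o, the belief over hidden states is re-weighted by the transition and observation probabilities and renormalized. The sketch below illustrates this update with illustrative tensor layouts (T[s][a][s2] for transitions, O[s2][a][o] for observations) and toy numbers; it is not the paper's U-Tree algorithm, only the standard belief update it builds on.

```python
def belief_update(belief, action, obs, T, O):
    """Bayesian belief update: b'(s') ∝ O(o | s', a) * sum_s T(s' | s, a) * b(s).

    belief -- list of probabilities over hidden states
    T[s][a][s2] -- transition probability, O[s2][a][o] -- observation probability
    """
    n = len(belief)
    new_belief = []
    for s2 in range(n):
        # Predicted probability of landing in s2 under the current belief.
        pred = sum(T[s][action][s2] * belief[s] for s in range(n))
        # Weight the prediction by how likely the observation is from s2.
        new_belief.append(O[s2][action][obs] * pred)
    total = sum(new_belief)
    if total == 0:
        raise ValueError("observation has zero probability under this belief/action")
    return [p / total for p in new_belief]

# Toy 2-state, 1-action, 2-observation model (numbers are illustrative only).
T = [[[0.9, 0.1]],   # from state 0, action 0
     [[0.2, 0.8]]]   # from state 1, action 0
O = [[[0.8, 0.2]],   # in state 0: P(obs 0) = 0.8
     [[0.3, 0.7]]]   # in state 1: P(obs 0) = 0.3
b = belief_update([0.5, 0.5], action=0, obs=0, T=T, O=O)
```

Tracking this belief vector exactly is what makes the transformed MDP continuous-valued, and hence why model-free approaches like U-Tree, which discretize experience instead, are attractive.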
Pages: 187 - 200
Page count: 13
Related Papers
50 entries in total
  • [31] An Efficient Method for Solving Routing Problems with Energy Constraints Using Reinforcement Learning
    Do, Haggi
    Son, Hakmo
    Kim, Jinwhan
    2024 21ST INTERNATIONAL CONFERENCE ON UBIQUITOUS ROBOTS, UR 2024, 2024, : 293 - 298
  • [32] MDRL-IR: Incentive Routing for Blockchain Scalability With Memory-Based Deep Reinforcement Learning
    Tang, Bingxin
    Liang, Junyuan
    Cai, Zhongteng
    Cai, Ting
    Zhou, Xiaocong
    Chen, Yingye
    IEEE TRANSACTIONS ON SERVICES COMPUTING, 2023, 16 (06) : 4375 - 4388
  • [33] Predictive Q-routing: A memory-based reinforcement learning approach to adaptive traffic control
    Choi, SPM
    Yeung, DY
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 8: PROCEEDINGS OF THE 1995 CONFERENCE, 1996, 8 : 945 - 951
  • [34] Memory-based deep reinforcement learning for cognitive radar target tracking waveform resource management
    Qin, Jiahao
    Zhu, Mengtao
    Pan, Zesi
    Li, Yunjie
    Li, Yan
    IET RADAR SONAR AND NAVIGATION, 2023, 17 (12): : 1822 - 1836
  • [35] Modelling personalised car-following behaviour: a memory-based deep reinforcement learning approach
    Liao, Yaping
    Yu, Guizhen
    Chen, Peng
    Zhou, Bin
    Li, Han
    TRANSPORTMETRICA A-TRANSPORT SCIENCE, 2024, 20 (01) : 36 - 36
  • [36] Multi-Agent Active Perception Based on Reinforcement Learning and POMDP
    Selimovic, Tarik
    Peti, Marijana
    Bogdan, Stjepan
    IEEE ACCESS, 2024, 12 : 48004 - 48016
  • [37] A novel dynamic spectrum allocation algorithm based on POMDP reinforcement learning
    Tang, Lun
    Chen, Qian-Bin
    Zeng, Xiao-Ping
    Beijing Youdian Daxue Xuebao/Journal of Beijing University of Posts and Telecommunications, 2009, 32 (06): : 125 - 129
  • [38] Memory-based in situ learning for unmanned vehicles
    McDowell, Patrick
    Bourgeois, Brian S.
    Sofge, Donald A.
    Iyengar, S. S.
    COMPUTER, 2006, 39 (12) : 62+
  • [39] WATER DEMAND FORECASTING BY MEMORY-BASED LEARNING
    TAMADA, T
    MARUYAMA, M
    NAKAMURA, Y
    ABE, S
    MAEDA, K
    WATER SCIENCE AND TECHNOLOGY, 1993, 28 (11-12) : 133 - 140
  • [40] Memory-based neural networks for robot learning
    Atkeson, CG
    Schaal, S
    NEUROCOMPUTING, 1995, 9 (03) : 243 - 269