A temporal-difference learning method using Gaussian state representation for continuous state space problems

Cited by: 0
Authors: Fujii, Natsuko; Ueno, Atsushi; Takubo, Tomohito
Source: Japanese Society for Artificial Intelligence, Vol. 29, 2014
Keywords: Learning algorithms; Gaussian distribution
DOI: 10.1527/tjsai.29.157
Abstract
In this paper, we tackle the problem of reinforcement learning (RL) in a continuous state space. An appropriate discretization of the space can make many learning tasks tractable. A method using Gaussian state representation and the Rational Policy Making algorithm (RPM) has been proposed for this problem. That method discretizes the space by constructing a chain of states representing a path to the goal, exploiting the agent's past experiences of reaching it. Because it relies heavily on successful experiences, it can find a rational solution quickly in an environment with little noise; in a noisy environment, however, it generates many unnecessary and distracting states and performs the task poorly. For learning in such an environment, we introduce the concept of the value of a state into the above method and develop a new method. The new method uses a temporal-difference (TD) learning algorithm to learn the values of states, and the value of a state is used to determine the state's size. As a result, the method can quickly trim and eliminate unnecessary and distracting states and learn the task well even in a noisy environment. We show the effectiveness of our method through computer simulations of a path-finding task and a cart-pole swing-up task. © The Japanese Society for Artificial Intelligence 2014.
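The abstract combines two mechanisms: a TD update on the values of Gaussian-represented states, and the use of those learned values to set each state's size (the width of its Gaussian). The sketch below is a minimal illustration of that combination, not the authors' implementation: the TD(0) backup V(s) ← V(s) + α(r + γV(s') − V(s)) is the standard rule, while the GaussianState class, the resize_by_value rule, the sigma bounds, and the toy three-state chain are all assumptions introduced here for illustration.

```python
import numpy as np


class GaussianState:
    """One discrete state of the chain, represented as a Gaussian in the
    continuous observation space (hypothetical simplification)."""

    def __init__(self, center, sigma, value=0.0):
        self.center = np.asarray(center, dtype=float)
        self.sigma = float(sigma)   # "size" of the state
        self.value = value          # V(s), learned by TD(0)

    def activation(self, x):
        """Gaussian membership of observation x in this state."""
        d2 = np.sum((np.asarray(x, dtype=float) - self.center) ** 2)
        return np.exp(-d2 / (2.0 * self.sigma ** 2))


def td0_update(state, next_state, reward, alpha=0.1, gamma=0.95):
    """Standard tabular TD(0) backup: V(s) += alpha * (r + gamma * V(s') - V(s))."""
    td_error = reward + gamma * next_state.value - state.value
    state.value += alpha * td_error
    return td_error


def resize_by_value(state, v_min, v_max, sigma_min=0.05, sigma_max=0.5):
    """Illustrative resizing rule (an assumption, not the paper's exact rule):
    low-value states are shrunk so they cover less of the space and can be
    pruned, while high-value states on the path to the goal stay wide."""
    spread = v_max - v_min
    ratio = (state.value - v_min) / spread if spread > 0 else 1.0
    state.sigma = sigma_min + ratio * (sigma_max - sigma_min)


if __name__ == "__main__":
    # Toy 1-D chain of three Gaussian states; reaching the last one pays reward 1.
    chain = [GaussianState([x], sigma=0.3) for x in (0.2, 0.5, 0.8)]
    transitions = [(chain[0], chain[1], 0.0), (chain[1], chain[2], 1.0)]
    for _ in range(200):
        for s, s_next, r in transitions:
            td0_update(s, s_next, r)
    values = [s.value for s in chain]
    for s in chain:
        resize_by_value(s, min(values), max(values))
    print("values:", [round(s.value, 3) for s in chain])
    print("sigmas:", [round(s.sigma, 3) for s in chain])
```

On this toy chain the state next to the goal ends up with the highest value and the widest Gaussian, while the never-rewarded state shrinks toward the minimum width, which mirrors the pruning intuition described in the abstract.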
Related papers (50 in total)
  • [1] Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach
    Jia, Yanwei
    Zhou, Xun Yu
    JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
  • [2] Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates
    Penedones, Hugo
    Riquelme, Carlos
    Vincent, Damien
    Maennel, Hartmut
    Mann, Timothy
    Barreto, Andre
    Gelly, Sylvain
    Neu, Gergely
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [3] Fuzzy interpretation for temporal-difference learning in anomaly detection problems
    Sukhanov, A. V.
    Kovalev, S. M.
    Styskala, V.
    BULLETIN OF THE POLISH ACADEMY OF SCIENCES-TECHNICAL SCIENCES, 2016, 64 (03) : 625 - 632
  • [4] Fuzzy interpretation for temporal-difference learning in anomaly detection problems
    Sukhanov A.V.
    Kovalev S.M.
    Stýskala V.
    BULLETIN OF THE POLISH ACADEMY OF SCIENCES-TECHNICAL SCIENCES, 2016, 64 (03) : 625 - 632
  • [5] Using temporal-difference learning for multi-agent bargaining
    Huang, Shiu-li
    Lin, Fu-ren
    ELECTRONIC COMMERCE RESEARCH AND APPLICATIONS, 2008, 7 (04) : 432 - 442
  • [6] GAUSSIAN PROCESS TEMPORAL-DIFFERENCE LEARNING WITH SCALABILITY AND WORST-CASE PERFORMANCE GUARANTEES
    Lu, Qin
    Giannakis, Georgios B.
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3485 - 3489
  • [7] Striatal and Tegmental Neurons Code Critical Signals for Temporal-Difference Learning of State Value in Domestic Chicks
    Wen, Chentao
    Ogura, Yukiko
    Matsushima, Toshiya
    FRONTIERS IN NEUROSCIENCE, 2016, 10
  • [8] On the Convergence of Reinforcement Learning in Nonlinear Continuous State Space Problems
    Goyal, Raman
    Chakravorty, Suman
    Wang, Ran
    Mohamed, Mohamed Naveed Gul
    2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021, : 2969 - 2975
  • [9] Temporal difference learning in continuous time and space
    Doya, K
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 8: PROCEEDINGS OF THE 1995 CONFERENCE, 1996, 8 : 1073 - 1079
  • [10] Intentionally-underestimated value function at terminal state for temporal-difference learning with mis-designed reward
    Kobayashi, Taisuke
    RESULTS IN CONTROL AND OPTIMIZATION, 2025, 18