Q-Learning with probability based action policy

Cited by: 0
Authors:
Ugurlu, Ekin Su [1]
Biricik, Goksel [1]
Affiliation:
[1] Yildiz Tekn Univ, Bilgisayar Muhendisligi Bolumu, Istanbul, Turkey
Keywords:
DOI: not available
CLC number: TP18 [Artificial intelligence theory]
Discipline classification codes: 081104; 0812; 0835; 1405
Abstract
In Q-learning, the aim is to reach the goal by learning over state-action pairs. When the goal is assigned a large reward, the optimal path is found once the accumulated reward reaches its highest value. If the start and goal points are changed, however, the learned information about how to reach the goal becomes useless even though the environment itself does not change. In this study, Q-learning is improved so that past experience can be reused. To achieve this, action probabilities are computed for certain start and goal points, and a neural network is trained on those values to estimate the action probabilities for other start and goal points. A radial basis function network is used because it supports local representation and learns quickly when the number of inputs is small. When Q-learning is run with the estimated action probabilities, the goal is reached faster.
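The abstract does not give implementation details, so the following is only a minimal sketch of the idea under stated assumptions: a small grid world with four movement actions, a hand-rolled radial basis function (RBF) estimator whose random centers and weights stand in for a network that would actually be trained on action probabilities collected from earlier start/goal pairs, and an exploration step that samples from the estimated probabilities instead of a uniform distribution. All names, reward values, and parameters below are hypothetical, not taken from the paper.

```python
# Illustrative sketch only: Q-learning whose exploration is biased by action
# probabilities estimated from (state, start, goal) by a tiny RBF model.
# Grid size, rewards, and the RBF model itself are assumptions, not paper details.
import numpy as np

rng = np.random.default_rng(0)

GRID = 5                                       # assumed 5x5 grid world
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right


def step(state, action, goal):
    """Move in the grid, clipping at the borders; large reward only at the goal."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr = min(max(r + dr, 0), GRID - 1)
    nc = min(max(c + dc, 0), GRID - 1)
    done = (nr, nc) == goal
    return (nr, nc), (1.0 if done else -0.01), done


def rbf_action_probs(state, start, goal, centers, weights, sigma=2.0):
    """Estimate P(action | state, start, goal) with a small RBF network.

    Centers and weights are random placeholders here; in the paper the network
    is trained on action probabilities gathered from earlier start/goal pairs.
    """
    x = np.array([*state, *start, *goal], dtype=float)
    phi = np.exp(-np.sum((centers - x) ** 2, axis=1) / (2.0 * sigma ** 2))
    logits = phi @ weights                     # one value per action
    e = np.exp(logits - logits.max())
    return e / e.sum()


def q_learning(start, goal, centers, weights,
               episodes=200, alpha=0.5, gamma=0.95, eps=0.2):
    """Tabular Q-learning; exploratory actions follow the estimated probabilities."""
    Q = np.zeros((GRID, GRID, len(ACTIONS)))
    steps_per_episode = []
    for _ in range(episodes):
        s, done, n = start, False, 0
        while not done and n < 200:
            if rng.random() < eps:
                probs = rbf_action_probs(s, start, goal, centers, weights)
                a = int(rng.choice(len(ACTIONS), p=probs))
            else:
                a = int(np.argmax(Q[s[0], s[1]]))
            s2, reward, done = step(s, a, goal)
            target = reward + gamma * np.max(Q[s2[0], s2[1]]) * (not done)
            Q[s[0], s[1], a] += alpha * (target - Q[s[0], s[1], a])
            s, n = s2, n + 1
        steps_per_episode.append(n)
    return steps_per_episode


# Random RBF parameters stand in for a trained network (placeholder values only).
centers = rng.uniform(0, GRID, size=(12, 6))
weights = rng.normal(size=(12, len(ACTIONS)))
print(q_learning(start=(0, 0), goal=(4, 4), centers=centers, weights=weights)[-5:])
```

In this sketch the predicted probabilities only shape exploration; the paper's claim is that seeding Q-learning with action probabilities estimated from previously solved start/goal pairs lets the agent reach the goal faster than learning from scratch.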
Pages: 210+
Number of pages: 2