Q-Learning with probability based action policy

Cited by: 0
Authors:
Ugurlu, Ekin Su [1]
Biricik, Goksel [1]
Affiliation:
[1] Yildiz Tekn Univ, Bilgisayar Muhendisligi Bolumu, Istanbul, Turkey
Keywords:
DOI: not available
CLC number: TP18 [Artificial intelligence theory]
Discipline classification codes: 081104; 0812; 0835; 1405
Abstract
In Q-learning, the aim is to reach the goal by learning over state-action pairs. When the goal is assigned a large reward, the optimal path is found once the accumulated reward reaches its highest value. If the start and goal points are changed, however, the learned information about how to reach the goal becomes useless even though the environment itself does not change. In this study, Q-learning is improved so that past experience can be reused. To achieve this, action probabilities are computed for certain start and goal points, and a neural network is trained on those values to estimate the action probabilities for other start and goal points. A radial basis function network is used because it supports local representation and learns quickly when the number of inputs is small. When Q-learning is run with the estimated action probabilities, the goal is reached faster.
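The abstract does not give implementation details, so the following is only a minimal sketch of the idea under stated assumptions: a small grid world with four movement actions, a hand-rolled radial basis function (RBF) estimator whose random centers and weights stand in for a network that would actually be trained on action probabilities collected from earlier start/goal pairs, and an exploration step that samples from the estimated probabilities instead of a uniform distribution. All names, reward values, and parameters below are hypothetical, not taken from the paper.

```python
# Illustrative sketch only: Q-learning whose exploration is biased by action
# probabilities estimated from (state, start, goal) by a tiny RBF model.
# Grid size, rewards, and the RBF model itself are assumptions, not paper details.
import numpy as np

rng = np.random.default_rng(0)

GRID = 5                                       # assumed 5x5 grid world
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right


def step(state, action, goal):
    """Move in the grid, clipping at the borders; large reward only at the goal."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr = min(max(r + dr, 0), GRID - 1)
    nc = min(max(c + dc, 0), GRID - 1)
    done = (nr, nc) == goal
    return (nr, nc), (1.0 if done else -0.01), done


def rbf_action_probs(state, start, goal, centers, weights, sigma=2.0):
    """Estimate P(action | state, start, goal) with a small RBF network.

    Centers and weights are random placeholders here; in the paper the network
    is trained on action probabilities gathered from earlier start/goal pairs.
    """
    x = np.array([*state, *start, *goal], dtype=float)
    phi = np.exp(-np.sum((centers - x) ** 2, axis=1) / (2.0 * sigma ** 2))
    logits = phi @ weights                     # one value per action
    e = np.exp(logits - logits.max())
    return e / e.sum()


def q_learning(start, goal, centers, weights,
               episodes=200, alpha=0.5, gamma=0.95, eps=0.2):
    """Tabular Q-learning; exploratory actions follow the estimated probabilities."""
    Q = np.zeros((GRID, GRID, len(ACTIONS)))
    steps_per_episode = []
    for _ in range(episodes):
        s, done, n = start, False, 0
        while not done and n < 200:
            if rng.random() < eps:
                probs = rbf_action_probs(s, start, goal, centers, weights)
                a = int(rng.choice(len(ACTIONS), p=probs))
            else:
                a = int(np.argmax(Q[s[0], s[1]]))
            s2, reward, done = step(s, a, goal)
            target = reward + gamma * np.max(Q[s2[0], s2[1]]) * (not done)
            Q[s[0], s[1], a] += alpha * (target - Q[s[0], s[1], a])
            s, n = s2, n + 1
        steps_per_episode.append(n)
    return steps_per_episode


# Random RBF parameters stand in for a trained network (placeholder values only).
centers = rng.uniform(0, GRID, size=(12, 6))
weights = rng.normal(size=(12, len(ACTIONS)))
print(q_learning(start=(0, 0), goal=(4, 4), centers=centers, weights=weights)[-5:])
```

In this sketch the predicted probabilities only shape exploration; the paper's claim is that seeding Q-learning with action probabilities estimated from previously solved start/goal pairs lets the agent reach the goal faster than learning from scratch.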
Pages: 210+
Number of pages: 2