Upper Confident Bound Fuzzy Q-learning and Its Application to a Video Game

Cited by: 0

Authors:

Morita, Takahiro [1]

Hosobe, Hiroshi [2]

Affiliations:

[1] Hosei Univ, Grad Sch Comp & Informat Sci, Tokyo, Japan

[2] Hosei Univ, Fac Comp & Informat Sci, Tokyo, Japan

Source:

ICAART: PROCEEDINGS OF THE 14TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 3 | 2022

Keywords:

Machine Learning; Fuzzy Q-learning; UCB Algorithm; Video Game;

DOI:

10.5220/0010835700003116

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper proposes upper confident bound (UCB) fuzzy Q-learning, which combines fuzzy Q-learning with the UCBQ algorithm, and applies it to a video game. The UCBQ algorithm adapts the UCB action selection method to Q-learning. The UCB algorithm selects the action with the highest UCB value rather than the highest value estimate. Because the UCB algorithm presumes that every unselected action is eventually selected and given a value estimate, the number of unselected actions stays small, which helps avoid locally optimal solutions. The proposed method aims to make learning more efficient by reducing unselected actions and preventing the Q value from settling on a locally optimal solution in fuzzy Q-learning. This paper applies the proposed method to a video game called Ms. PacMan and presents the result of an experiment on finding optimum values in the method. The method is evaluated by comparing its game scores with those obtained by a previous fuzzy Q-learning method. The result shows that the proposed method significantly reduced unselected actions.
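The selection rule described in the abstract can be illustrated with a minimal sketch. It uses the textbook UCB1 bonus, which is an assumption here: the record does not give the paper's UCBQ formula, and the function name `ucb_select`, the exploration constant `c`, and the list-based inputs are all hypothetical.

```python
import math


def ucb_select(q_values, counts, t, c=2.0):
    """Return the index of the action with the highest UCB score.

    q_values: current value estimate per action (assumed layout)
    counts:   number of times each action has been selected
    t:        total number of selections made so far (t >= 1)
    c:        exploration constant (hypothetical default)
    """
    best_action, best_score = 0, -math.inf
    for a, (q, n) in enumerate(zip(q_values, counts)):
        if n == 0:
            # An action that has never been selected is tried first,
            # so every action eventually receives a value estimate.
            return a
        # Textbook UCB1 score: value estimate plus an exploration bonus
        # that shrinks as the action is selected more often.
        score = q + c * math.sqrt(math.log(t) / n)
        if score > best_score:
            best_action, best_score = a, score
    return best_action
```

With `c=0` the rule reduces to greedy selection on the value estimates; a larger `c` gives rarely selected actions a larger bonus, which is the mechanism the abstract credits with reducing unselected actions.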


Pages: 454-461

Page count: 8


