Double action Q-learning for obstacle avoidance in a dynamically changing environment

Times Cited: 0
Authors
Ngai, DCK [1 ]
Yung, NHC [1 ]
Affiliations
[1] Univ Hong Kong, Dept Elect & Elect Engn, Hong Kong, Hong Kong, Peoples R China
Keywords
Q-learning; reinforcement learning; temporal differences; obstacle avoidance;
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a new method for solving the reinforcement learning problem in a dynamically changing environment, such as vehicle navigation, in which the Markov Decision Process used in traditional reinforcement learning is modified so that the response of the environment is taken into consideration when determining the agent's next state. This is achieved by changing the action-value function to handle three parameters at a time: the current state, the action taken by the agent, and the action taken by the environment. Because it considers the actions of both the agent and the environment, the method is termed "Double Action". The proposed method is implemented on the basis of Q-learning, with the update rule modified to handle all three parameters. Preliminary results show that the proposed method achieves a (negative) sum of rewards 89.5% lower than that of the traditional method. Apart from that, our new method also reduces the total number of collisions and the mean number of steps per episode by 89.5% and 15.5%, respectively, compared with the traditional method.
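To make the three-parameter action-value function concrete, the following is a minimal tabular sketch, not the authors' exact formulation: it assumes discrete state and action spaces, that the environment's action is observable after each step, and that the next-state value maximises over the agent's action given the environment action actually observed (these modelling choices, along with all names and constants, are illustrative assumptions).

```python
import numpy as np

# Illustrative sizes and learning constants (assumed, not from the paper).
N_STATES, N_AGENT_ACTIONS, N_ENV_ACTIONS = 100, 4, 4
ALPHA, GAMMA = 0.1, 0.9

# Q(s, a_agent, a_env): action-value function over three parameters
# instead of the usual two.
Q = np.zeros((N_STATES, N_AGENT_ACTIONS, N_ENV_ACTIONS))

def update(s, a_agent, a_env, reward, s_next, a_env_next):
    """One Q-learning-style backup for the double-action value function."""
    # Next-state value: best agent action given the environment action
    # actually observed in the next state (an illustrative choice).
    target = reward + GAMMA * Q[s_next, :, a_env_next].max()
    Q[s, a_agent, a_env] += ALPHA * (target - Q[s, a_agent, a_env])

def select_action(s, a_env_pred, epsilon=0.1):
    """Epsilon-greedy choice of the agent's action, conditioned on a
    predicted environment action (how that prediction is obtained is
    not specified here)."""
    if np.random.rand() < epsilon:
        return np.random.randint(N_AGENT_ACTIONS)
    return int(Q[s, :, a_env_pred].argmax())
```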
Pages: 211-216
Number of Pages: 6
Related Papers
50 records in total
  • [31] Q-learning Approach in the Context of Virtual Learning Environment
    Liviu, Ionita
    Irina, Tudor
    PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON VIRTUAL LEARNING, 2008, : 209 - 214
  • [32] Investigation of Q-Learning in the Context of a Virtual Learning Environment
    Baziukaite, Dalia
    INFORMATICS IN EDUCATION, 2007, 6 (02): : 255 - 268
  • [33] Q-learning in continuous state and action spaces
    Gaskett, C
    Wettergreen, D
    Zelinsky, A
    ADVANCED TOPICS IN ARTIFICIAL INTELLIGENCE, 1999, 1747 : 417 - 428
  • [34] Q-Learning with probability based action policy
    Ugurlu, Ekin Su
    Biricik, Goksel
    2006 IEEE 14TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS, VOLS 1 AND 2, 2006, : 210 - +
  • [35] Modification of Q-learning to Adapt to the Randomness of Environment
    Luo, Xiulian
    Gao, Youbing
    Huang, Shao
    Zhao, Yaodong
    Zhang, Shengmiao
    ICCAIS 2019: THE 8TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND INFORMATION SCIENCES, 2019,
  • [36] Q-learning with Experience Replay in a Dynamic Environment
    Pieters, Mathijs
    Wiering, Marco A.
    PROCEEDINGS OF 2016 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2016,
  • [37] Variational quantum compiling with double Q-learning
    He, Zhimin
    Li, Lvzhou
    Zheng, Shenggen
    Li, Yongyao
    Situ, Haozhen
    NEW JOURNAL OF PHYSICS, 2021, 23 (03):
  • [38] The Q-learning obstacle avoidance algorithm based on EKF-SLAM for NAO autonomous walking under unknown environments
    Wen, Shuhuan
    Chen, Xiao
    Ma, Chunli
    Lam, H. K.
    Hua, Shaoyang
    ROBOTICS AND AUTONOMOUS SYSTEMS, 2015, 72 : 29 - 36
  • [39] Double Q-Learning for Radiation Source Detection
    Liu, Zheng
    Abbaszadeh, Shiva
    SENSORS, 2019, 19 (04)
  • [40] Fast-maneuvering target seeking based on double-action Q-learning
    Ngai, Daniel C. K.
    Yung, Nelson H. C.
    MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, PROCEEDINGS, 2007, 4571 : 653 - +