Double action Q-learning for obstacle avoidance in a dynamically changing environment

Times Cited: 0
Authors
Ngai, DCK [1 ]
Yung, NHC [1 ]
Affiliations
[1] Univ Hong Kong, Dept Elect & Elect Engn, Hong Kong, Hong Kong, Peoples R China
Keywords
Q-learning; reinforcement learning; temporal differences; obstacle avoidance;
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a new method for solving the reinforcement learning problem in a dynamically changing environment, such as vehicle navigation, in which the Markov Decision Process used in traditional reinforcement learning is modified so that the response of the environment is taken into consideration when determining the agent's next state. This is achieved by changing the action-value function to handle three parameters at a time: the current state, the action taken by the agent, and the action taken by the environment. Because it considers the actions of both the agent and the environment, the method is termed "Double Action". The proposed method is implemented on the basis of Q-learning, with the update rule modified to handle all three parameters. Preliminary results show that the proposed method achieves a (negative) sum of rewards 89.5% lower than that of the traditional method. Apart from that, our new method also reduces the total number of collisions and the mean number of steps per episode by 89.5% and 15.5%, respectively, compared with the traditional method.
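To make the three-parameter action-value function concrete, the following is a minimal tabular sketch, not the authors' exact formulation: it assumes discrete state and action spaces, that the environment's action is observable after each step, and that the next-state value maximises over the agent's action given the environment action actually observed (these modelling choices, along with all names and constants, are illustrative assumptions).

```python
import numpy as np

# Illustrative sizes and learning constants (assumed, not from the paper).
N_STATES, N_AGENT_ACTIONS, N_ENV_ACTIONS = 100, 4, 4
ALPHA, GAMMA = 0.1, 0.9

# Q(s, a_agent, a_env): action-value function over three parameters
# instead of the usual two.
Q = np.zeros((N_STATES, N_AGENT_ACTIONS, N_ENV_ACTIONS))

def update(s, a_agent, a_env, reward, s_next, a_env_next):
    """One Q-learning-style backup for the double-action value function."""
    # Next-state value: best agent action given the environment action
    # actually observed in the next state (an illustrative choice).
    target = reward + GAMMA * Q[s_next, :, a_env_next].max()
    Q[s, a_agent, a_env] += ALPHA * (target - Q[s, a_agent, a_env])

def select_action(s, a_env_pred, epsilon=0.1):
    """Epsilon-greedy choice of the agent's action, conditioned on a
    predicted environment action (how that prediction is obtained is
    not specified here)."""
    if np.random.rand() < epsilon:
        return np.random.randint(N_AGENT_ACTIONS)
    return int(Q[s, :, a_env_pred].argmax())
```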
Pages: 211-216
Number of Pages: 6
Related Papers
50 records in total
  • [31] Q-learning Approach in the Context of Virtual Learning Environment
    Liviu, Ionita
    Irina, Tudor
    PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON VIRTUAL LEARNING, 2008, : 209 - 214
  • [32] Investigation of Q-Learning in the Context of a Virtual Learning Environment
    Baziukaite, Dalia
    INFORMATICS IN EDUCATION, 2007, 6 (02): : 255 - 268
  • [33] Q-learning in continuous state and action spaces
    Gaskett, C
    Wettergreen, D
    Zelinsky, A
    ADVANCED TOPICS IN ARTIFICIAL INTELLIGENCE, 1999, 1747 : 417 - 428
  • [34] Q-Learning with probability based action policy
    Ugurlu, Ekin Su
    Biricik, Goksel
    2006 IEEE 14TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS, VOLS 1 AND 2, 2006, : 210 - +
  • [35] Modification of Q-learning to Adapt to the Randomness of Environment
    Luo, Xiulian
    Gao, Youbing
    Huang, Shao
    Zhao, Yaodong
    Zhang, Shengmiao
    ICCAIS 2019: THE 8TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND INFORMATION SCIENCES, 2019,
  • [36] Q-learning with Experience Replay in a Dynamic Environment
    Pieters, Mathijs
    Wiering, Marco A.
    PROCEEDINGS OF 2016 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2016,
  • [37] Variational quantum compiling with double Q-learning
    He, Zhimin
    Li, Lvzhou
    Zheng, Shenggen
    Li, Yongyao
    Situ, Haozhen
    NEW JOURNAL OF PHYSICS, 2021, 23 (03):
  • [38] The Q-learning obstacle avoidance algorithm based on EKF-SLAM for NAO autonomous walking under unknown environments
    Wen, Shuhuan
    Chen, Xiao
    Ma, Chunli
    Lam, H. K.
    Hua, Shaoyang
    ROBOTICS AND AUTONOMOUS SYSTEMS, 2015, 72 : 29 - 36
  • [39] Double Q-Learning for Radiation Source Detection
    Liu, Zheng
    Abbaszadeh, Shiva
    SENSORS, 2019, 19 (04)
  • [40] Fast-maneuvering target seeking based on double-action Q-learning
    Ngai, Daniel C. K.
    Yung, Nelson H. C.
    MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, PROCEEDINGS, 2007, 4571 : 653 - +