Continuous-action Q-learning

Cited by: 73
Authors
Millán, JDR [1]
Posenato, D [1]
Dedieu, E [1]
Affiliations
[1] European Commission, Joint Research Centre, I-21020 Ispra, VA, Italy
Keywords
reinforcement learning; incremental topology preserving maps; continuous domains; real-time operation;
DOI
10.1023/A:1017988514716
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405
Abstract
This paper presents a Q-learning method that works in continuous domains. Other characteristics of our approach are the use of an incremental topology preserving map (ITPM) to partition the input space, and the incorporation of bias to initialize the learning process. A unit of the ITPM represents a limited region of the input space and maps it onto the Q-values of M possible discrete actions. The resulting continuous action is an average of the discrete actions of the "winning unit" weighted by their Q-values. Then, TD(λ) updates the Q-values of the discrete actions according to their contribution. Units are created incrementally and their associated Q-values are initialized by means of domain knowledge. Experimental results in robotics domains show the superiority of the proposed continuous-action Q-learning over the standard discrete-action version in terms of both asymptotic performance and speed of learning. The paper also reports a comparison of discounted-reward against average-reward Q-learning in an infinite-horizon robotics task.
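To make the mechanics concrete, below is a minimal Python sketch of the scheme the abstract describes, under stated assumptions: a fixed discrete action set, Euclidean winner selection, non-negative Q-value weighting, and a greedy bootstrap target. All names (ITPMUnit, winning_unit, continuous_action, td_lambda_update) and hyperparameters are illustrative, not the authors' implementation; incremental unit creation and the paper's exact weighting are simplified away.

```python
import numpy as np

M = 5                                   # number of discrete actions per unit
ACTIONS = np.linspace(-1.0, 1.0, M)     # discrete action set (assumed range)
ALPHA, GAMMA, LAM = 0.1, 0.95, 0.7      # step size, discount, trace decay

class ITPMUnit:
    """One ITPM unit: a prototype covering a limited region of the input
    space, one Q-value per discrete action, and an eligibility trace."""
    def __init__(self, prototype, q_init):
        self.w = np.asarray(prototype, dtype=float)
        self.q = np.asarray(q_init, dtype=float)  # bias: domain-knowledge init
        self.e = np.zeros(M)                      # eligibility traces

def winning_unit(units, x):
    # Nearest prototype wins (Euclidean distance is an assumption).
    return min(units, key=lambda u: np.linalg.norm(u.w - x))

def continuous_action(unit):
    # Continuous action: average of the M discrete actions weighted by
    # their Q-values (shifted to be non-negative before normalizing).
    w = unit.q - unit.q.min() + 1e-8
    w = w / w.sum()
    return float(w @ ACTIONS), w         # executed action + contributions

def td_lambda_update(units, unit, contrib, r, x_next):
    # TD(lambda): credit each discrete action of the winning unit in
    # proportion to its contribution to the executed continuous action.
    target = r + GAMMA * winning_unit(units, x_next).q.max()
    delta = target - float(contrib @ unit.q)
    for u in units:
        u.e *= GAMMA * LAM               # decay all traces
    unit.e += contrib                    # accumulate credit for this step
    for u in units:
        u.q += ALPHA * delta * u.e

# Hypothetical usage on a 2-D input space with two pre-seeded units:
units = [ITPMUnit([0.0, 0.0], np.zeros(M)), ITPMUnit([1.0, 1.0], np.zeros(M))]
u = winning_unit(units, np.array([0.2, 0.1]))
a, contrib = continuous_action(u)
td_lambda_update(units, u, contrib, r=1.0, x_next=np.array([0.3, 0.1]))
```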
Pages: 247-265
Page count: 19
Related papers
50 records in total
  • [1] Continuous-Action Q-Learning
    José del R. Millán
    Daniele Posenato
    Eric Dedieu
    [J]. Machine Learning, 2002, 49 : 247 - 265
  • [2] Q-learning in continuous state and action spaces
    Gaskett, C
    Wettergreen, D
    Zelinsky, A
    [J]. ADVANCED TOPICS IN ARTIFICIAL INTELLIGENCE, 1999, 1747 : 417 - 428
  • [3] Learning Continuous-Action Control Policies
    Pazis, Jason
    Lagoudakis, Michail G.
    [J]. ADPRL: 2009 IEEE SYMPOSIUM ON ADAPTIVE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2009 : 169 - 176
  • [5] q-Learning in Continuous Time
    Jia, Yanwei
    Zhou, Xun Yu
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [6] A CONTINUOUS-ACTION VISCOMETER
    PROVINTE.IV
    GERSHKOV.BM
    [J]. INDUSTRIAL LABORATORY, 1966, 32 (05) : 766+
  • [7] CONTINUOUS ACTION GENERATION OF Q-LEARNING IN MULTI-AGENT COOPERATION
    Hwang, Kao-Shing
    Chen, Yu-Jen
    Jiang, Wei-Cheng
    Lin, Tzung-Feng
    [J]. ASIAN JOURNAL OF CONTROL, 2013, 15 (04) : 1011 - 1020
  • [8] Action Candidate Based Clipped Double Q-learning for Discrete and Continuous Action Tasks
    Jiang, Haobo
    Xie, Jin
    Yang, Jian
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 7979 - 7986
  • [9] Action Candidate Driven Clipped Double Q-Learning for Discrete and Continuous Action Tasks
    Jiang, Haobo
    Li, Guangyu
    Xie, Jin
    Yang, Jian
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (04) : 5269 - 5279
  • [10] Continuous-Action Multiplier Engineering
    Yu. M. Lekontsev
    P. V. Sazhin
    B. L. Gerike
    A. V. Novik
    Yu. B. Mezentsev
    [J]. Journal of Mining Science, 2023, 59 : 604 - 610