Continuous-action Q-learning

Cited by: 73
Authors
Millán, JDR [1]
Posenato, D [1]
Dedieu, E [1]
Affiliations
[1] European Commission, Joint Research Centre, I-21020 Ispra, VA, Italy
Keywords
reinforcement learning; incremental topology preserving maps; continuous domains; real-time operation
DOI
10.1023/A:1017988514716
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
This paper presents a Q-learning method that works in continuous domains. Other characteristics of our approach are the use of an incremental topology preserving map (ITPM) to partition the input space, and the incorporation of bias to initialize the learning process. A unit of the ITPM represents a limited region of the input space and maps it onto the Q-values of M possible discrete actions. The resulting continuous action is an average of the discrete actions of the "winning unit" weighted by their Q-values. Then, TD(lambda) updates the Q-values of the discrete actions according to their contribution. Units are created incrementally and their associated Q-values are initialized by means of domain knowledge. Experimental results in robotics domains show the superiority of the proposed continuous-action Q-learning over the standard discrete-action version in terms of both asymptotic performance and speed of learning. The paper also reports a comparison of discounted-reward against average-reward Q-learning in an infinite horizon robotics task.
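To make the mechanism described in the abstract concrete, the sketch below walks through one decision step in Python: the winning ITPM unit is taken to be the one nearest the observation, the continuous action is the Q-value-weighted average of that unit's M discrete actions, and a TD(lambda) error is credited back to the discrete actions in proportion to their contributions. This is a minimal illustration under stated assumptions, not the paper's exact formulation: the names (Unit, winning_unit, td_lambda_step), the nearest-center winner rule, and the non-negative shift applied to the weights are all assumptions.

```python
import numpy as np

class Unit:
    """One ITPM unit: covers a region of the input space and maps it
    onto Q-values for M discrete actions (illustrative data layout)."""
    def __init__(self, center, discrete_actions, q_init):
        self.center = np.asarray(center, dtype=float)              # unit position in input space
        self.actions = np.asarray(discrete_actions, dtype=float)   # M discrete actions (scalars here)
        self.q = np.asarray(q_init, dtype=float)                   # Q-values, initialized from domain knowledge
        self.trace = np.zeros_like(self.q)                         # eligibility traces for TD(lambda)

def winning_unit(units, x):
    """Winner = unit whose center is nearest the observation x (assumption)."""
    return min(units, key=lambda u: np.linalg.norm(u.center - x))

def continuous_action(unit):
    """Continuous action = average of the winner's discrete actions weighted
    by their Q-values; the shift keeps the weights non-negative (assumption)."""
    w = unit.q - unit.q.min() + 1e-8
    w = w / w.sum()
    return w @ unit.actions, w         # executed action and per-action contributions

def td_lambda_step(units, winner, w, reward, next_q,
                   alpha=0.1, gamma=0.95, lam=0.8):
    """TD(lambda) update: spread the TD error over the winner's discrete
    actions according to their contributions w, via decaying traces."""
    delta = reward + gamma * next_q - w @ winner.q   # TD error
    for u in units:
        u.trace *= gamma * lam                       # decay all traces
    winner.trace += w                                # credit current contributions
    for u in units:
        u.q += alpha * delta * u.trace               # proportional Q-value update

# Illustrative use: two units on a 1-D input space, three discrete actions each.
units = [Unit([0.0], [-1.0, 0.0, 1.0], [0.0, 0.1, 0.2]),
         Unit([1.0], [-1.0, 0.0, 1.0], [0.2, 0.1, 0.0])]
win = winning_unit(units, np.array([0.2]))
a, w = continuous_action(win)                        # a is a Q-weighted blend of {-1, 0, 1}
# next_q would normally come from the next observation's winning unit;
# the winner's own max Q is used here only to keep the example self-contained.
td_lambda_step(units, win, w, reward=0.5, next_q=win.q.max())
```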
Pages: 247-265
Page count: 19
Related Papers
50 entries in total
  • [11] Continuous-Action Reinforcement Learning for Memory Allocation in Virtualized Servers
    Garrido, Luis A.
    Nishtala, Rajiv
    Carpenter, Paul
    [J]. HIGH PERFORMANCE COMPUTING: ISC HIGH PERFORMANCE 2019 INTERNATIONAL WORKSHOPS, 2020, 11887: 13-24
  • [12] Automated Continuous-Action Desorber
    PROTODYA.IO
    [J]. ZHURNAL PRIKLADNOI KHIMII, 1973, 46 (07): 1614-1615
  • [13] Development of Continuous-Action Filters
    S. Yu. Panov
    M. K. Al-Kudakh
    E. V. Arkhangel'skaya
    Yu. V. Krasovitskii
    V. A. Goremykin
    [J]. Chemical and Petroleum Engineering, 2000, 36 (11-12): 760-764
  • [14] Continuous-Action Multiplier Engineering
    Yu. M. Lekontsev
    P. V. Sazhin
    B. L. Gerike
    A. V. Novik
    Yu. B. Mezentsev
    [J]. Journal of Mining Science, 2023, 59 (04): 604-610
  • [15] Continuous-Action Batcher of Liquids
    VOSKANYAN, RA
    [J]. INDUSTRIAL LABORATORY, 1978, 44 (12): 1717-1718
  • [16] Continuous-Action Reinforcement Learning for Portfolio Allocation of a Life Insurance Company
    Abrate, Carlo
    Angius, Alessio
    Morales, Gianmarco De Francisci
    Cozzini, Stefano
    Iadanza, Francesca
    Li Puma, Laura
    Pavanelli, Simone
    Perotti, Alan
    Pignataro, Stefano
    Ronchiadin, Silvia
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021: APPLIED DATA SCIENCE TRACK, PT IV, 2021, 12978: 237-252
  • [17] Human-in-the-Loop Reinforcement Learning in Continuous-Action Space
    Luo, Biao
    Wu, Zhengke
    Zhou, Fei
    Wang, Bing-Chuan
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 35 (11): 1-10
  • [20] Reinforcement distribution in continuous state action space fuzzy Q-learning: A novel approach
    Bonarini, A
    Montrone, F
    Restelli, M
    [J]. FUZZY LOGIC AND APPLICATIONS, 2006, 3849: 40-45