Continuous-action Q-learning

Cited by: 73
Authors
Millán, JDR [1]
Posenato, D [1]
Dedieu, E [1]
Affiliations
[1] European Commission, Joint Research Centre, I-21020 Ispra, VA, Italy
Keywords
reinforcement learning; incremental topology preserving maps; continuous domains; real-time operation
DOI
10.1023/A:1017988514716
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
This paper presents a Q-learning method that works in continuous domains. Other characteristics of our approach are the use of an incremental topology preserving map (ITPM) to partition the input space, and the incorporation of bias to initialize the learning process. A unit of the ITPM represents a limited region of the input space and maps it onto the Q-values of M possible discrete actions. The resulting continuous action is an average of the discrete actions of the "winning unit" weighted by their Q-values. Then, TD(lambda) updates the Q-values of the discrete actions according to their contribution. Units are created incrementally and their associated Q-values are initialized by means of domain knowledge. Experimental results in robotics domains show the superiority of the proposed continuous-action Q-learning over the standard discrete-action version in terms of both asymptotic performance and speed of learning. The paper also reports a comparison of discounted-reward against average-reward Q-learning in an infinite horizon robotics task.
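To make the mechanism described in the abstract concrete, the sketch below walks through one decision step in Python: the winning ITPM unit is taken to be the one nearest the observation, the continuous action is the Q-value-weighted average of that unit's M discrete actions, and a TD(lambda) error is credited back to the discrete actions in proportion to their contributions. This is a minimal illustration under stated assumptions, not the paper's exact formulation: the names (Unit, winning_unit, td_lambda_step), the nearest-center winner rule, and the non-negative shift applied to the weights are all assumptions.

```python
import numpy as np

class Unit:
    """One ITPM unit: covers a region of the input space and maps it
    onto Q-values for M discrete actions (illustrative data layout)."""
    def __init__(self, center, discrete_actions, q_init):
        self.center = np.asarray(center, dtype=float)              # unit position in input space
        self.actions = np.asarray(discrete_actions, dtype=float)   # M discrete actions (scalars here)
        self.q = np.asarray(q_init, dtype=float)                   # Q-values, initialized from domain knowledge
        self.trace = np.zeros_like(self.q)                         # eligibility traces for TD(lambda)

def winning_unit(units, x):
    """Winner = unit whose center is nearest the observation x (assumption)."""
    return min(units, key=lambda u: np.linalg.norm(u.center - x))

def continuous_action(unit):
    """Continuous action = average of the winner's discrete actions weighted
    by their Q-values; the shift keeps the weights non-negative (assumption)."""
    w = unit.q - unit.q.min() + 1e-8
    w = w / w.sum()
    return w @ unit.actions, w         # executed action and per-action contributions

def td_lambda_step(units, winner, w, reward, next_q,
                   alpha=0.1, gamma=0.95, lam=0.8):
    """TD(lambda) update: spread the TD error over the winner's discrete
    actions according to their contributions w, via decaying traces."""
    delta = reward + gamma * next_q - w @ winner.q   # TD error
    for u in units:
        u.trace *= gamma * lam                       # decay all traces
    winner.trace += w                                # credit current contributions
    for u in units:
        u.q += alpha * delta * u.trace               # proportional Q-value update

# Illustrative use: two units on a 1-D input space, three discrete actions each.
units = [Unit([0.0], [-1.0, 0.0, 1.0], [0.0, 0.1, 0.2]),
         Unit([1.0], [-1.0, 0.0, 1.0], [0.2, 0.1, 0.0])]
win = winning_unit(units, np.array([0.2]))
a, w = continuous_action(win)                        # a is a Q-weighted blend of {-1, 0, 1}
# next_q would normally come from the next observation's winning unit;
# the winner's own max Q is used here only to keep the example self-contained.
td_lambda_step(units, win, w, reward=0.5, next_q=win.q.max())
```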
Pages: 247-265
Page count: 19
Related Papers
50 entries in total
  • [11] Continuous-Action Reinforcement Learning for Memory Allocation in Virtualized Servers
    Garrido, Luis A.
    Nishtala, Rajiv
    Carpenter, Paul
    [J]. HIGH PERFORMANCE COMPUTING: ISC HIGH PERFORMANCE 2019 INTERNATIONAL WORKSHOPS, 2020, 11887: 13-24
  • [12] Automated Continuous-Action Desorber
    PROTODYA.IO
    [J]. ZHURNAL PRIKLADNOI KHIMII, 1973, 46 (07): 1614-1615
  • [13] Development of Continuous-Action Filters
    S. Yu. Panov
    M. K. Al-Kudakh
    E. V. Arkhangel'skaya
    Yu. V. Krasovitskii
    V. A. Goremykin
    [J]. Chemical and Petroleum Engineering, 2000, 36 (11-12): 760-764
  • [14] Continuous-Action Multiplier Engineering
    Yu. M. Lekontsev
    P. V. Sazhin
    B. L. Gerike
    A. V. Novik
    Yu. B. Mezentsev
    [J]. Journal of Mining Science, 2023, 59 (04): 604-610
  • [15] Continuous-Action Batcher of Liquids
    VOSKANYAN, RA
    [J]. INDUSTRIAL LABORATORY, 1978, 44 (12): 1717-1718
  • [16] Continuous-Action Reinforcement Learning for Portfolio Allocation of a Life Insurance Company
    Abrate, Carlo
    Angius, Alessio
    Morales, Gianmarco De Francisci
    Cozzini, Stefano
    Iadanza, Francesca
    Li Puma, Laura
    Pavanelli, Simone
    Perotti, Alan
    Pignataro, Stefano
    Ronchiadin, Silvia
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021: APPLIED DATA SCIENCE TRACK, PT IV, 2021, 12978: 237-252
  • [17] Human-in-the-Loop Reinforcement Learning in Continuous-Action Space
    Luo, Biao
    Wu, Zhengke
    Zhou, Fei
    Wang, Bing-Chuan
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 35 (11): 1-10
  • [20] Reinforcement distribution in continuous state action space fuzzy Q-learning: A novel approach
    Bonarini, A
    Montrone, F
    Restelli, M
    [J]. FUZZY LOGIC AND APPLICATIONS, 2006, 3849: 40-45