Continuous-action Q-learning

Cited by: 73
Authors
Millán, JDR [1]
Posenato, D [1]
Dedieu, E [1]
Affiliations
[1] European Commission, Joint Research Centre, I-21020 Ispra, VA, Italy
Keywords
reinforcement learning; incremental topology preserving maps; continuous domains; real-time operation;
DOI
10.1023/A:1017988514716
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405
Abstract
This paper presents a Q-learning method that works in continuous domains. Other characteristics of our approach are the use of an incremental topology preserving map (ITPM) to partition the input space, and the incorporation of bias to initialize the learning process. A unit of the ITPM represents a limited region of the input space and maps it onto the Q-values of M possible discrete actions. The resulting continuous action is an average of the discrete actions of the "winning unit" weighted by their Q-values. Then, TD(λ) updates the Q-values of the discrete actions according to their contribution. Units are created incrementally and their associated Q-values are initialized by means of domain knowledge. Experimental results in robotics domains show the superiority of the proposed continuous-action Q-learning over the standard discrete-action version in terms of both asymptotic performance and speed of learning. The paper also reports a comparison of discounted-reward against average-reward Q-learning in an infinite-horizon robotics task.
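To make the mechanics concrete, below is a minimal Python sketch of the scheme the abstract describes, under stated assumptions: a fixed discrete action set, Euclidean winner selection, non-negative Q-value weighting, and a greedy bootstrap target. All names (ITPMUnit, winning_unit, continuous_action, td_lambda_update) and hyperparameters are illustrative, not the authors' implementation; incremental unit creation and the paper's exact weighting are simplified away.

```python
import numpy as np

M = 5                                   # number of discrete actions per unit
ACTIONS = np.linspace(-1.0, 1.0, M)     # discrete action set (assumed range)
ALPHA, GAMMA, LAM = 0.1, 0.95, 0.7      # step size, discount, trace decay

class ITPMUnit:
    """One ITPM unit: a prototype covering a limited region of the input
    space, one Q-value per discrete action, and an eligibility trace."""
    def __init__(self, prototype, q_init):
        self.w = np.asarray(prototype, dtype=float)
        self.q = np.asarray(q_init, dtype=float)  # bias: domain-knowledge init
        self.e = np.zeros(M)                      # eligibility traces

def winning_unit(units, x):
    # Nearest prototype wins (Euclidean distance is an assumption).
    return min(units, key=lambda u: np.linalg.norm(u.w - x))

def continuous_action(unit):
    # Continuous action: average of the M discrete actions weighted by
    # their Q-values (shifted to be non-negative before normalizing).
    w = unit.q - unit.q.min() + 1e-8
    w = w / w.sum()
    return float(w @ ACTIONS), w         # executed action + contributions

def td_lambda_update(units, unit, contrib, r, x_next):
    # TD(lambda): credit each discrete action of the winning unit in
    # proportion to its contribution to the executed continuous action.
    target = r + GAMMA * winning_unit(units, x_next).q.max()
    delta = target - float(contrib @ unit.q)
    for u in units:
        u.e *= GAMMA * LAM               # decay all traces
    unit.e += contrib                    # accumulate credit for this step
    for u in units:
        u.q += ALPHA * delta * u.e

# Hypothetical usage on a 2-D input space with two pre-seeded units:
units = [ITPMUnit([0.0, 0.0], np.zeros(M)), ITPMUnit([1.0, 1.0], np.zeros(M))]
u = winning_unit(units, np.array([0.2, 0.1]))
a, contrib = continuous_action(u)
td_lambda_update(units, u, contrib, r=1.0, x_next=np.array([0.3, 0.1]))
```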
Pages: 247-265
Page count: 19
Related papers
50 records in total
  • [1] Continuous-Action Q-Learning
    José del R. Millán
    Daniele Posenato
    Eric Dedieu
    [J]. Machine Learning, 2002, 49 : 247 - 265
  • [2] Q-learning in continuous state and action spaces
    Gaskett, C
    Wettergreen, D
    Zelinsky, A
    [J]. ADVANCED TOPICS IN ARTIFICIAL INTELLIGENCE, 1999, 1747 : 417 - 428
  • [3] Learning Continuous-Action Control Policies
    Pazis, Jason
    Lagoudakis, Michail G.
    [J]. ADPRL: 2009 IEEE SYMPOSIUM ON ADAPTIVE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2009 : 169 - 176
  • [5] q-Learning in Continuous Time
    Jia, Yanwei
    Zhou, Xun Yu
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [6] A CONTINUOUS-ACTION VISCOMETER
    PROVINTE.IV
    GERSHKOV.BM
    [J]. INDUSTRIAL LABORATORY, 1966, 32 (05) : 766+
  • [7] CONTINUOUS ACTION GENERATION OF Q-LEARNING IN MULTI-AGENT COOPERATION
    Hwang, Kao-Shing
    Chen, Yu-Jen
    Jiang, Wei-Cheng
    Lin, Tzung-Feng
    [J]. ASIAN JOURNAL OF CONTROL, 2013, 15 (04) : 1011 - 1020
  • [8] Action Candidate Based Clipped Double Q-learning for Discrete and Continuous Action Tasks
    Jiang, Haobo
    Xie, Jin
    Yang, Jian
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 7979 - 7986
  • [9] Action Candidate Driven Clipped Double Q-Learning for Discrete and Continuous Action Tasks
    Jiang, Haobo
    Li, Guangyu
    Xie, Jin
    Yang, Jian
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (04) : 5269 - 5279
  • [10] Continuous-Action Multiplier Engineering
    Yu. M. Lekontsev
    P. V. Sazhin
    B. L. Gerike
    A. V. Novik
    Yu. B. Mezentsev
    [J]. Journal of Mining Science, 2023, 59 : 604 - 610