Fast and Epsilon-Optimal Discretized Pursuit Learning Automata

Times cited: 33

Authors:

Zhang, JunQi [1]

Wang, Cheng [1]

Zhou, MengChu [1, 2]

Affiliations:

[1] Tongji Univ, Dept Comp Sci & Technol, Key Lab Embedded Syst & Serv Comp, Minist Educ, Shanghai 200092, Peoples R China

[2] New Jersey Inst Technol, Dept Elect & Comp Engn, Newark, NJ 07102 USA

Source:

IEEE TRANSACTIONS ON CYBERNETICS | 2015 / Vol. 45 / No. 10

Funding:

National Science Foundation (USA)

Keywords:

Discretized pursuit learning automata (LA); low computational complexity; stationary environments; OPTIMIZATION; ASSIGNMENT; ALGORITHM; SCHEMES; DESIGN; SYSTEM; TEAM

DOI:

10.1109/TCYB.2014.2365463

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Learning automata (LA) are powerful tools for reinforcement learning, and the discretized pursuit LA is the most popular among them. Each iteration consists of three basic phases: 1) selecting the next action; 2) finding the optimal estimated action; and 3) updating the state probabilities. However, when the number of actions is large, learning becomes extremely slow because too many updates must be made at each iteration, mostly in phases 1 and 3. This paper proposes a new fast discretized pursuit LA with assured ε-optimality that performs both phases 1 and 3 with computational complexity independent of the number of actions. Besides its low computational complexity, it converges faster than the classical scheme in stationary environments. This work can promote the application of LA to large-scale-action problems that require efficient reinforcement learning tools with assured ε-optimality, fast convergence, and low per-iteration computational complexity.
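To make the three per-iteration phases concrete, here is a minimal Python sketch of a *classical* discretized pursuit reward-inaction automaton. It is not the fast scheme proposed in this paper. Assumptions not taken from this record: probabilities are stored as integer multiples of a smallest step Δ = 1/(r·n), where r is a hypothetical `resolution` parameter and n the number of actions; reward estimates are d_i = W_i/Z_i; and the class and method names (`DiscretizedPursuitLA`, `step`, etc.) are illustrative only. In this naive version every phase loops over all n actions. The abstract identifies phases 1 and 3 as the main per-iteration cost, and those are the two phases the proposed scheme makes independent of n.

```python
import random


class DiscretizedPursuitLA:
    """Sketch of a classical discretized pursuit reward-inaction automaton.

    Hypothetical names and assumptions: `resolution` sets the smallest
    probability step delta = 1 / (resolution * n_actions); probabilities are
    stored as integer counts of delta ("units") so they stay exactly discrete.
    """

    def __init__(self, n_actions, resolution, rng=None):
        self.n = n_actions
        self.total_units = resolution * n_actions  # probability 1.0 == total_units deltas
        self.units = [resolution] * n_actions      # start from a uniform vector
        self.rewards = [0] * n_actions             # W_i: rewards received by action i
        self.selections = [0] * n_actions          # Z_i: times action i was chosen
        self.rng = rng if rng is not None else random.Random()

    def select_action(self):
        # Phase 1: roulette-wheel sampling over the n actions, O(n).
        r = self.rng.randrange(self.total_units)
        for i, u in enumerate(self.units):
            if r < u:
                return i
            r -= u
        raise AssertionError("units must sum to total_units")

    def best_estimated_action(self):
        # Phase 2: argmax of the reward estimates d_i = W_i / Z_i, O(n).
        def estimate(i):
            return self.rewards[i] / self.selections[i] if self.selections[i] else 0.0
        return max(range(self.n), key=estimate)

    def step(self, environment):
        action = self.select_action()
        rewarded = environment(action)  # True = reward, False = penalty
        self.selections[action] += 1
        self.rewards[action] += int(rewarded)
        best = self.best_estimated_action()
        if rewarded:
            # Phase 3 (reward-inaction): every other action gives up one delta
            # (if it has any) to the best-estimated action, O(n).
            # Penalties leave the probability vector unchanged.
            for j in range(self.n):
                if j != best and self.units[j] > 0:
                    self.units[j] -= 1
                    self.units[best] += 1
        return action

    def probabilities(self):
        return [u / self.total_units for u in self.units]


if __name__ == "__main__":
    # Hypothetical Bernoulli environment: action i is rewarded with probability reward_probs[i].
    reward_probs = [0.2, 0.8, 0.5]
    env_rng = random.Random(1)
    la = DiscretizedPursuitLA(n_actions=3, resolution=100, rng=random.Random(0))
    for _ in range(20000):
        la.step(lambda a: env_rng.random() < reward_probs[a])
    print(la.probabilities())
```

A larger `resolution` makes each reward move less probability mass. That slows convergence but lowers the chance of locking onto a suboptimal action, which is the usual intuition behind ε-optimality for discretized schemes.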

Pages: 2089-2099

Page count: 11

Related records (50 in total):

[31] MAXIMAL REWARDS AND EPSILON-OPTIMAL POLICIES IN CONTINUOUS TIME MARKOV DECISION CHAINS
LEMBERSKY, MR
ANNALS OF STATISTICS, 1974, 2 (01) : 159 - 169
[32] Random Early Detection for Congestion Avoidance in Wired Networks: A Discretized Pursuit Learning-Automata-Like Solution
Misra, Sudip
Oommen, B. John
Yanamandra, Sreekeerthy
Obaidat, Mohammad S.
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2010, 40 (01) : 66 - 76
[33] AN EPSILON-OPTIMAL CONTROL OF A FINITE MARKOV-CHAIN WITH AN AVERAGE REWARD CRITERION
FAINBERG, EA
THEORY OF PROBABILITY AND ITS APPLICATIONS, 1980, 25 (01) : 70 - 81
[34] EPSILON-OPTIMAL SOLUTIONS IN NONDIFFERENTIABLE CONVEX-PROGRAMMING AND SOME RELATED QUESTIONS
STRODIOT, JJ
NGUYEN, VH
HEUKEMES, N
MATHEMATICAL PROGRAMMING, 1983, 25 (03) : 307 - 328
[35] APPROXIMATIONS FOR DISCRETE-TIME ADAPTIVE-CONTROL - CONSTRUCTION OF EPSILON-OPTIMAL CONTROLS
RUNGGALDIER, WJ
ZANE, O
MATHEMATICS OF CONTROL SIGNALS AND SYSTEMS, 1991, 4 (03) : 269 - 291
[36] epsilon-Optimal Minimal-Skew Battery Lifetime Routing in Distributed Embedded Systems
Jafari, Roozbeh
Dabiri, Foad
Sarrafzadeh, Majid
JOURNAL OF LOW POWER ELECTRONICS, 2005, 1 (02) : 97 - 107
[37] NONEXISTENCE OF EPSILON-OPTIMAL RANDOMIZED STATIONARY POLICIES IN AVERAGE COST MARKOV DECISION MODELS
ROSS, SM
ANNALS OF MATHEMATICAL STATISTICS, 1971, 42 (05) : 1767 - &
[38] Guaranteed Epsilon-Optimal Treatment Plans with Minimum Number of Beams for SBRT Using RayStation
Yarmand, H.
Winey, B.
Craft, D.
MEDICAL PHYSICS, 2014, 41 (06) : 396 - 396
[39] The Hierarchical Discrete Pursuit Learning Automaton: A Novel Scheme With Fast Convergence and Epsilon-Optimality
Omslandseter, Rebekka Olsson
Jiao, Lei
Zhang, Xuan
Yazidi, Anis
Oommen, B. John
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (06) : 8278 - 8292
[40] An epsilon-optimal algorithm considering greenhouse gas emissions for the management of a ship's bunker fuel
Kim, Hwa-Joong
Chang, Young-Tae
Kim, Kwang-Tae
Kim, Hyo-Jeong
TRANSPORTATION RESEARCH PART D-TRANSPORT AND ENVIRONMENT, 2012, 17 (02) : 97 - 103
