Fast and Epsilon-Optimal Discretized Pursuit Learning Automata

Times cited: 33

Authors:

Zhang, JunQi [1]

Wang, Cheng [1]

Zhou, MengChu [1, 2]

Affiliations:

[1] Tongji Univ, Dept Comp Sci & Technol, Key Lab Embedded Syst & Serv Comp, Minist Educ, Shanghai 200092, Peoples R China

[2] New Jersey Inst Technol, Dept Elect & Comp Engn, Newark, NJ 07102 USA

Source:

IEEE TRANSACTIONS ON CYBERNETICS | 2015 / Vol. 45 / No. 10

Funding:

National Science Foundation (USA)

Keywords:

Discretized pursuit learning automata (LA); low computational complexity; stationary environments; OPTIMIZATION; ASSIGNMENT; ALGORITHM; SCHEMES; DESIGN; SYSTEM; TEAM;

DOI:

10.1109/TCYB.2014.2365463

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

Learning automata (LA) are powerful tools for reinforcement learning. The discretized pursuit LA is the most popular among them. Each iteration of its operation consists of three basic phases: 1) selecting the next action; 2) finding the optimal estimated action; and 3) updating the state probability. However, when the number of actions is large, learning becomes extremely slow because too many updates must be made at each iteration, and most of these updates come from phases 1 and 3. A new fast discretized pursuit LA with assured ε-optimality is proposed that performs both phases 1 and 3 with computational complexity independent of the number of actions. Apart from its low computational complexity, it converges faster than the classical scheme when operating in stationary environments. This work can promote the application of LA to problems with large action sets that require efficient reinforcement learning tools with assured ε-optimality, fast convergence, and low per-iteration computational complexity.
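The three phases listed above can be made concrete with a minimal Python sketch of the classical discretized pursuit reward-inaction scheme (Oommen and Lanctot). It is not the fast variant proposed in this paper, whose details are not given here. The sketch assumes a simulated stationary environment supplied as a hypothetical list `reward_probs`, stores each action probability as an integer multiple of Δ = 1/(rn), and uses illustrative names (`dpri`, `n`, `init_trials`, `max_iters`, `seed`) that do not come from the paper. Each phase is a loop over all r actions, which is where the per-iteration cost that grows with the number of actions comes from.

```python
import random


def dpri(reward_probs, n=10, init_trials=10, max_iters=1_000_000, seed=0):
    """Sketch of the classical discretized pursuit reward-inaction (DPRI) LA.

    reward_probs is a hypothetical stationary environment: reward_probs[i] is
    the probability (unknown to the automaton) that action i is rewarded.
    Probabilities are stored as integer multiples of delta = 1 / (r * n).
    init_trials must be >= 1 so every reward estimate is defined.
    Returns (converged_action, iterations) or (None, max_iters).
    """
    rng = random.Random(seed)
    r = len(reward_probs)
    total = r * n            # probability 1.0 expressed in units of delta
    units = [n] * r          # uniform start: p_i = n / (r * n) = 1 / r
    wins = [0] * r           # W_i: number of rewards received by action i
    tries = [0] * r          # Z_i: number of times action i was selected

    def environment(a):
        # True means reward, False means penalty.
        return rng.random() < reward_probs[a]

    # Initialize the reward estimates by trying every action a few times.
    for a in range(r):
        for _ in range(init_trials):
            tries[a] += 1
            wins[a] += environment(a)

    for t in range(1, max_iters + 1):
        # Phase 1: select an action by inverse-CDF sampling, O(r).
        # sum(units) == total always holds, so the loop always breaks.
        u = rng.randrange(total)
        acc = 0
        for a, w in enumerate(units):
            acc += w
            if u < acc:
                break
        rewarded = environment(a)
        tries[a] += 1
        wins[a] += rewarded

        # Phase 2: find the action with the highest estimated reward, O(r).
        m = max(range(r), key=lambda i: wins[i] / tries[i])

        # Phase 3 (reward only; inaction on penalty): lower every other
        # action by one delta, clipped at 0, and give the freed mass to m, O(r).
        if rewarded:
            for j in range(r):
                if j != m and units[j] > 0:
                    units[j] -= 1
            units[m] = total - (sum(units) - units[m])
            if units[m] == total:
                return m, t
    return None, max_iters


if __name__ == "__main__":
    print(dpri([0.2, 0.8, 0.4], n=20))
```

With `reward_probs=[0.2, 0.8, 0.4]` the automaton should usually settle on action 1. A larger resolution parameter `n` means smaller probability steps, which tends to raise the chance of converging to the best action at the cost of more iterations; this is the trade-off behind ε-optimality. Phase 2 is written as a plain scan only for simplicity.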


Pages: 2089-2099

Page count: 11
