Fast and Epsilon-Optimal Discretized Pursuit Learning Automata

Cited by: 33
Authors
Zhang, JunQi [1]
Wang, Cheng [1]
Zhou, MengChu [1, 2]
Affiliations
[1] Tongji Univ, Dept Comp Sci & Technol, Key Lab Embedded Syst & Serv Comp, Minist Educ, Shanghai 200092, Peoples R China
[2] New Jersey Inst Technol, Dept Elect & Comp Engn, Newark, NJ 07102 USA
Funding
National Science Foundation (USA);
Keywords
Discretized pursuit learning automata (LA); low computational complexity; stationary environments; OPTIMIZATION; ASSIGNMENT; ALGORITHM; SCHEMES; DESIGN; SYSTEM; TEAM;
DOI
10.1109/TCYB.2014.2365463
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Learning automata (LA) are powerful tools for reinforcement learning, and the discretized pursuit LA is the most popular among them. In each iteration, its operation consists of three basic phases: 1) selecting the next action; 2) finding the optimal estimated action; and 3) updating the state probability. However, when the number of actions is large, learning becomes extremely slow because too many updates must be made at each iteration. Most of these additional updates arise in phases 1 and 3. A new fast discretized pursuit LA with assured ε-optimality is proposed that performs both phases 1 and 3 with computational complexity independent of the number of actions. Apart from its low computational complexity, it converges faster than the classical scheme when operating in stationary environments. This paper can promote the application of LA to large-scale-action areas that require efficient reinforcement learning tools with assured ε-optimality, fast convergence, and low computational complexity per iteration.
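The three phases named in the abstract can be illustrated with a minimal sketch of a classical discretized pursuit automaton of the reward-inaction kind. This is not the fast variant the paper proposes, whose details are not given in this record. The function name `discretized_pursuit`, its parameters, and the Bernoulli environment are assumptions made for illustration. Probabilities are stored as integer multiples of the step 1/(rN) so that the arithmetic stays exact. In this naive form, phases 1 and 3 each cost time linear in the number of actions, which is the per-iteration cost the abstract refers to.

```python
import random


def discretized_pursuit(reward_probs, resolution=100, init_pulls=20,
                        max_iters=100_000, seed=0):
    """Classical discretized pursuit (reward-inaction) sketch.

    reward_probs: hypothetical stationary Bernoulli reward probabilities,
                  one per action (the environment is an assumption).
    Returns (chosen_action, iterations_used, action_probabilities).
    """
    rng = random.Random(seed)
    r = len(reward_probs)
    total = r * resolution          # probability 1.0 expressed in units of 1/(r*N)
    units = [resolution] * r        # uniform start: each action holds N units
    wins = [0] * r                  # rewards received per action
    pulls = [0] * r                 # times each action was tried

    def pull(i):
        rewarded = rng.random() < reward_probs[i]
        pulls[i] += 1
        wins[i] += int(rewarded)
        return rewarded

    # Initialise the reward estimates by trying every action a few times.
    for i in range(r):
        for _ in range(init_pulls):
            pull(i)

    for t in range(1, max_iters + 1):
        # Phase 1: select the next action by sampling the probability vector.
        action = rng.choices(range(r), weights=units)[0]
        rewarded = pull(action)

        # Phase 2: find the action with the highest estimated reward.
        best = max(range(r), key=lambda i: wins[i] / pulls[i])

        # Phase 3: on reward, move one step from every other action
        # toward the estimated best action; on penalty, do nothing.
        if rewarded:
            for j in range(r):
                if j != best and units[j] > 0:
                    units[j] -= 1
            units[best] = total - sum(units[j] for j in range(r) if j != best)

        if units[best] == total:    # converged: all probability on one action
            return best, t, [u / total for u in units]

    best = max(range(r), key=units.__getitem__)
    return best, max_iters, [u / total for u in units]


if __name__ == "__main__":
    action, iters, probs = discretized_pursuit([0.1, 0.9, 0.1])
    print(action, iters, probs)
```

A larger `resolution` gives smaller steps, which slows convergence but makes it less likely that the automaton locks onto a suboptimal action.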
Pages: 2089-2099
Number of pages: 11
Related papers
50 in total
  • [1] EPSILON-OPTIMAL DISCRETIZED PURSUIT LEARNING AUTOMATA
    OOMMEN, BJ
    LANCTOT, JK
    1989 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-3: CONFERENCE PROCEEDINGS, 1989, : 6 - 12
  • [2] EPSILON-OPTIMAL DISCRETIZED LINEAR REWARD-PENALTY LEARNING AUTOMATA
    OOMMEN, BJ
    CHRISTENSEN, JPR
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1988, 18 (03): : 451 - 458
  • [3] DISCRETIZED PURSUIT LEARNING AUTOMATA
    OOMMEN, BJ
    LANCTOT, JK
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1990, 20 (04): : 931 - 938
  • [4] 2 EPSILON-OPTIMAL NONLINEAR REINFORCEMENT SCHEMES FOR STOCHASTIC AUTOMATA
    SAWARAGI, Y
    BABA, N
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1974, SMC4 (01): : 126 - 131
  • [5] EPSILON-OPTIMAL STUBBORN LEARNING-MECHANISMS
    CHRISTENSEN, JPR
    OOMMEN, BJ
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1990, 20 (05): : 1209 - 1216
  • [6] An epsilon-Optimal Portfolio with Stochastic Volatility
    Gabih, Abdelali
    Grecksch, Wilfried
    MONTE CARLO METHODS AND APPLICATIONS, 2005, 11 (01): : 1 - 14
  • [7] Generalized pursuit learning schemes: New families of continuous and discretized learning automata
    Agache, M
    Oommen, BJ
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2002, 32 (06): : 738 - 749
  • [8] STATIONARY EPSILON-OPTIMAL STRATEGIES IN STOCHASTIC GAMES
    THUIJSMAN, F
    VRIEZE, K
    OR SPEKTRUM, 1993, 15 (01) : 9 - 15
  • [9] SOME EPSILON-OPTIMAL ROW-COLUMN DESIGNS
    JACROUX, M
    SANKHYA-SERIES B-APPLIED AND INTERDISCIPLINARY STATISTICS, 1986, 48 : 31 - 39
  • [10] ON THE CONSTRUCTION OF epsilon-OPTIMAL STRATEGIES IN PARTIALLY OBSERVED MDPs
    Runggaldier, Wolfgang J.
    ANNALS OF OPERATIONS RESEARCH, 1991, 28 (01) : 81 - 95