EPSILON-OPTIMAL DISCRETIZED LINEAR REWARD-PENALTY LEARNING AUTOMATA

Cited by: 43

Authors:

OOMMEN, BJ [1]

CHRISTENSEN, JPR [1]

Affiliation:

[1] UNIV COPENHAGEN,INST MATH,DK-2100 COPENHAGEN,DENMARK

Source:

IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS | 1988 / Vol. 18 / No. 03

Keywords:

PROBABILITY - SYSTEMS SCIENCE AND CYBERNETICS -- Learning Systems;

DOI:

10.1109/21.7494

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Variable-structure stochastic automata (VSSA) are considered which interact with an environment and which dynamically learn the optimal action that the environment offers. Like all VSSA, the automata are fully defined by a set of action-probability updating rules. However, to minimize the requirements on the random-number generator used to implement the VSSA, and to increase the speed of convergence of the automaton, the authors consider the case in which the probability-updating functions can assume only a finite number of values. These values discretize the probability space [0, 1], and hence the automata are called discretized learning automata. The discretized automata are linear because the subintervals of [0, 1] are of equal length. The authors prove the following results: (a) two-action discretized linear reward-penalty automata are ergodic and ε-optimal in all environments whose minimum penalty probability is less than 0.5; (b) there exist discretized two-action linear reward-penalty automata that are ergodic and ε-optimal in all random environments; and (c) discretized two-action linear reward-penalty automata with artificially created absorbing barriers are ε-optimal in all random environments.
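
To make the idea of a discretized action-probability space concrete, the following is a minimal sketch of a two-action automaton whose probability of choosing action 0 is restricted to the n + 1 equally spaced values k / n. The update rule (one step toward the chosen action on reward, one step away on penalty, clipped at 0 and 1) is an assumption made for illustration and is not necessarily the exact scheme analyzed in the paper; the names step, simulate, penalty_probs, n, steps and seed are likewise hypothetical.

```python
import random


def step(k, n, action, rewarded):
    """Return the new state index in 0..n, where P(action 0) = k / n.

    Assumed rule: move one grid step toward the chosen action on reward
    and one grid step away from it on penalty, clipping at the ends.
    """
    toward_action_0 = (action == 0) == rewarded
    k = k + 1 if toward_action_0 else k - 1
    return max(0, min(n, k))


def simulate(penalty_probs, n=20, steps=10_000, seed=0):
    """Run the automaton in a stationary random environment.

    penalty_probs[i] is the probability that action i is penalized.
    Returns the time-averaged probability of choosing action 0.
    """
    rng = random.Random(seed)
    k = n // 2  # start at (or near) the middle of the grid
    total = 0
    for _ in range(steps):
        # Choose action 0 with probability k / n, otherwise action 1.
        action = 0 if rng.random() < k / n else 1
        # The environment rewards unless a penalty is drawn.
        rewarded = rng.random() >= penalty_probs[action]
        k = step(k, n, action, rewarded)
        total += k
    return total / (steps * n)


if __name__ == "__main__":
    # Action 0 has the lower penalty probability in this example.
    print(simulate((0.2, 0.8)))
```

Because every update moves the state by exactly one grid step, the only randomness the automaton needs is the comparison of a uniform draw against one of n + 1 fixed values, which is the reduced demand on the random-number generator that the abstract refers to.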

Pages: 451 - 458

Number of pages: 8

21 records in total

[1] EPSILON-OPTIMAL DISCRETIZED PURSUIT LEARNING AUTOMATA
OOMMEN, BJ
LANCTOT, JK
1989 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-3: CONFERENCE PROCEEDINGS, 1989 : 6 - 12
[2] Fast and Epsilon-Optimal Discretized Pursuit Learning Automata
Zhang, JunQi
Wang, Cheng
Zhou, MengChu
IEEE TRANSACTIONS ON CYBERNETICS, 2015, 45 (10) : 2089 - 2099
[3] AntNet with Reward-Penalty Reinforcement Learning
Lalbakhsh, Pooia
Zaeri, Bahram
Lalbakhsh, Ali
Fesharaki, Mehdi N.
2010 SECOND INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE, COMMUNICATION SYSTEMS AND NETWORKS (CICSYN), 2010 : 17 - 21
[4] THE ASYMPTOTIC OPTIMALITY OF DISCRETIZED LINEAR REWARD INACTION LEARNING AUTOMATA
OOMMEN, BJ
HANSEN, E
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1984, 14 (03) : 542 - 545
[5] 2 EPSILON-OPTIMAL NONLINEAR REINFORCEMENT SCHEMES FOR STOCHASTIC AUTOMATA
SAWARAGI, Y
BABA, N
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1974, SMC4 (01) : 126 - 131
[6] EPSILON-OPTIMAL STUBBORN LEARNING-MECHANISMS
CHRISTENSEN, JPR
OOMMEN, BJ
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1990, 20 (05) : 1209 - 1216
[7] Reward-penalty reinforcement learning scheme for planning and reactive behavior
Araujo, AFR
Braga, APS
1998 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-5, 1998 : 1485 - 1490
[8] EPSILON-OPTIMAL AND OPTIMAL CONTROLS FOR THE STOCHASTIC LINEAR-QUADRATIC PROBLEM
TUDOR, C
MATHEMATISCHE NACHRICHTEN, 1990, 145 : 135 - 149
[9] Learning Non-Unique Segmentation with Reward-Penalty Dice Loss
He, Jiabo
Erfani, Sarah
Wijewickrema, Sudanthi
O'Leary, Stephen
Ramamohanarao, Kotagiri
2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
[10] EPSILON-OPTIMAL CONTROL OF A LINEAR PARABOLIC ITO-EQUATION
GRECKSCH, W
MATHEMATISCHE NACHRICHTEN, 1987, 134 : 7 - 20