EPSILON-OPTIMAL DISCRETIZED LINEAR REWARD-PENALTY LEARNING AUTOMATA

Cited by: 43
Authors
OOMMEN, BJ [1]
CHRISTENSEN, JPR [1]
Affiliations
[1] UNIV COPENHAGEN,INST MATH,DK-2100 COPENHAGEN,DENMARK
Source
Keywords
PROBABILITY - SYSTEMS SCIENCE AND CYBERNETICS -- Learning Systems;
DOI
10.1109/21.7494
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Variable-structure stochastic automata (VSSA) are considered which interact with an environment and which dynamically learn the optimal action that the environment offers. Like all VSSA, the automata are fully defined by a set of action-probability updating rules. However, to minimize the requirements on the random-number generator used to implement the VSSA, and to increase the speed of convergence of the automaton, the case is considered in which the probability-updating functions can assume only a finite number of values. These values discretize the probability space [0, 1], and hence the automata are called discretized learning automata. The discretized automata are linear because the subintervals of [0, 1] are of equal length. The authors prove the following results: (a) two-action discretized linear reward-penalty automata are ergodic and ε-optimal in all environments whose minimum penalty probability is less than 0.5; (b) there exist discretized two-action linear reward-penalty automata that are ergodic and ε-optimal in all random environments; and (c) discretized two-action linear reward-penalty automata with artificially created absorbing barriers are ε-optimal in all random environments.
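The idea of restricting the action probability to finitely many equally spaced values can be illustrated with a short simulation. The following is only a minimal sketch, not the authors' scheme: the abstract does not state the updating rules, so the one-step-up/one-step-down rule, the function names `step` and `simulate`, the resolution `n`, and the penalty probabilities are all assumptions chosen for illustration.

```python
import random


def step(k, n, penalty_probs, rng):
    """One update of a hypothetical two-action discretized automaton.

    The probability of choosing action 0 is k / n, where k is an integer
    in [0, n], so the probability can take only n + 1 equally spaced values.
    """
    action = 0 if rng.random() < k / n else 1
    penalized = rng.random() < penalty_probs[action]
    # Assumed rule: a reward moves one step toward the chosen action,
    # a penalty moves one step away from it.
    toward_action_zero = (action == 0) != penalized
    k += 1 if toward_action_zero else -1
    return min(max(k, 0), n)  # stay inside the discretized interval [0, 1]


def simulate(n=20, penalty_probs=(0.2, 0.6), steps=20000, seed=0):
    """Return the time-averaged probability of choosing action 0."""
    rng = random.Random(seed)
    k = n // 2  # start with both actions equally likely
    total = 0
    for _ in range(steps):
        k = step(k, n, penalty_probs, rng)
        total += k
    return total / (steps * n)


if __name__ == "__main__":
    print(simulate())
```

In this hypothetical environment action 0 has the lower penalty probability, so the averaged probability of choosing it should settle above 0.5. Because a penalty at either end state moves the automaton back into the interior, the end states in this sketch are not absorbing.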
Pages: 451 - 458
Number of pages: 8
Related papers
21 in total
  • [1] EPSILON-OPTIMAL DISCRETIZED PURSUIT LEARNING AUTOMATA
    OOMMEN, BJ
    LANCTOT, JK
    1989 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-3: CONFERENCE PROCEEDINGS, 1989 : 6 - 12
  • [2] Fast and Epsilon-Optimal Discretized Pursuit Learning Automata
    Zhang, JunQi
    Wang, Cheng
    Zhou, MengChu
    IEEE TRANSACTIONS ON CYBERNETICS, 2015, 45 (10) : 2089 - 2099
  • [3] AntNet with Reward-Penalty Reinforcement Learning
    Lalbakhsh, Pooia
    Zaeri, Bahram
    Lalbakhsh, Ali
    Fesharaki, Mehdi N.
    2010 SECOND INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE, COMMUNICATION SYSTEMS AND NETWORKS (CICSYN), 2010 : 17 - 21
  • [4] THE ASYMPTOTIC OPTIMALITY OF DISCRETIZED LINEAR REWARD INACTION LEARNING AUTOMATA
    OOMMEN, BJ
    HANSEN, E
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1984, 14 (03) : 542 - 545
  • [5] 2 EPSILON-OPTIMAL NONLINEAR REINFORCEMENT SCHEMES FOR STOCHASTIC AUTOMATA
    SAWARAGI, Y
    BABA, N
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1974, SMC4 (01) : 126 - 131
  • [6] EPSILON-OPTIMAL STUBBORN LEARNING-MECHANISMS
    CHRISTENSEN, JPR
    OOMMEN, BJ
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1990, 20 (05) : 1209 - 1216
  • [7] Reward-penalty reinforcement learning scheme for planning and reactive behavior
    Araujo, AFR
    Braga, APS
    1998 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-5, 1998 : 1485 - 1490
  • [8] EPSILON-OPTIMAL AND OPTIMAL CONTROLS FOR THE STOCHASTIC LINEAR-QUADRATIC PROBLEM
    TUDOR, C
    MATHEMATISCHE NACHRICHTEN, 1990, 145 : 135 - 149
  • [9] Learning Non-Unique Segmentation with Reward-Penalty Dice Loss
    He, Jiabo
    Erfani, Sarah
    Wijewickrema, Sudanthi
    O'Leary, Stephen
    Ramamohanarao, Kotagiri
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [10] EPSILON-OPTIMAL CONTROL OF A LINEAR PARABOLIC ITO-EQUATION
    GRECKSCH, W
    MATHEMATISCHE NACHRICHTEN, 1987, 134 : 7 - 20