EPSILON-OPTIMAL DISCRETIZED LINEAR REWARD-PENALTY LEARNING AUTOMATA

Cited by: 43
Authors
OOMMEN, BJ [1 ]
CHRISTENSEN, JPR [1 ]
Affiliation
[1] UNIV COPENHAGEN,INST MATH,DK-2100 COPENHAGEN,DENMARK
Source
Keywords
Probability; Systems Science and Cybernetics; Learning Systems
DOI
10.1109/21.7494
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Variable-structure stochastic automata (VSSA) are considered which interact with an environment and dynamically learn the optimal action that the environment offers. Like all VSSA, the automata are fully defined by a set of action-probability updating rules. However, to minimize the requirements on the random-number generator used to implement the VSSA, and to increase the speed of convergence of the automaton, the case is considered in which the probability-updating functions can assume only a finite number of values. These values discretize the probability space [0, 1], and hence the automata are called discretized learning automata. The discretized automata are linear because the subintervals of [0, 1] are of equal length. The authors prove the following results: (a) two-action discretized linear reward-penalty automata are ergodic and ε-optimal in all environments whose minimum penalty probability is less than 0.5; (b) there exist discretized two-action linear reward-penalty automata that are ergodic and ε-optimal in all random environments; and (c) discretized two-action linear reward-penalty automata with artificially created absorbing barriers are ε-optimal in all random environments.
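The discretized update rule described in the abstract can be illustrated with a minimal simulation sketch. The function name, the resolution N, the step counts, and the environment's penalty probabilities below are illustrative assumptions, not details from the paper: the action probability p1 is restricted to the grid {0, 1/N, ..., 1}, and each reward or penalty moves it by exactly one subinterval of length 1/N.

```python
import random

def simulate_dlrp(c, n_steps=20000, N=100, seed=0):
    """Sketch of a two-action discretized linear reward-penalty automaton.

    c : pair of penalty probabilities (c[0], c[1]) of a stationary environment
    N : resolution; p1 (probability of action 0) lives on {0, 1/N, ..., 1}
    """
    rng = random.Random(seed)
    k = N // 2                      # p1 = k/N, start at 0.5
    for _ in range(n_steps):
        p1 = k / N
        action = 0 if rng.random() < p1 else 1
        penalized = rng.random() < c[action]
        if not penalized:
            # reward: shift one subinterval toward the chosen action
            k = min(N, k + 1) if action == 0 else max(0, k - 1)
        else:
            # penalty: shift one subinterval away from the chosen action
            k = max(0, k - 1) if action == 0 else min(N, k + 1)
    return k / N

# In an environment where action 0 has the lower penalty probability and both
# penalty probabilities are below 0.5, p1 should settle near 1.
p1_final = simulate_dlrp(c=(0.1, 0.4))
```

Because the updating functions take only the N + 1 grid values, the implementation needs far less precision from its random-number generator than a continuous VSSA would, which is the motivation the abstract gives for discretization.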
Pages: 451-458
Page count: 8