The Hierarchical Discrete Pursuit Learning Automaton: A Novel Scheme With Fast Convergence and Epsilon-Optimality

Cited by: 1
Authors
Omslandseter, Rebekka Olsson [1]
Jiao, Lei [1]
Zhang, Xuan [2]
Yazidi, Anis [3]
Oommen, B. John [4, 5]
Affiliations
[1] Univ Agder, Dept Informat & Commun Technol, N-4879 Grimstad, Norway
[2] Norwegian Res Ctr NORCE, N-4879 Grimstad, Norway
[3] Oslo Metropolitan Univ, Dept Comp Sci, N-0160 Oslo, Norway
[4] Carleton Univ, Sch Comp Sci, Ottawa, ON K1S 5B6, Canada
[5] Northwest Univ, TRADE Res Entity, ZA-2520 Potchefstroom, South Africa
Keywords
Learning automata; Automata; Convergence; Reinforcement learning; Task analysis; Pursuit algorithms; Markov processes; Convergence analysis; hierarchical discrete pursuit LA; learning automata (LA); reinforcement learning (RL); ALGORITHMS; ACCESS;
DOI
10.1109/TNNLS.2022.3226538
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Since the early 1960s, the paradigm of learning automata (LA) has attracted considerable interest. Arguably, it has also served as the foundation for the phenomenon and field of reinforcement learning (RL). Over the decades, new concepts and fundamental principles have been introduced to increase the LA's speed and accuracy. These include using probability updating functions, discretizing the probability space, and using the "Pursuit" concept. Very recently, the concept of incorporating "structure" into the ordering of the LA's actions has improved both the speed and accuracy of the corresponding hierarchical machines when the number of actions is large. This has led to the epsilon-optimal hierarchical continuous pursuit LA (HCPA). This article pioneers the inclusion of all the above-mentioned phenomena into a single new LA, leading to the novel hierarchical discretized pursuit LA (HDPA). Indeed, although the previously proposed HCPA is powerful, its speed is impeded when any action probability is close to unity, because the updates of the components of the probability vector become correspondingly smaller as any action probability approaches unity. We propose here the novel HDPA, in which we infuse the phenomenon of discretization into the action probability vector's updating functionality, which is invoked recursively at every stage of the machine's hierarchical structure. This discretized functionality does not possess the same impediment, because discretization prohibits it. We demonstrate the HDPA's robustness and validity by formally proving its epsilon-optimality using the moderation property. We also invoke the submartingale characteristic at every level to prove that the action probability of the optimal action converges to unity as time goes to infinity.
Apart from the new machine being epsilon-optimal, the numerical results demonstrate that the number of iterations required for convergence is significantly reduced for the HDPA, when compared to the state-of-the-art HCPA scheme.
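The slowdown near unity that the abstract describes can be illustrated with a small sketch. This is not the authors' HDPA or HCPA: it is a flat (non-hierarchical) two-action toy that assumes the same action is pursued at every step, and the function names, the step size of 1/100, and the 0.999 threshold are all hypothetical choices made for illustration. The continuous rule moves the vector a fixed fraction of the remaining distance toward the pursued action, while the discretized rule lowers every other action by a fixed step.

```python
from fractions import Fraction


def continuous_pursuit_step(p, best, lam):
    """Move p a fraction `lam` of the way toward the unit vector on `best`."""
    return [(1 - lam) * pj + (lam if j == best else 0) for j, pj in enumerate(p)]


def discretized_pursuit_step(p, best, delta):
    """Lower every other action by the fixed step `delta` (floored at 0)
    and give the freed probability mass to `best`."""
    q = [pj if j == best else max(pj - delta, 0) for j, pj in enumerate(p)]
    q[best] = 1 - sum(qj for j, qj in enumerate(q) if j != best)
    return q


def steps_to_threshold(step_fn, p, best, size, threshold, max_steps=100_000):
    """Count updates until p[best] >= threshold, always pursuing `best`."""
    steps = 0
    while p[best] < threshold and steps < max_steps:
        p = step_fn(p, best, size)
        steps += 1
    return steps


if __name__ == "__main__":
    # Hypothetical setup: two actions, uniform start, exact rational arithmetic.
    start = [Fraction(1, 2), Fraction(1, 2)]
    size = Fraction(1, 100)
    target = Fraction(999, 1000)
    print("continuous :", steps_to_threshold(continuous_pursuit_step, start, 0, size, target))
    print("discretized:", steps_to_threshold(discretized_pursuit_step, start, 0, size, target))
```

In the continuous rule the gain per step is `lam * (1 - p[best])`, which shrinks as `p[best]` approaches 1; in the discretized rule each non-pursued action loses the same `delta` until it reaches 0, so the final approach to unity is not slowed in the same way.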
Pages: 8278 - 8292
Page count: 15
Related papers
23 in total
  • [1] EPSILON-OPTIMALITY OF A GENERAL-CLASS OF LEARNING ALGORITHMS
    MEYBODI, MR
    LAKSHMIVARAHAN, S
    INFORMATION SCIENCES, 1982, 28 (01) : 1 - 20
  • [2] ON USING DISTRIBUTION-THEORY TO PROVE THE EPSILON-OPTIMALITY OF STUBBORN LEARNING-MECHANISMS
    CHRISTENSEN, JPR
    OOMMEN, BJ
    1989 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-3: CONFERENCE PROCEEDINGS, 1989, : 286 - 291
  • [3] Fast and Epsilon-Optimal Discretized Pursuit Learning Automata
    Zhang, JunQi
    Wang, Cheng
    Zhou, MengChu
    IEEE TRANSACTIONS ON CYBERNETICS, 2015, 45 (10) : 2089 - 2099
  • [4] The Hierarchical Continuous Pursuit Learning Automation: A Novel Scheme for Environments With Large Numbers of Actions
    Yazidi, Anis
    Zhang, Xuan
    Jiao, Lei
    Oommen, B. John
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (02) : 512 - 526
  • [5] The Hierarchical Discrete Learning Automaton Suitable for Environments with Many Actions and High Accuracy Requirements
    Omslandseter, Rebekka Olsson
    Jiao, Lei
    Zhang, Xuan
    Yazidi, Anis
    Oommen, B. John
    AI 2021: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, 13151 : 507 - 518
  • [6] HIERARCHICAL DISCRETIZED PURSUIT NONLINEAR LEARNING AUTOMATA WITH RAPID CONVERGENCE AND HIGH-ACCURACY
    PAPADIMITRIOU, GI
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1994, 6 (04) : 654 - 659
  • [7] Federated Learning with Pareto Optimality for Resource Efficiency and Fast Model Convergence in Mobile Environments
    Jung, June-Pyo
    Ko, Young-Bae
    Lim, Sung-Hwa
    SENSORS, 2024, 24 (08)
  • [8] Distributed learning automata-based scheme for classification using novel pursuit scheme
    Goodwin, Morten
    Yazidi, Anis
    APPLIED INTELLIGENCE, 2020, 50 (07) : 2222 - 2238
  • [9] Distributed learning automata-based scheme for classification using novel pursuit scheme
    Morten Goodwin
    Anis Yazidi
    Applied Intelligence, 2020, 50 : 2222 - 2238
  • [10] CONVERGENCE AND ROBUSTNESS OF A DISCRETE-TIME LEARNING CONTROL SCHEME FOR CONSTRAINED MANIPULATORS
    CHEAH, CC
    WANG, DW
    SOH, YC
    JOURNAL OF ROBOTIC SYSTEMS, 1994, 11 (03) : 223 - 238