The Hierarchical Discrete Pursuit Learning Automaton: A Novel Scheme With Fast Convergence and Epsilon-Optimality

Cited by: 1

Authors:

Omslandseter, Rebekka Olsson [1]

Jiao, Lei [1]

Zhang, Xuan [2]

Yazidi, Anis [3]

Oommen, B. John [4,5]

Affiliations:

[1] Univ Agder, Dept Informat & Commun Technol, N-4879 Grimstad, Norway

[2] Norwegian Res Ctr NORCE, N-4879 Grimstad, Norway

[3] Oslo Metropolitan Univ, Dept Comp Sci, N-0160 Oslo, Norway

[4] Carleton Univ, Sch Comp Sci, Ottawa, ON K1S 5B6, Canada

[5] Northwest Univ, TRADE Res Entity, ZA-2520 Potchefstroom, South Africa

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2024 / Vol. 35 / No. 06

Keywords:

Learning automata; Automata; Convergence; Reinforcement learning; Task analysis; Pursuit algorithms; Markov processes; Convergence analysis; hierarchical discrete pursuit LA; learning automata (LA); reinforcement learning (RL); ALGORITHMS; ACCESS;

DOI:

10.1109/TNNLS.2022.3226538

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Since the early 1960s, the paradigm of learning automata (LA) has experienced abundant interest. Arguably, it has also served as the foundation for the phenomenon and field of reinforcement learning (RL). Over the decades, new concepts and fundamental principles have been introduced to increase the LA's speed and accuracy. These include using probability updating functions, discretizing the probability space, and using the "Pursuit" concept. Very recently, the concept of incorporating "structure" into the ordering of the LA's actions has improved both the speed and accuracy of the corresponding hierarchical machines when the number of actions is large. This has led to the epsilon-optimal hierarchical continuous pursuit LA (HCPA). This article pioneers the inclusion of all the above-mentioned phenomena into a new single LA, leading to the novel hierarchical discretized pursuit LA (HDPA). Indeed, although the previously proposed HCPA is powerful, its speed is impeded when any action probability is close to unity, because the updates of the components of the probability vector become correspondingly smaller. Here we propose the novel HDPA, in which the phenomenon of discretization is infused into the action probability vector's updating functionality, which is invoked recursively at every stage of the machine's hierarchical structure. This discretized functionality does not possess the same impediment, because discretization prohibits it. We demonstrate the HDPA's robustness and validity by formally proving its epsilon-optimality using the moderation property. We also invoke the submartingale characteristic at every level to prove that the action probability of the optimal action converges to unity as time goes to infinity. Apart from the new machine being epsilon-optimal, the numerical results demonstrate that the number of iterations required for convergence is significantly reduced for the HDPA when compared to the state-of-the-art HCPA scheme.
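The impediment described in the abstract can be illustrated with a minimal sketch. It is not the authors' HDPA: it models a single flat two-action automaton rather than the hierarchy, and it assumes that the same action is always the estimated best and that every step triggers an update. The function names, the learning rate `lam`, and the step size `delta` are hypothetical choices made for illustration. The continuous pursuit rule moves the probability vector a fraction `lam` of the way toward the best action, so its steps shrink near unity; the discretized rule moves it by a fixed step `delta`.

```python
from fractions import Fraction


def continuous_pursuit_step(p, best, lam):
    """Continuous pursuit update: move p a fraction lam toward the unit vector on `best`."""
    return [(1 - lam) * pi + (lam if i == best else 0) for i, pi in enumerate(p)]


def discretized_pursuit_step(p, best, delta):
    """Discretized pursuit update: take a fixed step delta from every other action
    (never going below 0) and give the remaining probability mass to `best`."""
    q = [max(pi - delta, 0) if i != best else 0 for i, pi in enumerate(p)]
    q[best] = 1 - sum(q)
    return q


def steps_to_reach(step, p, best, param, target, max_steps=10_000):
    """Count updates until p[best] >= target; return None if never reached.
    Assumption: `best` is fixed and every step applies an update."""
    for n in range(1, max_steps + 1):
        p = step(p, best, param)
        if p[best] >= target:
            return n
    return None


if __name__ == "__main__":
    # Exact rational arithmetic keeps the discretized probabilities on the grid.
    disc = steps_to_reach(discretized_pursuit_step,
                          [Fraction(1, 2), Fraction(1, 2)], 0, Fraction(1, 20), 1)
    cont = steps_to_reach(continuous_pursuit_step, [0.5, 0.5], 0, 0.1, 0.999)
    print("discretized steps to reach probability 1:", disc)
    print("continuous steps to reach probability 0.999:", cont)
```

With these illustrative parameters the discretized rule reaches probability exactly 1 after 0.5 / 0.05 = 10 updates, whereas the continuous rule only approaches unity geometrically. The comparison says nothing about accuracy in a stochastic environment, which is what the paper's epsilon-optimality proof addresses.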

Citation

Pages: 8278 - 8292

Number of pages: 15

23 entries in total

[1] EPSILON-OPTIMALITY OF A GENERAL-CLASS OF LEARNING ALGORITHMS
MEYBODI, MR
LAKSHMIVARAHAN, S
INFORMATION SCIENCES, 1982, 28 (01) : 1 - 20
[2] ON USING DISTRIBUTION-THEORY TO PROVE THE EPSILON-OPTIMALITY OF STUBBORN LEARNING-MECHANISMS
CHRISTENSEN, JPR
OOMMEN, BJ
1989 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-3: CONFERENCE PROCEEDINGS, 1989, : 286 - 291
[3] Fast and Epsilon-Optimal Discretized Pursuit Learning Automata
Zhang, JunQi
Wang, Cheng
Zhou, MengChu
IEEE TRANSACTIONS ON CYBERNETICS, 2015, 45 (10) : 2089 - 2099
[4] The Hierarchical Continuous Pursuit Learning Automation: A Novel Scheme for Environments With Large Numbers of Actions
Yazidi, Anis
Zhang, Xuan
Jiao, Lei
Oommen, B. John
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (02) : 512 - 526
[5] The Hierarchical Discrete Learning Automaton Suitable for Environments with Many Actions and High Accuracy Requirements
Omslandseter, Rebekka Olsson
Jiao, Lei
Zhang, Xuan
Yazidi, Anis
Oommen, B. John
AI 2021: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, 13151 : 507 - 518
[6] HIERARCHICAL DISCRETIZED PURSUIT NONLINEAR LEARNING AUTOMATA WITH RAPID CONVERGENCE AND HIGH-ACCURACY
PAPADIMITRIOU, GI
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1994, 6 (04) : 654 - 659
[7] Federated Learning with Pareto Optimality for Resource Efficiency and Fast Model Convergence in Mobile Environments
Jung, June-Pyo
Ko, Young-Bae
Lim, Sung-Hwa
SENSORS, 2024, 24 (08)
[8] Distributed learning automata-based scheme for classification using novel pursuit scheme
Goodwin, Morten
Yazidi, Anis
APPLIED INTELLIGENCE, 2020, 50 (07) : 2222 - 2238
[9] Distributed learning automata-based scheme for classification using novel pursuit scheme
Morten Goodwin
Anis Yazidi
Applied Intelligence, 2020, 50 : 2222 - 2238
[10] CONVERGENCE AND ROBUSTNESS OF A DISCRETE-TIME LEARNING CONTROL SCHEME FOR CONSTRAINED MANIPULATORS
CHEAH, CC
WANG, DW
SOH, YC
JOURNAL OF ROBOTIC SYSTEMS, 1994, 11 (03) : 223 - 238
