Selector-Actor-Critic and Tuner-Actor-Critic Algorithms for Reinforcement Learning

Cited: 0
Authors
Masadeh, Ala'eddin [1]
Wang, Zhengdao [1]
Kamal, Ahmed E. [1]
Affiliation
[1] ISU, Ames, IA 50011 USA
Funding
U.S. National Science Foundation;
Keywords
Reinforcement learning; model-based learning; model-free learning; actor-critic; GAME; GO;
DOI
10.1109/wcsp.2019.8928124
CLC number
TP3 [computing technology, computer technology];
Discipline code
0812;
Abstract
This work presents two reinforcement learning (RL) architectures that mimic how rational humans analyze available information and make decisions. The proposed algorithms, called selector-actor-critic (SAC) and tuner-actor-critic (TAC), are obtained by modifying the well-known actor-critic (AC) algorithm. SAC is equipped with an actor, a critic, and a selector. The selector determines the most promising action at the current state based on the critic's latest estimate. TAC is model-based and consists of a tuner, a model-learner, an actor, and a critic. After receiving the approximate value of the current state-action pair from the critic and the learned model from the model-learner, the tuner uses the Bellman equation to tune the value of the current state-action pair. The actor then uses this tuned value to optimize the policy. We evaluate the proposed algorithms through numerical simulations and compare them with the AC algorithm to demonstrate their advantages.
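The two roles the abstract introduces — a selector that picks the most promising action from the critic's latest estimate, and a tuner that applies the Bellman equation with a learned model to refine a state-action value — can be illustrated in a small tabular toy. This is a hedged sketch only, not the authors' implementation: the chain environment, the ε-greedy stand-in for the actor, and all constants are assumptions made for the example.

```python
import numpy as np

np.random.seed(0)
N_STATES, N_ACTIONS = 4, 2   # toy sizes (assumption)
GAMMA, EPS = 0.9, 0.3        # discount and exploration rate (assumptions)

Q = np.zeros((N_STATES, N_ACTIONS))  # critic's state-action value estimates

def selector(s):
    """SAC selector: the most promising action at state s
    according to the critic's latest estimate."""
    return int(np.argmax(Q[s]))

# TAC model-learner: empirical transition counts and reward sums.
counts = np.zeros((N_STATES, N_ACTIONS, N_STATES))
reward_sum = np.zeros((N_STATES, N_ACTIONS))

def tuner(s, a):
    """TAC tuner: apply the Bellman equation with the learned model
    to tune the value of the current state-action pair."""
    n = counts[s, a].sum()
    if n == 0:
        return Q[s, a]
    p_hat = counts[s, a] / n       # learned transition probabilities
    r_hat = reward_sum[s, a] / n   # learned expected reward
    return r_hat + GAMMA * (p_hat * Q.max(axis=1)).sum()

def step(s, a):
    """Toy chain environment (assumption): action 1 moves right,
    action 0 stays; reward 1 when at the rightmost state."""
    s_next = min(s + a, N_STATES - 1)
    return s_next, float(s_next == N_STATES - 1)

for _ in range(500):
    s = 0
    for _ in range(10):
        # epsilon-greedy stands in for the actor's policy in this sketch
        a = selector(s) if np.random.rand() > EPS else np.random.randint(N_ACTIONS)
        s_next, r = step(s, a)
        counts[s, a, s_next] += 1
        reward_sum[s, a] += r
        Q[s, a] = tuner(s, a)      # tuned value fed back to the critic
        s = s_next

print(selector(0))
```

In this toy the reward lies to the right, so after training the selector prefers the right-moving action at the start state; the tuner's model-based backup plays the role the abstract assigns it, refining each visited state-action value before the policy consults it.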
Pages: 6