The Effect of Discounting Actor-loss in Actor-Critic Algorithm

Cited by: 0
Authors
Yaputra, Jordi [1]
Suyanto, Suyanto [1]
Affiliations
[1] Telkom Univ, Dept Informat, Bandung, Indonesia
Keywords
Reinforcement Learning; Actor-Critic; Temporal Difference Learning; Convolutional Neural Network; Artificial Intelligence; GAME
DOI
10.1109/ISRITI54043.2021.9702883
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We analyze and present an experimental approach to examine the effect of limiting the Temporal Difference (TD) error used to estimate the actor loss in an actor-critic agent. The limitation is imposed by scaling the actor's loss value by a constant factor epsilon. In this experiment, we chose four epsilon values, i.e., 0.01, 0.1, 0.5, and 1.0, where 1.0 means no discount at all. We spawn four agents to solve a task that is trivial for humans in a custom lightweight Windows Operating System (OS)-like simulation. Each agent receives the simulation's screen image as input and controls the cursor inside the simulation to reach any rendered red circle. After 50 episodes (50,000 steps in total), each agent achieved roughly the same success rate, with only slight differences. The agent given an epsilon value of 0.01 achieved the highest success rate, marginally higher than the agent trained without discounting (epsilon = 1.0).
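To make the mechanism concrete: the described modification amounts to scaling the standard one-step policy-gradient actor loss, -delta_t * log pi(a_t | s_t), by the constant epsilon before backpropagation, while the critic is trained on the TD error as usual. The following is a minimal PyTorch sketch under that reading; the network interfaces, optimizer handling, and function name are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of an epsilon-discounted actor loss in a one-step actor-critic
# update. Assumptions (not from the paper): `actor(state)` returns a
# torch.distributions.Distribution over actions, `critic(state)` returns V(s),
# and a single optimizer covers both networks.

def actor_critic_update(actor, critic, optimizer, state, action, reward,
                        next_state, done, gamma=0.99, epsilon=0.01):
    """One-step actor-critic update with the actor loss scaled by `epsilon`."""
    value = critic(state)                       # V(s_t)
    with torch.no_grad():
        next_value = critic(next_state)         # V(s_{t+1})
        td_target = reward + gamma * next_value * (1.0 - done)

    td_error = td_target - value                # delta_t
    log_prob = actor(state).log_prob(action)    # log pi(a_t | s_t)

    # Actor loss discounted by the constant epsilon.
    # epsilon = 1.0 recovers the undiscounted actor loss.
    actor_loss = -(epsilon * td_error.detach() * log_prob).mean()
    critic_loss = F.mse_loss(value, td_target)  # critic trained as usual

    optimizer.zero_grad()
    (actor_loss + critic_loss).backward()
    optimizer.step()
    return td_error.mean().item()
```

In this sketch, setting epsilon to 0.01, 0.1, 0.5, or 1.0 reproduces the four experimental conditions compared in the abstract; only the magnitude of the actor's gradient changes, since the critic loss is left untouched.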
Pages: 6