The Effect of Discounting Actor-loss in Actor-Critic Algorithm

Cited by: 0
Authors
Yaputra, Jordi [1]
Suyanto, Suyanto [1]
Affiliations
[1] Telkom Univ, Dept Informat, Bandung, Indonesia
Keywords
Reinforcement Learning; Actor-Critic; Temporal Difference Learning; Convolutional Neural Network; Artificial Intelligence; GAME
DOI
10.1109/ISRITI54043.2021.9702883
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We analyze and present an experimental approach to examine the effect of limiting the Temporal Difference (TD) error used to estimate the actor loss in an actor-critic agent. The limitation is imposed by scaling the actor's loss value by a constant factor epsilon. In this experiment, we chose four epsilon values, i.e., 0.01, 0.1, 0.5, and 1.0, where 1.0 means no discount at all. We spawn four agents to solve a task that is trivial for humans in a custom lightweight Windows Operating System (OS)-like simulation. Each agent receives the simulation's screen image as input and controls the cursor inside the simulation to reach any rendered red circle. After 50 episodes (50,000 steps in total), each agent achieved roughly the same success rate, with only slight differences. The agent given an epsilon value of 0.01 achieved the highest success rate, marginally higher than the agent trained without discounting (epsilon = 1.0).
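To make the mechanism concrete: the described modification amounts to scaling the standard one-step policy-gradient actor loss, -delta_t * log pi(a_t | s_t), by the constant epsilon before backpropagation, while the critic is trained on the TD error as usual. The following is a minimal PyTorch sketch under that reading; the network interfaces, optimizer handling, and function name are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of an epsilon-discounted actor loss in a one-step actor-critic
# update. Assumptions (not from the paper): `actor(state)` returns a
# torch.distributions.Distribution over actions, `critic(state)` returns V(s),
# and a single optimizer covers both networks.

def actor_critic_update(actor, critic, optimizer, state, action, reward,
                        next_state, done, gamma=0.99, epsilon=0.01):
    """One-step actor-critic update with the actor loss scaled by `epsilon`."""
    value = critic(state)                       # V(s_t)
    with torch.no_grad():
        next_value = critic(next_state)         # V(s_{t+1})
        td_target = reward + gamma * next_value * (1.0 - done)

    td_error = td_target - value                # delta_t
    log_prob = actor(state).log_prob(action)    # log pi(a_t | s_t)

    # Actor loss discounted by the constant epsilon.
    # epsilon = 1.0 recovers the undiscounted actor loss.
    actor_loss = -(epsilon * td_error.detach() * log_prob).mean()
    critic_loss = F.mse_loss(value, td_target)  # critic trained as usual

    optimizer.zero_grad()
    (actor_loss + critic_loss).backward()
    optimizer.step()
    return td_error.mean().item()
```

In this sketch, setting epsilon to 0.01, 0.1, 0.5, or 1.0 reproduces the four experimental conditions compared in the abstract; only the magnitude of the actor's gradient changes, since the critic loss is left untouched.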
Pages: 6