Actor-Critic Algorithm with Transition Cost Estimation

Cited: 0
Authors
Denisov, Sergey [1]
Lee, Jee-Hyong [1]
Affiliations
[1] Sungkyunkwan Univ, Dept Elect & Comp Engn, Suwon, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Actor-critic algorithm; Reinforcement learning; Continuous action space; Heuristic function;
DOI
10.5391/IJFIS.2016.16.4.270
CLC number
TP301 [Theory and Methods];
Discipline code
081202;
Abstract
We present an approach for accelerating the actor-critic algorithm for reinforcement learning with continuous action spaces. The actor-critic algorithm has already proved robust to infinitely large action spaces in various high-dimensional environments. Despite that success, its main problem remains the speed of convergence to the optimal policy: in high-dimensional state and action spaces, searching for the correct action in each state takes an enormously long time. Therefore, in this paper we suggest a search-accelerating function that speeds up convergence and reaches the optimal policy faster. In our method, we assume that actions may have their own preference distribution that is independent of the state. Since the agent acts randomly in the environment at the beginning of learning, it is more efficient to take actions according to some heuristic function. We demonstrate that the heuristically accelerated actor-critic algorithm learns the optimal policy faster, using an Educational Process Mining dataset with records of students' course learning activities and their grades.
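The abstract does not give the exact form of the heuristic function, but the core idea of blending a state-independent action preference into an actor-critic agent's action selection can be sketched as follows. All names, the decay schedule, and the toy one-state task are illustrative assumptions for this sketch, not the paper's method:

```python
import numpy as np

def heuristic_weight(step, decay=0.99):
    """Influence of the state-independent heuristic preference.

    Illustrative annealing schedule: the heuristic dominates early,
    when the agent would otherwise act randomly, and fades as the
    learned policy takes over.
    """
    return decay ** step

def run(heuristic_mean=2.0, target=2.5, steps=2000, sigma=0.3, seed=0):
    """Toy one-state continuous-action actor-critic with a heuristic prior.

    The reward is -(a - target)**2; the actor is a Gaussian policy whose
    mean is blended with `heuristic_mean` before each action is sampled.
    """
    rng = np.random.default_rng(seed)
    mu, v = 0.0, 0.0                  # actor mean, critic value baseline
    actor_lr, critic_lr = 0.05, 0.1
    for t in range(steps):
        w = heuristic_weight(t)
        mean = w * heuristic_mean + (1.0 - w) * mu   # heuristic-accelerated mean
        a = rng.normal(mean, sigma)                  # sample continuous action
        r = -(a - target) ** 2                       # toy reward
        delta = r - v                                # TD error (one-step case)
        v += critic_lr * delta                       # critic update
        mu += actor_lr * delta * (a - mean) / sigma**2  # policy-gradient step
    return mu
```

Because the heuristic already places early actions near a sensible region, the actor needs fewer random excursions before the policy-gradient updates become informative; forcing `heuristic_weight` to zero reduces the same loop to a standard actor-critic starting from purely random behavior.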
Pages: 270-275
Page count: 6
Related papers
50 records
  • [1] A Hessian Actor-Critic Algorithm
    Wang, Jing
    Paschalidis, Ioannis Ch.
    [J]. 2014 IEEE 53RD ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2014, : 1131 - 1136
  • [2] An Actor-Critic Algorithm With Second-Order Actor and Critic
    Wang, Jing
    Paschalidis, Ioannis Ch.
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2017, 62 (06) : 2689 - 2703
  • [3] THE ACTOR-CRITIC ALGORITHM FOR INFINITE HORIZON DISCOUNTED COST REVISITED
    Gosavi, Abhijit
    [J]. 2020 WINTER SIMULATION CONFERENCE (WSC), 2020, : 2867 - 2878
  • [4] An Actor-Critic Algorithm for SVM Hyperparameters
    Kim, Chayoung
    Park, Jung-min
    Kim, Hye-young
    [J]. INFORMATION SCIENCE AND APPLICATIONS 2018, ICISA 2018, 2019, 514 : 653 - 661
  • [5] A sensitivity formula for risk-sensitive cost and the actor-critic algorithm
    Borkar, VS
    [J]. SYSTEMS & CONTROL LETTERS, 2001, 44 (05) : 339 - 346
  • [6] A Finite Sample Analysis of the Actor-Critic Algorithm
    Yang, Zhuoran
    Zhang, Kaiqing
    Hong, Mingyi
    Basar, Tamer
    [J]. 2018 IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2018, : 2759 - 2764
  • [7] Adaptive Advantage Estimation for Actor-Critic Algorithms
    Chen, Yurou
    Zhang, Fengyi
    Liu, Zhiyong
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [8] The Effect of Discounting Actor-loss in Actor-Critic Algorithm
    Yaputra, Jordi
    Suyanto, Suyanto
    [J]. 2021 4TH INTERNATIONAL SEMINAR ON RESEARCH OF INFORMATION TECHNOLOGY AND INTELLIGENT SYSTEMS (ISRITI 2021), 2021,
  • [9] A modified actor-critic reinforcement learning algorithm
    Mustapha, SM
    Lachiver, G
    [J]. 2000 CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, CONFERENCE PROCEEDINGS, VOLS 1 AND 2: NAVIGATING TO A NEW ERA, 2000, : 605 - 609
  • [10] Actor-critic algorithms
    Konda, VR
    Tsitsiklis, JN
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 12, 2000, 12 : 1008 - 1014