Actor-Critic Learning Control With Regularization and Feature Selection in Policy Gradient Estimation

Cited by: 15
Authors
Li, Luntong [1 ]
Li, Dazi [1 ]
Song, Tianheng [1 ]
Xu, Xin [2 ]
Affiliations
[1] Beijing Univ Chem Technol, Dept Automat, Beijing 100029, Peoples R China
[2] Natl Univ Def Technol, Coll Intelligence Sci & Technol, Changsha 410073, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China;
Keywords
ℓ1-regularization; actor-critic (AC); policy gradient; regularized dual-averaging (RDA); reinforcement learning (RL);
DOI
10.1109/TNNLS.2020.2981377
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
The actor-critic (AC) learning control architecture has been regarded as an important framework for reinforcement learning (RL) with continuous states and actions. To improve learning efficiency and convergence properties, previous works have mainly been devoted to solving the regularization and feature learning problems in policy evaluation. In this article, we propose a novel AC learning control method with regularization and feature selection for policy gradient estimation in the actor network. The main contribution is that ℓ1-regularization is applied to the actor network to achieve feature selection. In each iteration, the policy parameters are updated by the regularized dual-averaging (RDA) technique, which solves a minimization problem involving two terms: one is the running average of the past policy gradients, and the other is the ℓ1-regularization term of the policy parameters. Our algorithm can efficiently compute the solution of this minimization problem, and we call the new adaptation of policy gradient the RDA policy gradient (RDA-PG). The proposed RDA-PG can learn stochastic and deterministic near-optimal policies. The convergence of the proposed algorithm is established based on the theory of two-timescale stochastic approximation. The simulation and experimental results show that RDA-PG performs feature selection successfully in the actor and learns sparse representations of the actor in both stochastic and deterministic cases. RDA-PG performs better than existing AC algorithms on standard RL benchmark problems with irrelevant or redundant features.
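The RDA step described in the abstract has a closed-form, soft-thresholding solution. The Python sketch below is a minimal illustration of that update under standard RDA assumptions (a proximal weight beta_t proportional to sqrt(t)); it is not the authors' implementation, and the toy objective, the function names rda_l1_step and rda_pg_sketch, and all parameter values are assumptions made only for this example.

import numpy as np

def rda_l1_step(grad_avg, t, lam, gamma):
    """Closed-form regularized dual-averaging step with an l1 regularizer.

    grad_avg : running average of the past (negated) policy-gradient estimates
    t        : iteration counter, t >= 1
    lam      : l1-regularization strength (drives actor weights exactly to zero)
    gamma    : scale of the proximal term, beta_t = gamma * sqrt(t)
    """
    beta_t = gamma * np.sqrt(t)
    # Components whose averaged gradient magnitude is below lam are set to zero,
    # which is the feature-selection effect in the actor.
    return -(t / beta_t) * np.sign(grad_avg) * np.maximum(np.abs(grad_avg) - lam, 0.0)

def rda_pg_sketch(grad_fn, dim, n_iters=200, lam=0.05, gamma=5.0):
    """Toy actor loop: average the past policy gradients, then apply the RDA step."""
    theta = np.zeros(dim)
    grad_avg = np.zeros(dim)
    for t in range(1, n_iters + 1):
        g = grad_fn(theta)                       # stochastic ascent direction
        grad_avg = ((t - 1) * grad_avg - g) / t  # RDA minimizes, so accumulate -g
        theta = rda_l1_step(grad_avg, t, lam, gamma)
    return theta

# Hypothetical usage: maximize J(theta) = -||A @ theta - b||^2, where the last
# feature is irrelevant (its column of A is zero).
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.5, 0.0],
              [0.2, 1.0, 0.0]])
b = np.array([1.0, -0.5])
grad_fn = lambda th: -2.0 * A.T @ (A @ th - b) + 0.01 * rng.standard_normal(3)
print(rda_pg_sketch(grad_fn, dim=3))  # weight of the irrelevant feature is exactly zero

Because the ℓ1 term acts on the running average of gradients rather than on a single noisy sample, weights attached to irrelevant features are shrunk exactly to zero, which is the sparse-actor behavior the abstract attributes to RDA-PG.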
Pages: 1217 - 1227
Number of pages: 11