Actor-Critic Learning Control With Regularization and Feature Selection in Policy Gradient Estimation

Cited by: 15
Authors
Li, Luntong [1 ]
Li, Dazi [1 ]
Song, Tianheng [1 ]
Xu, Xin [2 ]
Affiliations
[1] Beijing Univ Chem Technol, Dept Automat, Beijing 100029, Peoples R China
[2] Natl Univ Def Technol, Coll Intelligence Sci & Technol, Changsha 410073, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China;
Keywords
ℓ1-regularization; actor-critic (AC); policy gradient; regularized dual-averaging (RDA); reinforcement learning (RL);
DOI
10.1109/TNNLS.2020.2981377
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
The actor-critic (AC) learning control architecture has been regarded as an important framework for reinforcement learning (RL) with continuous states and actions. To improve learning efficiency and convergence properties, previous works have mainly been devoted to solving the regularization and feature learning problems in policy evaluation. In this article, we propose a novel AC learning control method with regularization and feature selection for policy gradient estimation in the actor network. The main contribution is that ℓ1-regularization is applied to the actor network to achieve feature selection. In each iteration, the policy parameters are updated by the regularized dual-averaging (RDA) technique, which solves a minimization problem involving two terms: one is the running average of the past policy gradients, and the other is the ℓ1-regularization term of the policy parameters. Our algorithm can efficiently compute the solution of this minimization problem, and we call the new adaptation of policy gradient the RDA policy gradient (RDA-PG). The proposed RDA-PG can learn stochastic and deterministic near-optimal policies. The convergence of the proposed algorithm is established based on the theory of two-timescale stochastic approximation. The simulation and experimental results show that RDA-PG performs feature selection successfully in the actor and learns sparse representations of the actor in both stochastic and deterministic cases. RDA-PG performs better than existing AC algorithms on standard RL benchmark problems with irrelevant or redundant features.
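The RDA step described in the abstract has a closed-form, soft-thresholding solution. The Python sketch below is a minimal illustration of that update under standard RDA assumptions (a proximal weight beta_t proportional to sqrt(t)); it is not the authors' implementation, and the toy objective, the function names rda_l1_step and rda_pg_sketch, and all parameter values are assumptions made only for this example.

import numpy as np

def rda_l1_step(grad_avg, t, lam, gamma):
    """Closed-form regularized dual-averaging step with an l1 regularizer.

    grad_avg : running average of the past (negated) policy-gradient estimates
    t        : iteration counter, t >= 1
    lam      : l1-regularization strength (drives actor weights exactly to zero)
    gamma    : scale of the proximal term, beta_t = gamma * sqrt(t)
    """
    beta_t = gamma * np.sqrt(t)
    # Components whose averaged gradient magnitude is below lam are set to zero,
    # which is the feature-selection effect in the actor.
    return -(t / beta_t) * np.sign(grad_avg) * np.maximum(np.abs(grad_avg) - lam, 0.0)

def rda_pg_sketch(grad_fn, dim, n_iters=200, lam=0.05, gamma=5.0):
    """Toy actor loop: average the past policy gradients, then apply the RDA step."""
    theta = np.zeros(dim)
    grad_avg = np.zeros(dim)
    for t in range(1, n_iters + 1):
        g = grad_fn(theta)                       # stochastic ascent direction
        grad_avg = ((t - 1) * grad_avg - g) / t  # RDA minimizes, so accumulate -g
        theta = rda_l1_step(grad_avg, t, lam, gamma)
    return theta

# Hypothetical usage: maximize J(theta) = -||A @ theta - b||^2, where the last
# feature is irrelevant (its column of A is zero).
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.5, 0.0],
              [0.2, 1.0, 0.0]])
b = np.array([1.0, -0.5])
grad_fn = lambda th: -2.0 * A.T @ (A @ th - b) + 0.01 * rng.standard_normal(3)
print(rda_pg_sketch(grad_fn, dim=3))  # weight of the irrelevant feature is exactly zero

Because the ℓ1 term acts on the running average of gradients rather than on a single noisy sample, weights attached to irrelevant features are shrunk exactly to zero, which is the sparse-actor behavior the abstract attributes to RDA-PG.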
Pages: 1217 - 1227
Number of pages: 11