Proximal Policy Optimization with Advantage Reuse Competition

Cited by: 1
Authors
Cheng Y. [1 ]
Guo Q. [1 ]
Wang X. [1 ]
Affiliations
[1] Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, Xuzhou, 221116, China
[2] China University of Mining and Technology, School of Information and Control Engineering, Xuzhou, 221116, China
Keywords
Advantage reuse competition; Artificial intelligence; Computational complexity; Generalized clipping boundary; Linear programming; Markov processes; Optimization; Policy update; Proximal policy optimization; Task analysis; TV
DOI
10.1109/TAI.2024.3354694
Abstract
In recent years, reinforcement learning (RL) has made great achievements in artificial intelligence. Proximal policy optimization (PPO) is a representative RL algorithm that limits the magnitude of each policy update to achieve monotonic policy improvement. However, as an on-policy algorithm, PPO suffers from sample inefficiency and poor policy exploration. To address these problems, the off-policy advantage is proposed, which calculates the advantage function by reusing samples from previous policies, yielding proximal policy optimization with advantage reuse (PPO-AR). Furthermore, to improve the sampling efficiency of policy updates, proximal policy optimization with advantage reuse competition (PPO-ARC) is proposed, which introduces PPO-AR into the policy calculation and uses parallel competitive optimization, and is shown to improve policy performance. Moreover, to improve the exploration of policy updates, proximal policy optimization with generalized clipping (PPO-GC) is proposed, which relaxes the limits on policy updates by changing the flat clipping boundary. Experimental results on OpenAI Gym demonstrate the effectiveness of the proposed PPO-ARC and PPO-GC.
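
The abstract hinges on PPO's clipped surrogate objective, which is what both the advantage-reuse and generalized-clipping variants modify. As a point of reference, below is a minimal Python/NumPy sketch of the standard clipped loss with the clipping boundaries exposed as parameters; the exact generalized boundary of PPO-GC and the advantage-reuse computation of PPO-AR/PPO-ARC are not given in this record, so the asymmetric clip_low/clip_high arguments are only an illustrative assumption, not the authors' formulation.

import numpy as np

def ppo_clipped_loss(ratio, advantage, clip_low=0.2, clip_high=0.2):
    # Standard PPO clipped surrogate loss (to be minimized).
    # ratio:     pi_new(a|s) / pi_old(a|s) for each sampled action.
    # advantage: estimated advantage A(s, a) for each sample.
    # clip_low / clip_high: equal values give PPO's usual flat boundary
    #   [1 - eps, 1 + eps]; unequal values merely illustrate a relaxed
    #   boundary, since PPO-GC's actual form is not specified here.
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_low, 1.0 + clip_high) * advantage
    # Pessimistic bound: take the smaller surrogate per sample, negate for a loss.
    return -np.mean(np.minimum(unclipped, clipped))

# Toy usage with made-up numbers.
ratio = np.array([0.9, 1.1, 1.4])
advantage = np.array([0.5, -0.2, 1.0])
print(ppo_clipped_loss(ratio, advantage))                               # flat boundary
print(ppo_clipped_loss(ratio, advantage, clip_low=0.2, clip_high=0.4))  # relaxed upper bound
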
Pages: 1-10
Page count: 9
Related Papers
50 items in total
  • [1] Generalized Proximal Policy Optimization with Sample Reuse
    Queeney, James
    Paschalidis, Ioannis Ch.
    Cassandras, Christos G.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [2] Partial Advantage Estimator for Proximal Policy Optimization
    Jin, Yizhao
    Song, Xiulei
    Slabaugh, Gregory
    Lucas, Simon
    IEEE TRANSACTIONS ON GAMES, 2025, 17 (01): 158-166
  • [3] Upper confident bound advantage function proximal policy optimization
    Guiliang Xie
    Wei Zhang
    Zhi Hu
    Gaojian Li
    Cluster Computing, 2023, 26: 2001-2010
  • [4] Upper confident bound advantage function proximal policy optimization
    Xie, Guiliang
    Zhang, Wei
    Hu, Zhi
    Li, Gaojian
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2023, 26 (03): 2001-2010
  • [5] Advantage Constrained Proximal Policy Optimization in Multi-Agent Reinforcement Learning
    Li, Weifan
    Zhu, Yuanheng
    Zhao, Dongbin
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2023
  • [6] Autonomous Valet Parking with Asynchronous Advantage Actor-Critic Proximal Policy Optimization
    Tiong, Teckchai
    Saad, Ismail
    Teo, Kenneth Tze Kin
    Bin Lago, Herwansyah
    2022 IEEE 12TH ANNUAL COMPUTING AND COMMUNICATION WORKSHOP AND CONFERENCE (CCWC), 2022: 334-340
  • [7] Competitive Advantage and Competition Policy in Developing Countries
    Mateus, Abel M.
    WORLD COMPETITION, 2009, 32 (02): 275+
  • [8] Proximal Policy Optimization With Policy Feedback
    Gu, Yang
    Cheng, Yuhu
    Chen, C. L. Philip
    Wang, Xuesong
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2022, 52 (07): 4600-4610
  • [9] Coordinated Proximal Policy Optimization
    Wu, Zifan
    Yu, Chao
    Ye, Deheng
    Zhang, Junge
    Piao, Haiyin
    Zhuo, Hankz Hankui
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [10] Truly Proximal Policy Optimization
    Wang, Yuhui
    He, Hao
    Tan, Xiaoyang
    35TH UNCERTAINTY IN ARTIFICIAL INTELLIGENCE CONFERENCE (UAI 2019), 2020, 115: 113-122