Proximal Policy Optimization with Advantage Reuse Competition

Cited by: 1
Authors
Cheng Y. [1 ]
Guo Q. [1 ]
Wang X. [1 ]
Affiliations
[1] Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, Xuzhou, 221116, China
[2] School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
Source
IEEE Transactions on Artificial Intelligence
Keywords
Advantage reuse competition; Artificial intelligence; Computational complexity; Generalized clipping boundary; Linear programming; Markov processes; Optimization; Policy update; Proximal policy optimization; Task analysis; TV
DOI
10.1109/TAI.2024.3354694
Abstract
In recent years, reinforcement learning (RL) has made great achievements in artificial intelligence. Proximal policy optimization (PPO) is a representative RL algorithm that limits the magnitude of each policy update to achieve monotonic policy improvement. However, as an on-policy algorithm, PPO suffers from sample inefficiency and poor policy exploration. To address these problems, the off-policy advantage is proposed, which calculates the advantage function by reusing previous policies, yielding proximal policy optimization with advantage reuse (PPO-AR). Furthermore, to improve the sampling efficiency of the policy update, proximal policy optimization with advantage reuse competition (PPO-ARC) is proposed, which introduces PPO-AR into the policy calculation and uses parallel competitive optimization; it is shown to improve policy performance. Moreover, to improve the exploration of the policy update, proximal policy optimization with generalized clipping (PPO-GC) is proposed, which relaxes the limits on the policy update by changing the flat clipping boundary. Experimental results on OpenAI Gym demonstrate the effectiveness of the proposed PPO-ARC and PPO-GC. © IEEE
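The flat clipping boundary referred to above is that of the standard PPO clipped surrogate objective, sketched here in generic notation for context (the generalized boundary of PPO-GC and the advantage-reuse estimator of PPO-AR are defined in the paper itself and are not reproduced here):

L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t\big)\Big], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},

where \hat{A}_t is the estimated advantage and \epsilon is the clipping parameter. PPO-AR, as described above, replaces the on-policy estimate \hat{A}_t with one computed by reusing samples from previous policies.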
Pages: 1-10
Number of pages: 9
Related papers
50 in total
  • [31] Proximal policy optimization for formation navigation and obstacle avoidance
    Sadhukhan, Priyam
    Selmic, Rastko R.
    INTERNATIONAL JOURNAL OF INTELLIGENT ROBOTICS AND APPLICATIONS, 2022, 6 (04) : 746 - 759
  • [32] A novel guidance law based on proximal policy optimization
    Jiang, Yang
    Yu, Jianglong
    Li, Qingdong
    Ren, Zhang
    Dong, Xiwang
    Hua, Yongzhao
    2022 41ST CHINESE CONTROL CONFERENCE (CCC), 2022, : 3364 - 3369
  • [33] Augmented Proximal Policy Optimization for Safe Reinforcement Learning
    Dai, Juntao
    Ji, Jiaming
    Yang, Long
    Zheng, Qian
    Pan, Gang
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 6, 2023, : 7288 - 7295
  • [34] Proximal policy optimization with model-based methods
    Li, Shuailong
    Zhang, Wei
    Zhang, Huiwen
    Zhang, Xin
    Leng, Yuquan
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 42 (06) : 5399 - 5410
  • [35] A Novel Proximal Policy Optimization Approach for Filter Design
    Fan, Dongdong
    Ding, Shuai
    Zhang, Haotian
    Zhang, Weihao
    Jia, Qingsong
    Han, Xu
    Tang, Hao
    Zhu, Zhaojun
    Zhou, Yuliang
    APPLIED COMPUTATIONAL ELECTROMAGNETICS SOCIETY JOURNAL, 2024, 39 (05) : 390 - 395
  • [37] Proximal policy optimization with an integral compensator for quadrotor control
    Hu, Huan
    Wang, Qing-ling
    FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2020, 21 (05) : 777 - 795
  • [38] Proximal policy optimization via enhanced exploration efficiency
    Zhang, Junwei
    Zhang, Zhenghao
    Han, Shuai
    Lü, Shuai
    INFORMATION SCIENCES, 2022, 609 : 750 - 765
  • [39] Use of Proximal Policy Optimization for the Joint Replenishment Problem
    Vanvuchelen, Nathalie
    Gijsbrechts, Joren
    Boute, Robert
    COMPUTERS IN INDUSTRY, 2020, 119