Proximal Policy Optimization with Advantage Reuse Competition

Cited by: 1
Authors
Cheng Y. [1 ]
Guo Q. [1 ]
Wang X. [1 ]
Affiliations
[1] Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, Xuzhou, 221116, China
[2] School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
Source
IEEE Transactions on Artificial Intelligence
Keywords
Advantage reuse competition; Artificial intelligence; Computational complexity; Generalized clipping boundary; Linear programming; Markov processes; Optimization; Policy update; Proximal policy optimization; Task analysis; TV
DOI
10.1109/TAI.2024.3354694
Abstract
In recent years, reinforcement learning (RL) has made great achievements in artificial intelligence. Proximal policy optimization (PPO) is a representative RL algorithm that limits the magnitude of each policy update to achieve monotonic policy improvement. However, as an on-policy algorithm, PPO suffers from sample inefficiency and poor policy exploration. To address these problems, the off-policy advantage is proposed, which calculates the advantage function by reusing previous policies, yielding proximal policy optimization with advantage reuse (PPO-AR). Furthermore, to improve the sampling efficiency of the policy update, proximal policy optimization with advantage reuse competition (PPO-ARC) is proposed, which introduces PPO-AR into the policy calculation and uses parallel competitive optimization; it is shown to improve policy performance. Moreover, to improve the exploration of the policy update, proximal policy optimization with generalized clipping (PPO-GC) is proposed, which relaxes the limits on the policy update by changing the flat clipping boundary. Experimental results on OpenAI Gym demonstrate the effectiveness of the proposed PPO-ARC and PPO-GC. © IEEE
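The flat clipping boundary referred to above is that of the standard PPO clipped surrogate objective, sketched here in generic notation for context (the generalized boundary of PPO-GC and the advantage-reuse estimator of PPO-AR are defined in the paper itself and are not reproduced here):

L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t\big)\Big], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},

where \hat{A}_t is the estimated advantage and \epsilon is the clipping parameter. PPO-AR, as described above, replaces the on-policy estimate \hat{A}_t with one computed by reusing samples from previous policies.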
Pages: 1-10
Number of pages: 9
Related papers
50 in total
  • [31] Proximal policy optimization for formation navigation and obstacle avoidance
    Sadhukhan, Priyam
    Selmic, Rastko R.
    INTERNATIONAL JOURNAL OF INTELLIGENT ROBOTICS AND APPLICATIONS, 2022, 6 (04) : 746 - 759
  • [32] A novel guidance law based on proximal policy optimization
    Jiang, Yang
    Yu, Jianglong
    Li, Qingdong
    Ren, Zhang
    Dong, Xiwang
    Hua, Yongzhao
    2022 41ST CHINESE CONTROL CONFERENCE (CCC), 2022, : 3364 - 3369
  • [33] Augmented Proximal Policy Optimization for Safe Reinforcement Learning
    Dai, Juntao
    Ji, Jiaming
    Yang, Long
    Zheng, Qian
    Pan, Gang
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 6, 2023, : 7288 - 7295
  • [34] Proximal policy optimization with model-based methods
    Li, Shuailong
    Zhang, Wei
    Zhang, Huiwen
    Zhang, Xin
    Leng, Yuquan
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 42 (06) : 5399 - 5410
  • [35] A Novel Proximal Policy Optimization Approach for Filter Design
    Fan, Dongdong
    Ding, Shuai
    Zhang, Haotian
    Zhang, Weihao
    Jia, Qingsong
    Han, Xu
    Tang, Hao
    Zhu, Zhaojun
    Zhou, Yuliang
    APPLIED COMPUTATIONAL ELECTROMAGNETICS SOCIETY JOURNAL, 2024, 39 (05) : 390 - 395
  • [37] Proximal policy optimization with an integral compensator for quadrotor control
    Hu, Huan
    Wang, Qing-ling
    FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2020, 21 (05) : 777 - 795
  • [38] Proximal policy optimization via enhanced exploration efficiency
    Zhang, Junwei
    Zhang, Zhenghao
    Han, Shuai
    Lü, Shuai
    INFORMATION SCIENCES, 2022, 609 : 750 - 765
  • [39] Use of Proximal Policy Optimization for the Joint Replenishment Problem
    Vanvuchelen, Nathalie
    Gijsbrechts, Joren
    Boute, Robert
    COMPUTERS IN INDUSTRY, 2020, 119