Proximal Policy Optimization with Advantage Reuse Competition

Cited by: 1
Authors
Cheng Y. [1 ]
Guo Q. [1 ]
Wang X. [1 ]
Affiliations
[1] Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, Xuzhou, 221116, China
[2] China University of Mining and Technology, School of Information and Control Engineering, Xuzhou, 221116, China
Keywords
Advantage reuse competition; Artificial intelligence; Computational complexity; Generalized clipping boundary; Linear programming; Markov processes; Optimization; Policy update; Proximal policy optimization; Task analysis; TV
DOI
10.1109/TAI.2024.3354694
Abstract
In recent years, reinforcement learning (RL) has made great achievements in artificial intelligence. Proximal policy optimization (PPO) is a representative RL algorithm that limits the magnitude of each policy update to achieve monotonic policy improvement. However, as an on-policy algorithm, PPO suffers from sample inefficiency and poor policy exploration. To address these problems, the off-policy advantage is proposed, which calculates the advantage function by reusing samples from previous policies, yielding proximal policy optimization with advantage reuse (PPO-AR). Furthermore, to improve the sampling efficiency of policy updates, proximal policy optimization with advantage reuse competition (PPO-ARC) is proposed, which introduces PPO-AR into the policy calculation and uses parallel competitive optimization, and is shown to improve policy performance. Moreover, to improve the exploration of policy updates, proximal policy optimization with generalized clipping (PPO-GC) is proposed, which relaxes the limits on policy updates by changing the flat clipping boundary. Experimental results on OpenAI Gym demonstrate the effectiveness of the proposed PPO-ARC and PPO-GC.
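
The abstract hinges on PPO's clipped surrogate objective, which is what both the advantage-reuse and generalized-clipping variants modify. As a point of reference, below is a minimal Python/NumPy sketch of the standard clipped loss with the clipping boundaries exposed as parameters; the exact generalized boundary of PPO-GC and the advantage-reuse computation of PPO-AR/PPO-ARC are not given in this record, so the asymmetric clip_low/clip_high arguments are only an illustrative assumption, not the authors' formulation.

import numpy as np

def ppo_clipped_loss(ratio, advantage, clip_low=0.2, clip_high=0.2):
    # Standard PPO clipped surrogate loss (to be minimized).
    # ratio:     pi_new(a|s) / pi_old(a|s) for each sampled action.
    # advantage: estimated advantage A(s, a) for each sample.
    # clip_low / clip_high: equal values give PPO's usual flat boundary
    #   [1 - eps, 1 + eps]; unequal values merely illustrate a relaxed
    #   boundary, since PPO-GC's actual form is not specified here.
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_low, 1.0 + clip_high) * advantage
    # Pessimistic bound: take the smaller surrogate per sample, negate for a loss.
    return -np.mean(np.minimum(unclipped, clipped))

# Toy usage with made-up numbers.
ratio = np.array([0.9, 1.1, 1.4])
advantage = np.array([0.5, -0.2, 1.0])
print(ppo_clipped_loss(ratio, advantage))                               # flat boundary
print(ppo_clipped_loss(ratio, advantage, clip_low=0.2, clip_high=0.4))  # relaxed upper bound
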
Pages: 1-10
Page count: 9
Related Papers
50 items in total
  • [1] Generalized Proximal Policy Optimization with Sample Reuse
    Queeney, James
    Paschalidis, Ioannis Ch.
    Cassandras, Christos G.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [2] Partial Advantage Estimator for Proximal Policy Optimization
    Jin, Yizhao
    Song, Xiulei
    Slabaugh, Gregory
    Lucas, Simon
    IEEE TRANSACTIONS ON GAMES, 2025, 17 (01): 158-166
  • [3] Upper confident bound advantage function proximal policy optimization
    Guiliang Xie
    Wei Zhang
    Zhi Hu
    Gaojian Li
    Cluster Computing, 2023, 26: 2001-2010
  • [4] Upper confident bound advantage function proximal policy optimization
    Xie, Guiliang
    Zhang, Wei
    Hu, Zhi
    Li, Gaojian
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2023, 26 (03): 2001-2010
  • [5] Advantage Constrained Proximal Policy Optimization in Multi-Agent Reinforcement Learning
    Li, Weifan
    Zhu, Yuanheng
    Zhao, Dongbin
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2023
  • [6] Autonomous Valet Parking with Asynchronous Advantage Actor-Critic Proximal Policy Optimization
    Tiong, Teckchai
    Saad, Ismail
    Teo, Kenneth Tze Kin
    Bin Lago, Herwansyah
    2022 IEEE 12TH ANNUAL COMPUTING AND COMMUNICATION WORKSHOP AND CONFERENCE (CCWC), 2022: 334-340
  • [7] Competitive Advantage and Competition Policy in Developing Countries
    Mateus, Abel M.
    WORLD COMPETITION, 2009, 32 (02): 275+
  • [8] Proximal Policy Optimization With Policy Feedback
    Gu, Yang
    Cheng, Yuhu
    Chen, C. L. Philip
    Wang, Xuesong
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2022, 52 (07): 4600-4610
  • [9] Coordinated Proximal Policy Optimization
    Wu, Zifan
    Yu, Chao
    Ye, Deheng
    Zhang, Junge
    Piao, Haiyin
    Zhuo, Hankz Hankui
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [10] Truly Proximal Policy Optimization
    Wang, Yuhui
    He, Hao
    Tan, Xiaoyang
    35TH UNCERTAINTY IN ARTIFICIAL INTELLIGENCE CONFERENCE (UAI 2019), 2020, 115: 113-122