Proximal Policy Optimization with Advantage Reuse Competition

Cited by: 1
Authors
Cheng Y. [1]
Guo Q. [1]
Wang X. [1]
Affiliations
[1] Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, Xuzhou 221116, China
[2] School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
Keywords
Advantage reuse competition; Artificial intelligence; Computational complexity; generalized clipping boundary; Linear programming; Markov processes; Optimization; policy update; proximal policy optimization; Task analysis
DOI
10.1109/TAI.2024.3354694
Abstract
In recent years, reinforcement learning (RL) has achieved remarkable success in artificial intelligence. Proximal policy optimization (PPO) is a representative RL algorithm that limits the magnitude of each policy update to achieve monotonic policy improvement. However, as an on-policy algorithm, PPO suffers from sample inefficiency and poor policy exploration. To address these problems, the off-policy advantage is proposed, which computes the advantage function by reusing the previous policy, yielding proximal policy optimization with advantage reuse (PPO-AR). Furthermore, to improve the sampling efficiency of policy updates, proximal policy optimization with advantage reuse competition (PPO-ARC) is proposed, which introduces PPO-AR into the policy calculation and applies parallel competitive optimization, and is shown to improve policy performance. Moreover, to improve the exploration of policy updates, proximal policy optimization with generalized clipping (PPO-GC) is proposed, which relaxes the limits on policy updates by replacing the flat clipping boundary. Experimental results on OpenAI Gym demonstrate the effectiveness of the proposed PPO-ARC and PPO-GC.
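The abstract contrasts PPO's flat clipping boundary with PPO-GC's relaxed, generalized boundary. Below is a minimal PyTorch sketch for intuition only: the first loss is the standard PPO clipped surrogate (Schulman et al., 2017); the second illustrates one way a flat boundary could be softened. The `slope` parameter and the soft-boundary shape are illustrative assumptions, not the PPO-GC definition from the paper.

```python
import torch

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate loss.

    ratio: pi_theta(a|s) / pi_theta_old(a|s), shape (batch,)
    advantage: estimated advantages, shape (batch,)
    """
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # PPO maximizes the pessimistic (elementwise minimum) surrogate,
    # so we return its negation for gradient descent.
    return -torch.min(unclipped, clipped).mean()

def soft_clip_loss(ratio, advantage, eps=0.2, slope=0.3):
    """Hypothetical relaxed-clipping variant: outside [1-eps, 1+eps]
    the surrogate keeps a small residual slope instead of going flat,
    loosening the limit on policy updates. The `slope` value and this
    boundary shape are assumptions for illustration, not PPO-GC itself.
    """
    flat = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    excess = ratio - flat  # zero inside the trust region
    softly_clipped = (flat + slope * excess) * advantage
    return -torch.min(ratio * advantage, softly_clipped).mean()

if __name__ == "__main__":
    ratio = torch.tensor([0.5, 1.0, 1.5])
    adv = torch.tensor([1.0, -0.5, 2.0])
    print(ppo_clip_loss(ratio, adv))   # flat boundary
    print(soft_clip_loss(ratio, adv))  # relaxed boundary
```

With a positive advantage and a ratio above 1 + eps, the flat clip contributes zero gradient, while the softened boundary still rewards (attenuated) policy improvement, which is the intuition behind relaxing the clipping limits.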
Pages: 1-10
Page count: 10
Related papers
50 records in total (showing 41-50)
  • [41] Misleading Inference Generation via Proximal Policy Optimization
    Peng, Hsien-Yung
    Chung, Ho-Lam
    Chan, Ying-Hong
    Fan, Yao-Chung
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2022, PT I, 2022, 13280 : 497 - 509
  • [42] DNA: Proximal Policy Optimization with a Dual Network Architecture
    Aitchison, Matthew
    Sweetser, Penny
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [43] Trust Region-Guided Proximal Policy Optimization
    Wang, Yuhui
    He, Hao
    Tan, Xiaoyang
    Gan, Yaozhong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [44] Supply Chain Capacity Competition and Optimization Based on Online Transaction Advantage of Backwardness
    Li, Pei-Qin
    2016 2ND INTERNATIONAL CONFERENCE ON MODERN EDUCATION AND SOCIAL SCIENCE (MESS 2016), 2016, : 976 - 984
  • [45] Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy
    Liu, Boyi
    Cai, Qi
    Yang, Zhuoran
    Wang, Zhaoran
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [46] Reactive Power Optimization Based on Proximal Policy Optimization of Deep Reinforcement Learning
Zhang P.
    Zhu Z.
    Xie H.
Dianwang Jishu/Power System Technology, 2023, 47 (02): 562 - 570
  • [47] Proximal policy optimization-based join order optimization with spark SQL
    Lee K.-M.
    Kim I.
    Lee K.-C.
Institute of Electronics Engineers of Korea, 10: 227 - 232
  • [48] GREEN SIMULATION BASED POLICY OPTIMIZATION WITH PARTIAL HISTORICAL TRAJECTORY REUSE
    Zheng, Hua
    Xie, Wei
    2022 WINTER SIMULATION CONFERENCE (WSC), 2022, : 168 - 179
  • [49] Soft policy optimization using dual-track advantage estimator
    Huang, Yubo
    Wang, Xuechun
    Zou, Luobao
    Zhuang, Zhiwei
    Zhang, Weidong
    20TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2020), 2020, : 1064 - 1069
  • [50] Competition Policy and the Competition Policy Review
    King, Stephen P.
    AUSTRALIAN ECONOMIC REVIEW, 2015, 48 (04) : 402 - 409