Proximal Policy Optimization with Entropy Regularization

Cited by: 0
Author
Shen, Yuqing [1]
Affiliation
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
reinforcement learning; policy gradient; entropy regularization
DOI
10.1109/ICCCR61138.2024.10585473
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
This study presents a revision of the Proximal Policy Optimization (PPO) algorithm, aimed primarily at improving the stability of PPO during training while maintaining a balance between exploration and exploitation. Recognizing the inherent difficulty of achieving this balance in complex environments, the proposed method adopts an entropy regularization technique similar to the one used in the Asynchronous Advantage Actor-Critic (A3C) algorithm. The main purpose of this design is to encourage exploration in the early stages of training, preventing the agent from prematurely converging to a sub-optimal policy. A detailed theoretical explanation of how the entropy term improves the robustness of the learning trajectory is provided. Experimental results demonstrate that the revised PPO not only retains the original strengths of the PPO algorithm but also shows a significant improvement in the stability of the training process. This work contributes to ongoing research in reinforcement learning and offers a promising direction for future work on applying PPO in environments with complicated dynamics.
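To make the described modification concrete, the following is a minimal sketch of PPO's clipped surrogate loss with an A3C-style entropy bonus subtracted from it. This is not the paper's implementation; the function name, argument layout, and the placeholder values of clip_eps and entropy_coef are assumptions made purely for illustration.

```python
import torch
from torch.distributions import Categorical

def ppo_loss_with_entropy(policy_logits, actions, old_log_probs, advantages,
                          clip_eps=0.2, entropy_coef=0.01):
    """Clipped PPO surrogate objective with an A3C-style entropy bonus.

    All names and hyperparameter values are illustrative placeholders,
    not taken from the paper.
    """
    dist = Categorical(logits=policy_logits)
    log_probs = dist.log_prob(actions)

    # Probability ratio between the current policy and the policy
    # that collected the data.
    ratio = torch.exp(log_probs - old_log_probs)

    # Standard PPO clipped surrogate objective (maximized, hence negated).
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()

    # Entropy regularization: subtracting the mean entropy from the loss
    # rewards more stochastic, exploratory policies.
    entropy = dist.entropy().mean()

    return policy_loss - entropy_coef * entropy
```

In practice the entropy coefficient is typically kept small and is often decayed over training, so that exploration is encouraged early on while the policy can still converge later, consistent with the stated goal of avoiding premature convergence to a sub-optimal policy.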
Pages: 380 - 383
Page count: 4
Related papers
50 items in total (items [31]-[40] listed below)
  • [31] Generalized Proximal Policy Optimization with Sample Reuse
    Queeney, James
    Paschalidis, Ioannis Ch.
    Cassandras, Christos G.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [32] Proximal Policy Optimization with Advantage Reuse Competition
    Cheng, Y.
    Guo, Q.
    Wang, X.
    IEEE Transactions on Artificial Intelligence, 2024, 5 (08): 1 - 10
  • [33] Decaying Clipping Range in Proximal Policy Optimization
    Farsang, Monika
    Szegletes, Luca
    IEEE 15TH INTERNATIONAL SYMPOSIUM ON APPLIED COMPUTATIONAL INTELLIGENCE AND INFORMATICS (SACI 2021), 2021: 521 - 525
  • [34] Proximal Policy Optimization for Radiation Source Search
    Proctor, Philippe
    Teuscher, Christof
    Hecht, Adam
    Osinski, Marek
    JOURNAL OF NUCLEAR ENGINEERING, 2021, 2 (04): 368 - 397
  • [35] Proximal Denoiser for Convergent Plug-and-Play Optimization with Nonconvex Regularization
    Hurault, Samuel
    Leclaire, Arthur
    Papadakis, Nicolas
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [36] An Effective Optimization Method for Fuzzy k-Means With Entropy Regularization
    Liang, Yun
    Chen, Yijin
    Huang, Qiong
    Chen, Haoming
    Nie, Feiping
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (07) : 2846 - 2861
  • [37] Learning Dialogue Policy Efficiently Through Dyna Proximal Policy Optimization
    Huang, Chenping
    Cao, Bin
    COLLABORATIVE COMPUTING: NETWORKING, APPLICATIONS AND WORKSHARING, COLLABORATECOM 2022, PT I, 2022, 460: 396 - 414
  • [38] Model-Based Imitation Learning Using Entropy Regularization of Model and Policy
    Uchibe, Eiji
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2022, 7 (04): 10922 - 10929
  • [39] Pairs Trading Strategy Optimization Using Proximal Policy Optimization Algorithms
    Chen, Yi-Feng
    Shih, Wen-Yueh
    Lai, Hsu-Chao
    Chang, Hao-Chun
    Huang, Jiun-Long
    2023 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING, BIGCOMP, 2023: 40 - 47
  • [40] Proximal Policy Optimization-Based Optimization of Microwave Planar Resonators
    Pan, Jia-Hao
    Liu, Qi Qiang
    Zhao, Wen-Sheng
    Hu, Xiaoping
    You, Bin
    Hu, Yue
    Wang, Jing
    Yu, Chenghao
    Wang, Da-Wei
    IEEE TRANSACTIONS ON COMPONENTS PACKAGING AND MANUFACTURING TECHNOLOGY, 2024, 14 (12): 2339 - 2347