Proximal Policy Optimization with Entropy Regularization

Cited by: 0
Author
Shen, Yuqing [1]
Affiliation
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
reinforcement learning; policy gradient; entropy regularization
DOI
10.1109/ICCCR61138.2024.10585473
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
This study presents a revision of the Proximal Policy Optimization (PPO) algorithm, aimed primarily at improving the stability of PPO during training while maintaining a balance between exploration and exploitation. Recognizing the inherent difficulty of achieving this balance in complex environments, the proposed method adopts an entropy regularization technique similar to the one used in the Asynchronous Advantage Actor-Critic (A3C) algorithm. The main purpose of this design is to encourage exploration in the early stages of training, preventing the agent from prematurely converging to a sub-optimal policy. A detailed theoretical explanation of how the entropy term improves the robustness of the learning trajectory is provided. Experimental results demonstrate that the revised PPO not only retains the original strengths of the PPO algorithm but also shows a significant improvement in the stability of the training process. This work contributes to ongoing research in reinforcement learning and offers a promising direction for future work on applying PPO in environments with complicated dynamics.
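To make the described modification concrete, the following is a minimal sketch of PPO's clipped surrogate loss with an A3C-style entropy bonus subtracted from it. This is not the paper's implementation; the function name, argument layout, and the placeholder values of clip_eps and entropy_coef are assumptions made purely for illustration.

```python
import torch
from torch.distributions import Categorical

def ppo_loss_with_entropy(policy_logits, actions, old_log_probs, advantages,
                          clip_eps=0.2, entropy_coef=0.01):
    """Clipped PPO surrogate objective with an A3C-style entropy bonus.

    All names and hyperparameter values are illustrative placeholders,
    not taken from the paper.
    """
    dist = Categorical(logits=policy_logits)
    log_probs = dist.log_prob(actions)

    # Probability ratio between the current policy and the policy
    # that collected the data.
    ratio = torch.exp(log_probs - old_log_probs)

    # Standard PPO clipped surrogate objective (maximized, hence negated).
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()

    # Entropy regularization: subtracting the mean entropy from the loss
    # rewards more stochastic, exploratory policies.
    entropy = dist.entropy().mean()

    return policy_loss - entropy_coef * entropy
```

In practice the entropy coefficient is typically kept small and is often decayed over training, so that exploration is encouraged early on while the policy can still converge later, consistent with the stated goal of avoiding premature convergence to a sub-optimal policy.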
Pages: 380 - 383
Page count: 4
Related papers
50 items in total (items [31]-[40] listed below)
  • [31] Generalized Proximal Policy Optimization with Sample Reuse
    Queeney, James
    Paschalidis, Ioannis Ch.
    Cassandras, Christos G.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [32] Proximal Policy Optimization with Advantage Reuse Competition
    Cheng, Y.
    Guo, Q.
    Wang, X.
    IEEE Transactions on Artificial Intelligence, 2024, 5 (08): 1 - 10
  • [33] Decaying Clipping Range in Proximal Policy Optimization
    Farsang, Monika
    Szegletes, Luca
    IEEE 15TH INTERNATIONAL SYMPOSIUM ON APPLIED COMPUTATIONAL INTELLIGENCE AND INFORMATICS (SACI 2021), 2021: 521 - 525
  • [34] Proximal Policy Optimization for Radiation Source Search
    Proctor, Philippe
    Teuscher, Christof
    Hecht, Adam
    Osinski, Marek
    JOURNAL OF NUCLEAR ENGINEERING, 2021, 2 (04): 368 - 397
  • [35] Proximal Denoiser for Convergent Plug-and-Play Optimization with Nonconvex Regularization
    Hurault, Samuel
    Leclaire, Arthur
    Papadakis, Nicolas
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [36] An Effective Optimization Method for Fuzzy k-Means With Entropy Regularization
    Liang, Yun
    Chen, Yijin
    Huang, Qiong
    Chen, Haoming
    Nie, Feiping
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (07) : 2846 - 2861
  • [37] Learning Dialogue Policy Efficiently Through Dyna Proximal Policy Optimization
    Huang, Chenping
    Cao, Bin
    COLLABORATIVE COMPUTING: NETWORKING, APPLICATIONS AND WORKSHARING, COLLABORATECOM 2022, PT I, 2022, 460: 396 - 414
  • [38] Model-Based Imitation Learning Using Entropy Regularization of Model and Policy
    Uchibe, Eiji
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2022, 7 (04): 10922 - 10929
  • [39] Pairs Trading Strategy Optimization Using Proximal Policy Optimization Algorithms
    Chen, Yi-Feng
    Shih, Wen-Yueh
    Lai, Hsu-Chao
    Chang, Hao-Chun
    Huang, Jiun-Long
    2023 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING, BIGCOMP, 2023: 40 - 47
  • [40] Proximal Policy Optimization-Based Optimization of Microwave Planar Resonators
    Pan, Jia-Hao
    Liu, Qi Qiang
    Zhao, Wen-Sheng
    Hu, Xiaoping
    You, Bin
    Hu, Yue
    Wang, Jing
    Yu, Chenghao
    Wang, Da-Wei
    IEEE TRANSACTIONS ON COMPONENTS PACKAGING AND MANUFACTURING TECHNOLOGY, 2024, 14 (12): 2339 - 2347