Proximal Policy Optimization with Entropy Regularization

Cited: 0
Authors
Shen, Yuqing [1 ]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
reinforcement learning; policy gradient; entropy regularization;
DOI
10.1109/ICCCR61138.2024.10585473
CLC Classification Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
This study presents a revision of the Proximal Policy Optimization (PPO) algorithm, aimed primarily at improving the stability of PPO during training while maintaining a balance between exploration and exploitation. Recognizing the inherent difficulty of achieving this balance in complex environments, the proposed method adopts an entropy regularization technique similar to the one used in the Asynchronous Advantage Actor-Critic (A3C) algorithm. The main purpose of this design is to encourage exploration in the early stages of training, preventing the agent from prematurely converging to a sub-optimal policy. Detailed theoretical explanations of how the entropy term improves the robustness of the learning trajectory are provided. Experimental results demonstrate that the revised PPO not only retains the original strengths of the PPO algorithm but also shows significant improvement in the stability of the training process. This work contributes to ongoing research in reinforcement learning and offers a promising direction for future work on applying PPO in environments with complicated dynamics.
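The combination described in the abstract, the standard PPO clipped surrogate objective plus an A3C-style entropy bonus, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the function name, the clipping threshold `clip_eps=0.2`, and the entropy coefficient `entropy_coef=0.01` are illustrative choices, not values taken from the paper.

```python
import numpy as np

def ppo_entropy_loss(log_probs_new, log_probs_old, advantages, probs_new,
                     clip_eps=0.2, entropy_coef=0.01):
    """Clipped PPO surrogate loss with an entropy bonus (a quantity to minimize).

    Subtracting entropy_coef * entropy rewards more-uniform action
    distributions, which encourages exploration early in training and
    discourages premature convergence to a sub-optimal policy.
    """
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space
    ratio = np.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Standard PPO-clip surrogate: pessimistic minimum, negated for minimization
    policy_loss = -np.mean(np.minimum(unclipped, clipped))
    # Shannon entropy of the new policy, averaged over the batch
    entropy = -np.mean(np.sum(probs_new * np.log(probs_new + 1e-8), axis=-1))
    return policy_loss - entropy_coef * entropy
```

A larger `entropy_coef` pulls the optimum toward higher-entropy (more exploratory) policies; annealing it toward zero over training recovers the plain PPO objective, which matches the abstract's goal of front-loading exploration.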
Pages: 380-383
Page count: 4