Reinforcement learning with multimodal advantage function for accurate advantage estimation in robot learning

Cited by: 1
Authors
Park, Jonghyeok [1]
Han, Soohee [1, 2]
Affiliations
[1] Pohang Univ Sci & Technol, Convergence IT Engn, Cheongam Ro 77, Pohang Si 37673, Gyeongsangbuk D, South Korea
[2] Pohang Univ Sci & Technol, Elect Engn, Cheongam Ro 77, Pohang Si 37673, Gyeongsangbuk D, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Advantage function; Inverted pendulum; Reinforcement learning; Real-time control; Robotics;
DOI
10.1016/j.engappai.2023.107019
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we propose a reinforcement learning (RL) framework that uses a multimodal advantage function (MAF) to closely approximate the true advantage function, thereby achieving high returns. The MAF is constructed as the logarithm of a mixture-of-Gaussians policy (MoG-P) and trained on globally collected past experiences; its multimodality lets it directly approximate the complex true advantage function, which is expected to improve the sample efficiency of RL. To realize this expected improvement in practice, two techniques are developed for the policy update process: mode selection and rounding off of actions. Mode selection samples the action around the most influential, or most heavily weighted, mode for efficient environment exploration. For fast policy updates, past actions are rounded off to discretized action values when calculating the MAF. The proposed RL framework was validated in simulation environments and on a real inverted pendulum system. The results showed that the proposed framework can achieve more sample-efficient performance or higher returns than other advantage-based RL benchmarks.
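The three ingredients named in the abstract (a log of a mixture-of-Gaussians density, sampling around the dominant mode, and rounding actions to a discrete grid) can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation: the function names, the example parameters, the uniform action grid, and the reading of "most influential mode" as the component with the largest mixture weight are all assumptions made for illustration. How the log-density is turned into an advantage estimate and how the mixture is trained are not given in this record, so the sketch stops at the log-density.

```python
import math
import random


def mog_log_density(action, weights, means, stds):
    """Log of a 1-D mixture-of-Gaussians density at `action`.

    Uses the log-sum-exp trick so small component densities do not underflow.
    """
    log_terms = [
        math.log(w) - 0.5 * math.log(2.0 * math.pi) - math.log(s)
        - 0.5 * ((action - m) / s) ** 2
        for w, m, s in zip(weights, means, stds)
    ]
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def select_mode(weights):
    """Index of the most heavily weighted component (assumed meaning of 'mode selection')."""
    return max(range(len(weights)), key=lambda k: weights[k])


def sample_around_mode(weights, means, stds, rng):
    """Draw an action from the Gaussian of the dominant component only."""
    k = select_mode(weights)
    return rng.gauss(means[k], stds[k])


def round_action(action, low, high, n_bins):
    """Snap an action to the nearest of `n_bins` evenly spaced values in [low, high]."""
    step = (high - low) / (n_bins - 1)
    clipped = min(max(action, low), high)
    return low + round((clipped - low) / step) * step


# Hypothetical mixture parameters for a single state (illustrative values only).
weights = [0.2, 0.7, 0.1]
means = [-1.0, 0.5, 2.0]
stds = [0.3, 0.2, 0.5]

rng = random.Random(0)
new_action = sample_around_mode(weights, means, stds, rng)       # exploration step
past_action = round_action(0.37, low=-2.0, high=2.0, n_bins=41)  # snapped to the 0.1 grid
log_density = mog_log_density(past_action, weights, means, stds)
```

Rounding past actions to a fixed grid means the log-density only ever needs to be evaluated at a finite set of action values, which is one plausible way the rounding step could speed up policy updates.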
Pages: 11
Related papers
50 in total
  • [1] Bootstrap Advantage Estimation for Policy Optimization in Reinforcement Learning
    Rahman, Md Masudur
    Xue, Yexiang
    [J]. 2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA, 2022, : 234 - 239
  • [2] HIERARCHICAL REINFORCEMENT LEARNING WITH ADVANTAGE FUNCTION FOR ENTITY RELATION EXTRACTION
    Zhu, Xianchao
    Zhu, William
    [J]. Journal of Applied and Numerical Optimization, 2022, 4 (03): : 393 - 404
  • [3] Navigation of Autonomous Vehicles using Reinforcement Learning with Generalized Advantage Estimation
    Jacinto, Edwar
    Martinez, Fernando
    Martinez, Fredy
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (01) : 954 - 959
  • [4] Reinforcement Learning for Continuous Control: A Quantum Normalized Advantage Function Approach
    Liu, Yaofu
    Xu, Chang
    Jin, Siyuan
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON QUANTUM SOFTWARE, QSW, 2023, : 83 - 87
  • [5] Improving Offline Reinforcement Learning With In-Sample Advantage Regularization for Robot Manipulation
    Ma, Chengzhong
    Yang, Deyu
    Wu, Tianyu
    Liu, Zeyang
    Yang, Houxue
    Chen, Xingyu
    Lan, Xuguang
    Zheng, Nanning
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [6] Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons
    Shi, Chengchun
    Luo, Shikai
    Le, Yuan
    Zhu, Hongtu
    Song, Rui
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2024, 119 (545) : 232 - 245
  • [7] Variational value learning in advantage actor-critic reinforcement learning
    Zhang, Yaozhong
    Han, Jiaqi
    Hu, Xiaofang
    Dan, Shihao
    [J]. 2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020, : 1955 - 1960
  • [8] Hierarchical Advantage for Reinforcement Learning in Parameterized Action Space
    Hu, Zhejie
    Kaneko, Tomoyuki
    [J]. 2021 IEEE CONFERENCE ON GAMES (COG), 2021, : 816 - 823
  • [9] Offline Meta-Reinforcement Learning with Advantage Weighting
    Mitchell, Eric
    Rafailov, Rafael
    Peng, Xue Bin
    Levine, Sergey
    Finn, Chelsea
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [10] Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards
    Li, Siyuan
    Wang, Rui
    Tang, Minxue
    Zhang, Chongjie
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32