Reinforcement learning with multimodal advantage function for accurate advantage estimation in robot learning

Cited by: 1

Authors:

Park, Jonghyeok ^{[1]}

Han, Soohee ^{[1,2]}

Affiliations:

[1] Pohang Univ Sci & Technol, Convergence IT Engn, Cheongam Ro 77, Pohang Si 37673, Gyeongsangbuk D, South Korea

[2] Pohang Univ Sci & Technol, Elect Engn, Cheongam Ro 77, Pohang Si 37673, Gyeongsangbuk D, South Korea

Source:

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE | 2023 / Vol. 126

Funding:

National Research Foundation, Singapore;

Keywords:

Advantage function; Inverted pendulum; Reinforcement learning; Real-time control; Robotics;

DOI:

10.1016/j.engappai.2023.107019

Chinese Library Classification number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

In this paper, we propose a reinforcement learning (RL) framework that uses a multimodal advantage function (MAF) to closely approximate the true advantage function, thereby achieving high returns. The MAF, which is constructed as the logarithm of a mixture-of-Gaussians policy (MoG-P) and trained on globally collected past experiences, directly captures the complex true advantage function through its multimodality and is expected to enhance the sample efficiency of RL. To realize this expected improvement in learning performance, two practical techniques are developed: mode selection and rounding off of actions during the policy update process. Mode selection samples the action around the most influential or most heavily weighted mode for efficient environment exploration. For fast policy updates, past actions are rounded off to discretized action values when calculating the MAF. The proposed RL framework was validated in simulation environments and on a real inverted pendulum system. The results showed that the proposed framework can achieve better sample efficiency or higher returns than other advantage-based RL benchmarks.
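The abstract gives no equations, so the following is only a minimal sketch of the three ingredients it names: the logarithm of a mixture-of-Gaussians density, sampling around the most heavily weighted mode, and rounding actions to a discrete grid. It assumes a scalar action, strictly positive mixture weights, and a uniform grid step. All function names and numeric values are hypothetical illustrations, not the authors' implementation.

```python
import math
import random


def log_mog_density(action, weights, means, stds):
    """Log of a 1-D mixture-of-Gaussians density at `action`.

    Uses the log-sum-exp trick for numerical stability.
    Assumes every weight and standard deviation is strictly positive.
    """
    log_terms = [
        math.log(w) - 0.5 * math.log(2.0 * math.pi) - math.log(s)
        - 0.5 * ((action - m) / s) ** 2
        for w, m, s in zip(weights, means, stds)
    ]
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))


def select_mode(weights):
    """Index of the most heavily weighted component (assumed selection rule)."""
    return max(range(len(weights)), key=lambda k: weights[k])


def sample_around_mode(weights, means, stds, rng):
    """Draw an action from the Gaussian of the selected mode only."""
    k = select_mode(weights)
    return rng.gauss(means[k], stds[k])


def round_action(action, step):
    """Snap a continuous action to the nearest multiple of `step`."""
    return round(action / step) * step


# Hypothetical three-mode mixture over a scalar action.
weights = [0.2, 0.7, 0.1]
means = [-1.0, 0.5, 2.0]
stds = [0.3, 0.2, 0.5]

rng = random.Random(0)
action = sample_around_mode(weights, means, stds, rng)
rounded = round_action(action, step=0.1)
score = log_mog_density(rounded, weights, means, stds)
```

Because rounding maps many past actions onto the same grid value, the log-density only needs to be evaluated once per grid point and can be cached. That is one plausible reading of why rounding speeds up policy updates, though the abstract does not spell out the mechanism.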


Pages: 11

