Combined Optimization and Reinforcement Learning for Manipulation Skills

Cited by: 0

Authors:

Englert, Peter [1]

Toussaint, Marc [1]

Affiliations:

[1] Univ Stuttgart, Machine Learning & Robot Lab, Stuttgart, Germany

Source:

ROBOTICS: SCIENCE AND SYSTEMS XII | 2016

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP24 [Robotics];

Discipline classification codes:

080202; 1405

Abstract:

This work addresses the problem of how a robot can improve a manipulation skill in a sample-efficient and secure manner. As an alternative to the standard reinforcement learning formulation where all objectives are defined in a single reward function, we propose a generalized formulation that consists of three components: 1) a known analytic control cost function; 2) a black-box return function; and 3) a black-box binary success constraint. While the overall policy optimization problem is high-dimensional, in typical robot manipulation problems we can assume that the black-box return and constraint only depend on a lower-dimensional projection of the solution. With our formulation we can exploit this structure for a sample-efficient learning framework that iteratively improves the policy with respect to the objective functions under the success constraint. We employ efficient second-order optimization methods to optimize the high-dimensional policy with respect to the analytic cost function while keeping the lower-dimensional projection fixed. This is alternated with safe Bayesian optimization over the lower-dimensional projection to address the black-box return and success constraint. During both improvement steps, the success constraint is used to keep the optimization in a secure region and to clearly distinguish between motions that lead to success or failure. The learning algorithm is evaluated on a simulated benchmark problem and a door opening task with a PR2.
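The alternation described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the policy is a vector x of 20 waypoints, the projection y = P x is its final waypoint, the analytic cost is a quadratic smoothness term, and `black_box_return` and `black_box_success` are hypothetical stand-ins for quantities that would be measured on a robot. Because the cost is quadratic, the second-order step reduces to a single KKT solve. The safe Bayesian optimization step is replaced by a much simpler stand-in: a small Gaussian-process model of the return with an upper-confidence-bound rule, restricted to candidates close to a projection that has already succeeded.

```python
import numpy as np

def black_box_return(y):
    """Hypothetical return; in practice it would be measured from a rollout."""
    return -float(np.sum((y - 0.6) ** 2))

def black_box_success(y):
    """Hypothetical binary success signal; in practice observed on the robot."""
    return bool(np.all(np.abs(y) <= 0.775))

def solve_fixed_projection(A, P, y):
    """Minimize 0.5 * x^T A x subject to P x = y via one KKT (Newton) solve."""
    n, d = A.shape[0], P.shape[0]
    kkt = np.block([[A, P.T], [P, np.zeros((d, d))]])
    rhs = np.concatenate([np.zeros(n), y])
    return np.linalg.solve(kkt, rhs)[:n]

def gp_posterior(Y, r, cand, length=0.3, noise=1e-4):
    """Zero-mean GP regression with an RBF kernel over 1-D projections."""
    def k(a, b):
        return np.exp(-0.5 * (a - b.T) ** 2 / length**2)
    K = k(Y, Y) + noise * np.eye(len(Y))
    Ks = k(cand, Y)
    mean = Ks @ np.linalg.solve(K, r)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def learn(n=20, iters=15, radius=0.15, beta=2.0):
    D = np.diff(np.eye(n), axis=0)           # finite-difference operator
    A = D.T @ D + 1e-3 * np.eye(n)           # assumed analytic smoothness cost
    P = np.zeros((1, n))
    P[0, -1] = 1.0                           # assumed projection: last waypoint
    cand = np.linspace(-1.0, 1.0, 201)[:, None]
    Y = [np.array([0.0])]                    # initial projection, assumed safe
    ok = [black_box_success(Y[0])]
    R = [black_box_return(Y[0])]
    for _ in range(iters):
        Ys, Rs, oks = np.array(Y), np.array(R), np.array(ok)
        # Heuristic safe set (assumption): stay near projections that succeeded.
        near_success = np.abs(cand - Ys[oks].T).min(axis=1) <= radius
        mean, std = gp_posterior(Ys, Rs - Rs.mean(), cand)
        ucb = np.where(near_success, mean + beta * std, -np.inf)
        y_next = cand[np.argmax(ucb)]
        # Re-optimize the full policy with the projection held fixed.
        x_next = solve_fixed_projection(A, P, y_next)
        success = black_box_success(P @ x_next)
        Y.append(y_next)
        ok.append(success)
        # Assumption: a failed rollout is scored with the worst return seen so far.
        R.append(black_box_return(y_next) if success else min(R))
    best = max((i for i in range(len(Y)) if ok[i]), key=lambda i: R[i])
    y_best = Y[best]
    return y_best, solve_fixed_projection(A, P, y_best), list(zip(Y, R, ok))

if __name__ == "__main__":
    y_best, x_best, history = learn()
    print("best projection:", y_best, "return:", black_box_return(y_best))
```

Calling `learn()` returns the best successful projection, the full policy re-optimized for that projection, and the evaluation history. A closer match to the abstract would model the success constraint probabilistically rather than with the distance heuristic used here, and would use an iterative second-order solver for non-quadratic costs.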


Pages: 9
