Supervised Meta-Reinforcement Learning With Trajectory Optimization for Manipulation Tasks

Citations: 0
Authors
Wang, Lei [1 ]
Zhang, Yunzhou [2 ]
Zhu, Delong [3 ]
Coleman, Sonya [4 ]
Kerr, Dermot [4 ]
Affiliations
[1] Northeastern Univ, Fac Robot Sci & Engn, Shenyang 110000, Peoples R China
[2] Northeastern Univ, Coll Informat Sci & Engn, Shenyang 110819, Peoples R China
[3] Chinese Univ Hong Kong, Dept Elect Engn, Hong Kong, Peoples R China
[4] Univ Ulster, Sch Comp Engn & Intelligent Syst, Coleraine BT52 1SA, North Ireland
Funding
National Natural Science Foundation of China;
Keywords
Task analysis; Trajectory optimization; Robots; Heuristic algorithms; Training; Complexity theory; Dynamical systems; Iterative LQR (iLQR); meta learning; reinforcement learning (RL); robotic manipulation; trajectory optimization; LEVEL; GO;
DOI
10.1109/TCDS.2023.3286465
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Learning from small amounts of samples with reinforcement learning (RL) is challenging in many tasks, especially in real-world applications such as robotics. Meta-reinforcement learning (meta-RL) has been proposed as an approach to address this problem by generalizing to new tasks through experience from previous similar tasks. However, these approaches generally perform meta-optimization by applying direct policy search methods to validation samples from adapted policies, and thus require large amounts of on-policy samples during meta-training. To this end, we propose a novel algorithm called supervised meta-RL with trajectory optimization (SMRL-TO) that integrates model-agnostic meta-learning (MAML) with iterative LQR (iLQR)-based trajectory optimization. Our approach provides online supervision for validation samples through iLQR-based trajectory optimization and embeds simple imitation learning into the meta-optimization in place of policy gradient steps. This is a bi-level optimization that computes several gradient updates in each meta-iteration, with off-policy RL in the inner loop and online imitation learning in the outer loop. Owing to the effective supervision from iLQR-based trajectory optimization, SMRL-TO achieves significant improvements in sample efficiency without human-provided demonstrations. In this article, we describe how to use iLQR-based trajectory optimization to obtain labeled data and then how to leverage these data to assist the training of the meta-learner. Through a series of robotic manipulation tasks, we further show that, compared with previous methods, the proposed approach substantially improves sample efficiency and achieves better asymptotic performance.
Pages: 681-691
Page count: 11