Supervised Meta-Reinforcement Learning With Trajectory Optimization for Manipulation Tasks

Citations: 0
Authors
Wang, Lei [1 ]
Zhang, Yunzhou [2 ]
Zhu, Delong [3 ]
Coleman, Sonya [4 ]
Kerr, Dermot [4 ]
Affiliations
[1] Northeastern Univ, Fac Robot Sci & Engn, Shenyang 110000, Peoples R China
[2] Northeastern Univ, Coll Informat Sci & Engn, Shenyang 110819, Peoples R China
[3] Chinese Univ Hong Kong, Dept Elect Engn, Hong Kong, Peoples R China
[4] Univ Ulster, Sch Comp Engn & Intelligent Syst, Coleraine BT52 1SA, North Ireland
Funding
National Natural Science Foundation of China;
Keywords
Task analysis; Trajectory optimization; Robots; Heuristic algorithms; Training; Complexity theory; Dynamical systems; Iterative LQR (iLQR); meta learning; reinforcement learning (RL); robotic manipulation; trajectory optimization; LEVEL; GO;
DOI
10.1109/TCDS.2023.3286465
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Learning from small amounts of samples with reinforcement learning (RL) is challenging in many tasks, especially in real-world applications such as robotics. Meta-reinforcement learning (meta-RL) has been proposed as an approach to address this problem by generalizing to new tasks through experience from previous similar tasks. However, these approaches generally perform meta-optimization by applying direct policy search methods to validation samples from adapted policies, and thus require large amounts of on-policy samples during meta-training. To this end, we propose a novel algorithm called supervised meta-RL with trajectory optimization (SMRL-TO) that integrates model-agnostic meta-learning (MAML) with iterative LQR (iLQR)-based trajectory optimization. Our approach provides online supervision for validation samples through iLQR-based trajectory optimization and embeds simple imitation learning into the meta-optimization in place of policy gradient steps. This is a bi-level optimization that computes several gradient updates in each meta-iteration, with off-policy RL in the inner loop and online imitation learning in the outer loop. Owing to the effective supervision from iLQR-based trajectory optimization, SMRL-TO achieves significant improvements in sample efficiency without human-provided demonstrations. In this article, we describe how to use iLQR-based trajectory optimization to obtain labeled data and then how to leverage these data to assist the training of the meta-learner. Through a series of robotic manipulation tasks, we further show that, compared with previous methods, the proposed approach substantially improves sample efficiency and achieves better asymptotic performance.
Pages: 681-691
Page count: 11