Model Predictive Actor-Critic: Accelerating Robot Skill Acquisition with Deep Reinforcement Learning

Cited by: 20
Authors
Morgan, Andrew S. [1]
Nandha, Daljeet [2]
Chalvatzaki, Georgia [2]
D'Eramo, Carlo [2]
Dollar, Aaron M. [1]
Peters, Jan [2]
Affiliations
[1] Yale Univ, Dept Mech Engn & Mat Sci, New Haven, CT 06520 USA
[2] Tech Univ Darmstadt, Intelligent Autonomous Syst, Darmstadt, Germany
Keywords
MANIPULATION
DOI
10.1109/ICRA48506.2021.9561298
Chinese Library Classification (CLC)
TP [automation technology; computer technology]
Discipline classification code
0812
Abstract
Substantial advancements to model-based reinforcement learning algorithms have been impeded by the model bias induced by the collected data, which generally hurts performance. Meanwhile, their inherent sample efficiency warrants utility for most robot applications, limiting potential damage to the robot and its environment during training. Inspired by information-theoretic model predictive control and advances in deep reinforcement learning, we introduce Model Predictive Actor-Critic (MoPAC), a hybrid model-based/model-free method that combines model predictive rollouts with policy optimization to mitigate model bias. MoPAC leverages optimal trajectories to guide policy learning, but explores via its model-free method, allowing the algorithm to learn more expressive dynamics models. This combination guarantees optimal skill learning up to an approximation error and reduces necessary physical interaction with the environment, making it suitable for real-robot training. We provide extensive results showcasing how our proposed method generally outperforms the current state of the art and conclude by evaluating MoPAC for learning on a physical robotic hand performing valve rotation and finger gaiting, a task that requires grasping, manipulation, and then regrasping of an object.
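A minimal sketch of the hybrid loop the abstract describes is given below, under simplifying assumptions: real transitions fit a dynamics model, a planner rolls that model out to generate additional transitions, and an actor-critic update (stubbed out here) would consume both the real and model-generated data. The toy double-integrator environment, the ridge-regression dynamics model, the random-shooting planner (a stand-in for the information-theoretic MPC the paper builds on), and the actor_critic_update placeholder are all illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only, not the paper's implementation: the environment,
# dynamics model, and planner below are simplified placeholders.
import numpy as np

rng = np.random.default_rng(0)

def env_step(state, action, dt=0.05):
    """Toy 1D double integrator; reward penalizes distance from origin and effort."""
    pos, vel = state
    vel = vel + dt * action
    pos = pos + dt * vel
    reward = -(pos ** 2) - 0.1 * action ** 2
    return np.array([pos, vel]), reward

def fit_model(transitions, lam=1e-3):
    """Fit a linear dynamics model s' ~ [s, a, 1] @ W by ridge regression."""
    X = np.array([np.concatenate([s, [a, 1.0]]) for s, a, _ in transitions])
    Y = np.array([s_next for _, _, s_next in transitions])
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

def model_step(W, state, action):
    """Predict the next state with the learned linear model."""
    return np.concatenate([state, [action, 1.0]]) @ W

def plan(W, state, horizon=10, samples=32):
    """Random-shooting MPC over the learned model (stand-in for information-theoretic MPC)."""
    best_ret, best_a0 = -np.inf, 0.0
    for _ in range(samples):
        actions = rng.uniform(-1.0, 1.0, size=horizon)
        s, ret = state.copy(), 0.0
        for a in actions:
            s = model_step(W, s, a)
            ret += -(s[0] ** 2) - 0.1 * a ** 2  # same reward shape as the real env
        if ret > best_ret:
            best_ret, best_a0 = ret, actions[0]
    return best_a0

def actor_critic_update(real_buffer, model_buffer):
    """Placeholder: a soft actor-critic update over both buffers would go here."""
    pass

W = None
real_buffer, model_buffer = [], []
state = np.array([1.0, 0.0])
for step in range(200):
    if W is not None:
        action = plan(W, state)           # model-predictive action proposal
    else:
        action = rng.uniform(-1.0, 1.0)   # warm-up with random actions
    action = float(np.clip(action + 0.1 * rng.standard_normal(), -1.0, 1.0))  # exploration noise
    next_state, _ = env_step(state, action)
    real_buffer.append((state, action, next_state))
    state = next_state

    if len(real_buffer) >= 20:
        W = fit_model(real_buffer)        # refit the dynamics model on all real data
        # Short rollouts under the learned model augment the real data.
        s = real_buffer[rng.integers(len(real_buffer))][0]
        for _ in range(5):
            a = plan(W, s)
            s_next = model_step(W, s, a)
            model_buffer.append((s, a, s_next))
            s = s_next
        actor_critic_update(real_buffer, model_buffer)

print(f"collected {len(real_buffer)} real and {len(model_buffer)} model-generated transitions")
```

The sketch mirrors only the data flow; in the paper the dynamics model, planner, and policy optimization are substantially more capable, with the model-predictive rollouts guiding policy learning while exploration comes from the model-free side.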
Pages: 6672-6678
Page count: 7
Related papers
50 records in total
  • [1] Integrated Actor-Critic for Deep Reinforcement Learning
    Zheng, Jiaohao
    Kurt, Mehmet Necip
    Wang, Xiaodong
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT IV, 2021, 12894: 505-518
  • [2] A World Model for Actor-Critic in Reinforcement Learning
    Panov, A. I.
    Ugadiarov, L. A.
    PATTERN RECOGNITION AND IMAGE ANALYSIS, 2023, 33(03): 467-477
  • [3] Visual Navigation with Actor-Critic Deep Reinforcement Learning
    Shao, Kun
    Zhao, Dongbin
    Zhu, Yuanheng
    Zhang, Qichao
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018
  • [4] Stochastic Integrated Actor-Critic for Deep Reinforcement Learning
    Zheng, Jiaohao
    Kurt, Mehmet Necip
    Wang, Xiaodong
    IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(05): 6654-6666
  • [5] Deep Actor-Critic Reinforcement Learning for Anomaly Detection
    Zhong, Chen
    Gursoy, M. Cenk
    Velipasalar, Senem
    2019 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2019
  • [6] Averaged Soft Actor-Critic for Deep Reinforcement Learning
    Ding, Feng
    Ma, Guanfeng
    Chen, Zhikui
    Gao, Jing
    Li, Peng
    COMPLEXITY, 2021, 2021
  • [7] A Sandpile Model for Reliable Actor-Critic Reinforcement Learning
    Peng, Yiming
    Chen, Gang
    Zhang, Mengjie
    Pang, Shaoning
    2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017: 4014-4021
  • [8] Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model
    Lee, Alex X.
    Nagabandi, Anusha
    Abbeel, Pieter
    Levine, Sergey
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [9] ACTOR-CRITIC DEEP REINFORCEMENT LEARNING FOR DYNAMIC MULTICHANNEL ACCESS
    Zhong, Chen
    Lu, Ziyang
    Gursoy, M. Cenk
    Velipasalar, Senem
    2018 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP 2018), 2018: 599-603
  • [10] A Prioritized Objective Actor-Critic Method for Deep Reinforcement Learning
    Nguyen, Ngoc Duy
    Nguyen, Thanh Thi
    Vamplew, Peter
    Dazeley, Richard
    Nahavandi, Saeid
    Neural Computing and Applications, 2021, 33: 10335-10349