Neural Dynamic Policies for End-to-End Sensorimotor Learning

Cited by: 0
Authors
Bahl, Shikhar [1]
Mukadam, Mustafa [2]
Gupta, Abhinav [1]
Pathak, Deepak [1]
Affiliations
[1] CMU, Pittsburgh, PA 15213 USA
[2] FAIR, Seattle, WA USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces such as torque, joint angle, or end-effector position. This forces the agent to make a decision at each point in training, and hence limits scalability to continuous, high-dimensional, and long-horizon tasks. In contrast, research in classical robotics has, for a long time, exploited dynamical systems as a policy representation to learn robot behaviors via demonstrations. These techniques, however, lack the flexibility and generalizability provided by deep learning or deep reinforcement learning and have remained under-explored in such settings. In this work, we begin to close this gap and embed dynamics structure into deep neural network-based policies by reparameterizing action spaces with differential equations. We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space, as opposed to prior policy learning methods where actions represent the raw control space. The embedded structure allows us to perform end-to-end policy learning under both reinforcement and imitation learning setups. We show that NDPs achieve better or comparable performance to state-of-the-art approaches on many robotic control tasks using both reward-based training and demonstrations. Project video and code are available at: https://shikharbahl.github.io/neural-dynamic-policies/.
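The abstract does not state the form of the differential equation, so the following is only an illustrative sketch of what "reparameterizing action spaces with differential equations" could look like. It assumes a one-dimensional, second-order system in the style of a dynamic movement primitive: a policy would output the goal and the forcing weights, and the integrated trajectory would then supply the low-level actions. The function name `rollout_trajectory`, all parameter names, and the gain values are hypothetical and are not taken from the authors' code.

```python
import math


def rollout_trajectory(y0, goal, weights, steps=200, dt=0.01,
                       alpha=25.0, beta=6.25, a_x=1.0):
    """Euler-integrate an assumed 1-D second-order system toward `goal`.

    `goal` and `weights` stand in for what a policy network would predict;
    the returned positions stand in for the actions sent to the robot.
    """
    n = len(weights)
    # Basis-function centers placed along the decaying phase variable x.
    centers = [math.exp(-a_x * i / max(n - 1, 1)) for i in range(n)]
    widths = [n ** 1.5 / c for c in centers]

    y, yd, x = y0, 0.0, 1.0
    trajectory = []
    for _ in range(steps):
        # Forcing term: normalized weighted sum of radial basis functions.
        psi = [math.exp(-h * (x - c) ** 2) for h, c in zip(widths, centers)]
        total = sum(psi)
        forcing = 0.0
        if total > 0.0:
            forcing = (sum(w * p for w, p in zip(weights, psi)) / total
                       * x * (goal - y0))
        # Spring-damper dynamics pulling y toward the goal, plus forcing.
        ydd = alpha * (beta * (goal - y) - yd) + forcing
        yd += ydd * dt
        y += yd * dt
        # The phase variable decays, so the forcing term shrinks over time.
        x += -a_x * x * dt
        trajectory.append(y)
    return trajectory


# With zero forcing weights the system simply settles at the goal.
traj = rollout_trajectory(y0=0.0, goal=1.0, weights=[0.0] * 5)
print(len(traj), round(traj[-1], 3))
```

In this sketch a single prediction of `goal` and `weights` yields a whole multi-step trajectory, which is one way to read the abstract's contrast with choosing a raw action at every step.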
Pages: 12