Neural Dynamic Policies for End-to-End Sensorimotor Learning

Cited: 0
Authors
Bahl, Shikhar [1 ]
Mukadam, Mustafa [2 ]
Gupta, Abhinav [1 ]
Pathak, Deepak [1 ]
Affiliations
[1] CMU, Pittsburgh, PA 15213 USA
[2] FAIR, Seattle, WA USA
Keywords
DOI
None available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces such as torque, joint angle, or end-effector position. This forces the agent to make decisions at each point in time during training and hence limits the scalability to continuous, high-dimensional, and long-horizon tasks. In contrast, research in classical robotics has, for a long time, exploited dynamical systems as a policy representation to learn robot behaviors via demonstrations. These techniques, however, lack the flexibility and generalizability provided by deep learning or deep reinforcement learning, and have remained under-explored in such settings. In this work, we begin to close this gap and embed dynamics structure into deep neural network-based policies by reparameterizing action spaces with differential equations. We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space, as opposed to prior policy learning methods where the action represents the raw control space. The embedded structure allows us to perform end-to-end policy learning under both reinforcement and imitation learning setups. We show that NDPs achieve better or comparable performance to state-of-the-art approaches on many robotic control tasks using both reward-based training and demonstrations. Project video and code are available at: https://shikharbahl.github.io/neural-dynamic-policies/.
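For context, the "reparameterizing action spaces with differential equations" mentioned in the abstract refers to a second-order dynamical system of the dynamic-movement-primitive (DMP) family, whose goal and forcing-term weights can be predicted by a policy network so that the policy outputs a trajectory rather than a single raw action. Below is a minimal sketch of such a rollout for one degree of freedom, assuming a standard DMP formulation from the classical robotics literature; the function names, gains, and basis-function choices are illustrative assumptions, not the authors' released implementation (see the project page above for that).

# Minimal sketch of a DMP-style dynamical-system "action space" (illustrative only).
import numpy as np

def rollout_dmp(y0, goal, weights, alpha=25.0, beta=6.25, tau=1.0, dt=0.01, steps=100):
    """Euler-integrate a 1-DoF second-order attractor with a learned forcing term.

    y0      : initial position of the degree of freedom
    goal    : attractor goal g (here, assumed predicted by a neural network)
    weights : radial-basis-function weights w_i (also assumed network-predicted)
    Returns the position trajectory that a low-level controller would track.
    """
    n_basis = len(weights)
    centers = np.exp(-np.linspace(0.0, 1.0, n_basis))   # RBF centers along the phase
    widths = n_basis / (centers ** 2)                    # heuristic RBF widths
    y, yd, x = y0, 0.0, 1.0                              # position, velocity, phase
    traj = []
    for _ in range(steps):
        psi = np.exp(-widths * (x - centers) ** 2)       # basis activations
        forcing = (psi @ weights) / (psi.sum() + 1e-8) * x * (goal - y0)
        ydd = alpha * (beta * (goal - y) - yd) + forcing # transformation system
        yd += tau * ydd * dt
        y += tau * yd * dt
        x += -tau * x * dt                               # canonical (phase) system decay
        traj.append(y)
    return np.array(traj)

# Example: pretend a policy network mapped an observation to (goal, weights);
# the resulting trajectory, not a single torque command, is the policy's output.
predicted_goal, predicted_weights = 1.0, np.random.randn(10) * 0.1
trajectory = rollout_dmp(y0=0.0, goal=predicted_goal, weights=predicted_weights)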
Pages: 12
Related Papers (50 records in total)
  • [1] Learning Neural Models for End-to-End Clustering
    Meier, Benjamin Bruno
    Elezi, Ismail
    Amirian, Mohammadreza
    Duerr, Oliver
    Stadelmann, Thilo
    ARTIFICIAL NEURAL NETWORKS IN PATTERN RECOGNITION, ANNPR 2018, 2018, 11081 : 126 - 138
  • [2] End-to-end sensorimotor control problems of AUVs with deep reinforcement learning
    Wu, Hui
    Song, Shiji
    Hsu, Yachu
    You, Keyou
    Wu, Cheng
    2019 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2019, : 5869 - 5874
  • [3] End-to-end learning of convolutional neural net and dynamic programming for left ventricle segmentation
    Nguyen, Nhat M.
    Ray, Nilanjan
    MEDICAL IMAGING WITH DEEP LEARNING, VOL 121, 2020, 121 : 555 - 569
  • [4] Neural End-to-End Learning for Computational Argumentation Mining
    Eger, Steffen
    Daxenberger, Johannes
    Gurevych, Iryna
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1, 2017, : 11 - 22
  • [5] End-to-end availability policies and noninterference
    Zheng, LT
    Myers, AC
    18TH IEEE COMPUTER SECURITY FOUNDATIONS WORKSHOP, PROCEEDINGS, 2005, : 272 - 286
  • [6] End-to-end Verification of QoS Policies
    El-Atawy, Adel
    Samak, Taghrid
    2012 IEEE NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM (NOMS), 2012, : 426 - 434
  • [7] End-to-end learning of user equilibrium with implicit neural networks
    Liu, Zhichen
    Yin, Yafeng
    Bai, Fan
    Grimm, Donald K.
    TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES, 2023, 150
  • [8] LEARNING ENVIRONMENTAL SOUNDS WITH END-TO-END CONVOLUTIONAL NEURAL NETWORK
    Tokozume, Yuji
    Harada, Tatsuya
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 2721 - 2725
  • [9] NEURAL DYNAMIC MODE DECOMPOSITION FOR END-TO-END MODELING OF NONLINEAR DYNAMICS
    Iwata, Tomoharu
    Kawahara, Yoshinobu
    JOURNAL OF COMPUTATIONAL DYNAMICS, 2023, 10 (02): : 268 - 280
  • [10] Learning Stability Attention in Vision-based End-to-end Driving Policies
    Wang, Tsun-Hsuan
    Xiao, Wei
    Chahine, Makram
    Amini, Alexander
    Hasani, Ramin
    Rus, Daniela
    LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 211, 2023, 211