Exploration via Planning for Information about the Optimal Trajectory

Cited by: 0
Authors
Mehta, Viraj [1]
Char, Ian [2]
Abbate, Joseph [4]
Conlin, Rory [4]
Boyer, Mark D. [3]
Ermon, Stefano [5]
Schneider, Jeff [1]
Neiswanger, Willie [5]
Affiliations
[1] Robot Inst, Pittsburgh, PA 15213 USA
[2] Carnegie Mellon Univ, Machine Learning Dept, Pittsburgh, PA USA
[3] Princeton Plasma Phys Lab, Princeton, NJ USA
[4] Princeton Univ, Princeton, NJ USA
[5] Stanford Univ, Dept Comp Sci, Stanford, CA USA
Funding
National Science Foundation (USA);
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many potential applications of reinforcement learning (RL) are stymied by the large number of samples required to learn an effective policy. This is especially true when applying RL to real-world control tasks, e.g., in the sciences or robotics, where executing a policy in the environment is costly. In popular RL algorithms, agents typically explore either by adding stochasticity to a reward-maximizing policy or by attempting to gather maximal information about the environment dynamics without taking the given task into account. In this work, we develop a method that allows us to plan for exploration while taking both the task and the current knowledge about the dynamics into account. The key insight of our approach is to plan an action sequence that maximizes the expected information gain about the optimal trajectory for the task at hand. We demonstrate that our method learns strong policies with 2x fewer samples than strong exploration baselines and 200x fewer samples than model-free methods on a diverse set of low-to-medium dimensional control tasks in both the open-loop and closed-loop control settings.
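The idea of scoring a query by its expected information gain about the optimal trajectory, rather than about the dynamics as a whole, can be illustrated with a toy discrete example. The sketch below is not the authors' implementation: it assumes a small finite set of dynamics hypotheses, discrete observation outcomes, and a known label for the optimal trajectory under each hypothesis. All names (`eig_about_optimum`, `pick_query`, `optimum_of`) are hypothetical and chosen for illustration only.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def eig_about_optimum(prior, likelihood, optimum_of):
    """Mutual information I(Y; T*) between a query outcome Y and the
    optimal-trajectory label T*, under a discrete belief over dynamics.

    prior[h]         : belief that hypothesis h is the true dynamics
    likelihood[h][y] : p(outcome y | hypothesis h) for this query
    optimum_of[h]    : label of the optimal trajectory if h were true
    """
    n_outcomes = len(likelihood[0])
    # Marginal predictive distribution p(y) = sum_h p(h) p(y | h).
    marginal = [sum(prior[h] * likelihood[h][y] for h in range(len(prior)))
                for y in range(n_outcomes)]
    # Group hypotheses by the optimal trajectory they imply.
    groups = {}
    for h, label in enumerate(optimum_of):
        groups.setdefault(label, []).append(h)
    expected_conditional = 0.0
    for members in groups.values():
        p_label = sum(prior[h] for h in members)
        if p_label == 0.0:
            continue
        # Predictive distribution p(y | T* = label) within this group.
        cond = [sum(prior[h] * likelihood[h][y] for h in members) / p_label
                for y in range(n_outcomes)]
        expected_conditional += p_label * entropy(cond)
    # I(Y; T*) = H[Y] - E_{T*}[ H[Y | T*] ]
    return entropy(marginal) - expected_conditional


def pick_query(prior, likelihoods, optimum_of):
    """Return the index of the highest-scoring query and all scores."""
    scores = [eig_about_optimum(prior, lik, optimum_of) for lik in likelihoods]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Three hypothetical dynamics models; the first two share optimum "A".
prior = [1 / 3, 1 / 3, 1 / 3]
optimum_of = ["A", "A", "B"]
likelihoods = [
    # Query 0 separates h0 from h1, which imply the same optimum.
    [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]],
    # Query 1 separates {h0, h1} from h2, which changes the optimum.
    [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]],
]
best, scores = pick_query(prior, likelihoods, optimum_of)
print(best, scores)
```

In this toy setup, query 0 is informative about the dynamics but says nothing about which trajectory is optimal, so its score is essentially zero, while query 1 is preferred. This mirrors, in a highly simplified form, the contrast the abstract draws with exploration that ignores the task.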
Pages: 15