Exploration via Planning for Information about the Optimal Trajectory

Cited by: 0
Authors
Mehta, Viraj [1]
Char, Ian [2]
Abbate, Joseph [4]
Conlin, Rory [4]
Boyer, Mark D. [3]
Ermon, Stefano [5]
Schneider, Jeff [1]
Neiswanger, Willie [5]
Affiliations
[1] Robot Inst, Pittsburgh, PA 15213 USA
[2] Carnegie Mellon Univ, Machine Learning Dept, Pittsburgh, PA USA
[3] Princeton Plasma Phys Lab, Princeton, NJ USA
[4] Princeton Univ, Princeton, NJ USA
[5] Stanford Univ, Dept Comp Sci, Stanford, CA USA
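Funding
National Science Foundation (USA)
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract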
Many potential applications of reinforcement learning (RL) are stymied by the large number of samples required to learn an effective policy. This is especially true when applying RL to real-world control tasks, e.g., in the sciences or robotics, where executing a policy in the environment is costly. In popular RL algorithms, agents typically explore either by adding stochasticity to a reward-maximizing policy or by attempting to gather maximal information about environment dynamics without taking the given task into account. In this work, we develop a method that allows us to plan for exploration while taking both the task and the current knowledge about the dynamics into account. The key insight of our approach is to plan an action sequence that maximizes the expected information gain about the optimal trajectory for the task at hand. We demonstrate that our method learns strong policies with 2x fewer samples than strong exploration baselines and 200x fewer samples than model-free methods on a diverse set of low-to-medium-dimensional control tasks in both the open-loop and closed-loop control settings.
Pages: 15