Fast active learning for pure exploration in reinforcement learning

Cited: 0
Authors
Menard, Pierre [1 ]
Domingues, Omar Darwiche [2 ]
Kaufmann, Emilie [2 ,3 ]
Jonsson, Anders [4 ]
Leurent, Edouard [2 ]
Valko, Michal [2 ,3 ,5 ]
Affiliations
[1] Otto von Guericke Univ, Magdeburg, Germany
[2] Inria, Paris, France
[3] Univ Lille, Lille, France
[4] Univ Pompeu Fabra, Barcelona, Spain
[5] DeepMind Paris, Paris, France
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139 | 2021 / Vol. 139
Keywords
BOUNDS;
DOI
None available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Realistic environments often provide agents with very limited feedback. When the environment is initially unknown, the feedback can at first be completely absent, and the agents may choose to devote all their effort to exploring efficiently. Exploration remains a challenge: it has been addressed with many hand-tuned heuristics of varying generality on one side, and a few theoretically backed exploration strategies on the other. Many of them are incarnated by intrinsic motivation and in particular exploration bonuses. A common choice is a 1/√n bonus, where n is the number of times a particular state-action pair has been visited. We show that, surprisingly, for the pure-exploration objective of reward-free exploration, bonuses that scale with 1/n bring faster learning rates, improving the known upper bounds with respect to the dependence on the horizon H. Furthermore, we show that with an improved analysis of the stopping time, we can improve by a factor H the sample complexity in the best-policy identification setting, which is another pure-exploration objective, where the environment provides rewards but the agent is not penalized for its behavior during the exploration phase.
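The contrast drawn in the abstract between 1/√n and 1/n count-based bonuses can be sketched in a few lines. The snippet below is only an illustration of the two scalings, not the paper's actual algorithm (whose bonuses also carry logarithmic and horizon-dependent terms); the function name `exploration_bonus` and the simple H scaling are assumptions made for the example.

```python
import numpy as np

def exploration_bonus(counts, horizon, kind="sqrt"):
    """Count-based exploration bonus per (state, action) pair.

    counts  : array of visit counts n(s, a)
    horizon : episode horizon H, used as a simple scaling factor
    kind    : "sqrt" for the classic 1/sqrt(n) bonus,
              "inv" for the 1/n bonus the abstract argues yields
              faster rates for pure-exploration objectives
    """
    n = np.maximum(counts, 1)  # treat unvisited pairs as n = 1
    if kind == "sqrt":
        return horizon / np.sqrt(n)
    if kind == "inv":
        return horizon / n
    raise ValueError(f"unknown bonus kind: {kind}")

# The 1/n bonus decays much faster with the visit count:
counts = np.array([1, 4, 100])
print(exploration_bonus(counts, 1.0, "sqrt"))  # [1.   0.5  0.1 ]
print(exploration_bonus(counts, 1.0, "inv"))   # [1.   0.25 0.01]
```

The faster decay of the 1/n bonus concentrates the agent's exploration effort on rarely visited pairs, which is the intuition behind the improved horizon dependence claimed above.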
Pages: 10
Related papers
50 items in total
  • [21] Bayesian Reinforcement Learning with Exploration
    Lattimore, Tor
    Hutter, Marcus
    ALGORITHMIC LEARNING THEORY (ALT 2014), 2014, 8776 : 170 - 184
  • [22] Reinforcement learning with inertial exploration
    Bergeron, Dany
    Desjardins, Charles
    Laumonier, Julien
    Chaib-draa, Brahim
    PROCEEDINGS OF THE IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON INTELLIGENT AGENT TECHNOLOGY (IAT 2007), 2007, : 277 - +
  • [23] A Hierarchical SLAM Framework Based on Deep Reinforcement Learning for Active Exploration
    Xue, Yuntao
    Chen, Weisheng
    Zhang, Liangbin
    PROCEEDINGS OF 2022 INTERNATIONAL CONFERENCE ON AUTONOMOUS UNMANNED SYSTEMS, ICAUS 2022, 2023, 1010 : 957 - 966
  • [24] Active Tactile Exploration using Shape-Dependent Reinforcement Learning
    Jiang, Shuo
    Wong, Lawson L. S.
    2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, : 8995 - 9002
  • [25] Active exploration planning in reinforcement learning for inverted pendulum system control
    Zheng, Yu
    Luo, Si-Wei
    Lv, Zi-Ang
    PROCEEDINGS OF 2006 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2006, : 2805 - +
  • [26] Reinforcement learning with phased approach for fast learning
    Hodohara, Norifumi
    Murakami, Yuichi
    Nakamura, Shingo
    Hashimoto, Shuji
    PROCEEDINGS OF THE SEVENTEENTH INTERNATIONAL SYMPOSIUM ON ARTIFICIAL LIFE AND ROBOTICS (AROB 17TH '12), 2012, : 930 - 933
  • [27] Learning of deterministic exploration and temporal abstraction in reinforcement learning
    Shibata, Katsunari
    2006 SICE-ICASE INTERNATIONAL JOINT CONFERENCE, VOLS 1-13, 2006, : 2212 - 2217
  • [28] Reinforcement Learning, Fast and Slow
    Botvinick, Matthew
    Ritter, Sam
    Wang, Jane X.
    Kurth-Nelson, Zeb
    Blundell, Charles
    Hassabis, Demis
    TRENDS IN COGNITIVE SCIENCES, 2019, 23 (05) : 408 - 422
  • [29] Reinforcement Learning Based on Active Learning Method
    Sagha, Hesam
    Shouraki, Saeed Bagheri
    Khasteh, Hosein
    Kiaei, Ali Akbar
    2008 INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY APPLICATION, VOL II, PROCEEDINGS, 2008, : 598 - +
  • [30] On the Importance of Exploration for Generalization in Reinforcement Learning
    Jiang, Yiding
    Kolter, J. Zico
    Raileanu, Roberta
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,