Fast active learning for pure exploration in reinforcement learning

Cited: 0
Authors
Menard, Pierre [1]
Domingues, Omar Darwiche [2]
Kaufmann, Emilie [2,3]
Jonsson, Anders [4]
Leurent, Edouard [2]
Valko, Michal [2,3,5]
Affiliations
[1] Otto von Guericke Univ, Magdeburg, Germany
[2] Inria, Paris, France
[3] Univ Lille, Lille, France
[4] Univ Pompeu Fabra, Barcelona, Spain
[5] DeepMind Paris, Paris, France
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021
Keywords
BOUNDS;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Realistic environments often provide agents with very limited feedback. When the environment is initially unknown, the feedback can be completely absent at first, and the agents may choose to devote all their effort to exploring efficiently. Exploration remains a challenge: it has been addressed with many hand-tuned heuristics of varying generality on one side, and with a few theoretically backed exploration strategies on the other. Many of these are incarnated by intrinsic motivation, in particular by exploration bonuses. A common choice is a 1/√n bonus, where n is the number of times a particular state-action pair has been visited. We show that, surprisingly, for the pure-exploration objective of reward-free exploration, bonuses that scale with 1/n bring faster learning rates, improving the known upper bounds with respect to the dependence on the horizon H. Furthermore, we show that with an improved analysis of the stopping time, we can improve by a factor of H the sample complexity in the best-policy identification setting, another pure-exploration objective in which the environment provides rewards but the agent is not penalized for its behavior during the exploration phase.
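To make the bonus comparison concrete, below is a minimal sketch of count-based exploration in a toy tabular MDP, contrasting the classic 1/√n bonus with the faster-decaying 1/n bonus studied in the paper. This is not the authors' algorithm; the toy MDP, the horizon, the explore helper, and every constant are assumptions made purely for illustration. The agent plans with its empirical model, treating the bonus as a fictitious reward, and rolls out the greedy policy:

    import numpy as np

    # Minimal sketch of count-based exploration in a small tabular MDP.
    # NOT the paper's algorithm: the toy MDP, horizon, and constants are
    # illustrative assumptions only.
    rng = np.random.default_rng(0)
    S, A, H = 6, 2, 10                               # states, actions, horizon
    P_true = rng.dirichlet(np.ones(S), size=(S, A))  # true transitions, unknown to agent

    def explore(bonus, episodes=300):
        n = np.zeros((S, A))         # visit counts n(s, a)
        n_sas = np.zeros((S, A, S))  # transition counts
        for _ in range(episodes):
            # Empirical transition model; uniform guess for unvisited pairs.
            P_hat = np.where(n[..., None] > 0,
                             n_sas / np.maximum(n, 1)[..., None], 1.0 / S)
            # Dynamic programming with the bonus as a fictitious reward.
            V = np.zeros(S)
            Q = np.zeros((H, S, A))
            for h in reversed(range(H)):
                Q[h] = bonus(np.maximum(n, 1)) + P_hat @ V
                V = Q[h].max(axis=1)
            # Roll out the greedy policy in the real environment.
            s = 0
            for h in range(H):
                a = int(Q[h, s].argmax())
                s_next = rng.choice(S, p=P_true[s, a])
                n[s, a] += 1
                n_sas[s, a, s_next] += 1
                s = s_next
        return n

    for name, b in [("1/sqrt(n)", lambda n: 1 / np.sqrt(n)),
                    ("1/n      ", lambda n: 1 / n)]:
        counts = explore(b)
        print(name, "least-visited pair seen", int(counts.min()), "times")

Either bonus pushes the greedy policy toward under-visited state-action pairs; the paper's point is that the faster-decaying 1/n-style bonus still explores sufficiently while yielding provably faster learning rates in the pure-exploration settings above.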
Pages: 10
Related Papers
50 records in total
  • [31] Exploration in deep reinforcement learning: A survey
    Ladosz, Pawel
    Weng, Lilian
    Kim, Minwoo
    Oh, Hyondong
    INFORMATION FUSION, 2022, 85 : 1 - 22
  • [32] Distributional Reinforcement Learning for Efficient Exploration
    Mavrin, Borislav
    Yao, Hengshuai
    Kong, Linglong
    Wu, Kaiwen
    Yu, Yaoliang
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [33] Adaptive Exploration Strategies for Reinforcement Learning
    Hwang, Kao-Shing
    Li, Chih-Wen
    Jiang, Wei-Cheng
    2017 INTERNATIONAL CONFERENCE ON SYSTEM SCIENCE AND ENGINEERING (ICSSE), 2017, : 16 - 19
  • [34] Uncertainty Quantification and Exploration for Reinforcement Learning
    Zhu, Yi
    Dong, Jing
    Lam, Henry
    OPERATIONS RESEARCH, 2024, 72 (04) : 1689 - 1709
  • [35] Coordinated Exploration in Concurrent Reinforcement Learning
    Dimakopoulou, Maria
    Van Roy, Benjamin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [36] Overcoming Exploration in Reinforcement Learning with Demonstrations
    Nair, Ashvin
    McGrew, Bob
    Andrychowicz, Marcin
    Zaremba, Wojciech
    Abbeel, Pieter
    2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2018, : 6292 - 6299
  • [37] Improving Reinforcement Learning Exploration by Autoencoders
    Paczolay, Gabor
    Harmati, Istvan
PERIODICA POLYTECHNICA ELECTRICAL ENGINEERING AND COMPUTER SCIENCE, 2024, 68 (04) : 335 - 343
  • [38] Exploration Conscious Reinforcement Learning Revisited
    Shani, Lior
    Efroni, Yonathan
    Mannor, Shie
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [39] Adaptive Exploration for Continual Reinforcement Learning
    Stulp, Freek
    2012 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2012, : 1631 - 1636
  • [40] Active Exploration by Chance-Constrained Optimization for Voltage Regulation with Reinforcement Learning
    Ding, Zhenhuan
    Huang, Xiaoge
    Liu, Zhao
    ENERGIES, 2022, 15 (02)