Fast active learning for pure exploration in reinforcement learning

Cited: 0
Authors
Menard, Pierre [1]
Domingues, Omar Darwiche [2]
Kaufmann, Emilie [2,3]
Jonsson, Anders [4]
Leurent, Edouard [2]
Valko, Michal [2,3,5]
Affiliations
[1] Otto von Guericke Univ, Magdeburg, Germany
[2] Inria, Paris, France
[3] Univ Lille, Lille, France
[4] Univ Pompeu Fabra, Barcelona, Spain
[5] DeepMind Paris, Paris, France
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021
Keywords
BOUNDS;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Realistic environments often provide agents with very limited feedback. When the environment is initially unknown, the feedback can be completely absent at first, and the agents may choose to devote all their effort to exploring efficiently. Exploration remains a challenge: it has been addressed with many hand-tuned heuristics of varying generality on one side, and with a few theoretically backed exploration strategies on the other. Many of these are incarnated by intrinsic motivation, in particular by exploration bonuses. A common choice is a 1/√n bonus, where n is the number of times a particular state-action pair has been visited. We show that, surprisingly, for the pure-exploration objective of reward-free exploration, bonuses that scale with 1/n bring faster learning rates, improving the known upper bounds with respect to the dependence on the horizon H. Furthermore, we show that with an improved analysis of the stopping time, we can improve by a factor of H the sample complexity in the best-policy identification setting, another pure-exploration objective in which the environment provides rewards but the agent is not penalized for its behavior during the exploration phase.
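To make the bonus comparison concrete, below is a minimal sketch of count-based exploration in a toy tabular MDP, contrasting the classic 1/√n bonus with the faster-decaying 1/n bonus studied in the paper. This is not the authors' algorithm; the toy MDP, the horizon, the explore helper, and every constant are assumptions made purely for illustration. The agent plans with its empirical model, treating the bonus as a fictitious reward, and rolls out the greedy policy:

    import numpy as np

    # Minimal sketch of count-based exploration in a small tabular MDP.
    # NOT the paper's algorithm: the toy MDP, horizon, and constants are
    # illustrative assumptions only.
    rng = np.random.default_rng(0)
    S, A, H = 6, 2, 10                               # states, actions, horizon
    P_true = rng.dirichlet(np.ones(S), size=(S, A))  # true transitions, unknown to agent

    def explore(bonus, episodes=300):
        n = np.zeros((S, A))         # visit counts n(s, a)
        n_sas = np.zeros((S, A, S))  # transition counts
        for _ in range(episodes):
            # Empirical transition model; uniform guess for unvisited pairs.
            P_hat = np.where(n[..., None] > 0,
                             n_sas / np.maximum(n, 1)[..., None], 1.0 / S)
            # Dynamic programming with the bonus as a fictitious reward.
            V = np.zeros(S)
            Q = np.zeros((H, S, A))
            for h in reversed(range(H)):
                Q[h] = bonus(np.maximum(n, 1)) + P_hat @ V
                V = Q[h].max(axis=1)
            # Roll out the greedy policy in the real environment.
            s = 0
            for h in range(H):
                a = int(Q[h, s].argmax())
                s_next = rng.choice(S, p=P_true[s, a])
                n[s, a] += 1
                n_sas[s, a, s_next] += 1
                s = s_next
        return n

    for name, b in [("1/sqrt(n)", lambda n: 1 / np.sqrt(n)),
                    ("1/n      ", lambda n: 1 / n)]:
        counts = explore(b)
        print(name, "least-visited pair seen", int(counts.min()), "times")

Either bonus pushes the greedy policy toward under-visited state-action pairs; the paper's point is that the faster-decaying 1/n-style bonus still explores sufficiently while yielding provably faster learning rates in the pure-exploration settings above.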
Pages: 10
Related Papers
50 records in total
  • [31] Exploration in deep reinforcement learning: A survey
    Ladosz, Pawel
    Weng, Lilian
    Kim, Minwoo
    Oh, Hyondong
    INFORMATION FUSION, 2022, 85 : 1 - 22
  • [32] Distributional Reinforcement Learning for Efficient Exploration
    Mavrin, Borislav
    Yao, Hengshuai
    Kong, Linglong
    Wu, Kaiwen
    Yu, Yaoliang
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [33] Adaptive Exploration Strategies for Reinforcement Learning
    Hwang, Kao-Shing
    Li, Chih-Wen
    Jiang, Wei-Cheng
    2017 INTERNATIONAL CONFERENCE ON SYSTEM SCIENCE AND ENGINEERING (ICSSE), 2017, : 16 - 19
  • [34] Uncertainty Quantification and Exploration for Reinforcement Learning
    Zhu, Yi
    Dong, Jing
    Lam, Henry
    OPERATIONS RESEARCH, 2024, 72 (04) : 1689 - 1709
  • [35] Coordinated Exploration in Concurrent Reinforcement Learning
    Dimakopoulou, Maria
    Van Roy, Benjamin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [36] Overcoming Exploration in Reinforcement Learning with Demonstrations
    Nair, Ashvin
    McGrew, Bob
    Andrychowicz, Marcin
    Zaremba, Wojciech
    Abbeel, Pieter
    2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2018, : 6292 - 6299
  • [37] Improving Reinforcement Learning Exploration by Autoencoders
    Paczolay, Gabor
    Harmati, Istvan
PERIODICA POLYTECHNICA ELECTRICAL ENGINEERING AND COMPUTER SCIENCE, 2024, 68 (04) : 335 - 343
  • [38] Exploration Conscious Reinforcement Learning Revisited
    Shani, Lior
    Efroni, Yonathan
    Mannor, Shie
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [39] Adaptive Exploration for Continual Reinforcement Learning
    Stulp, Freek
    2012 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2012, : 1631 - 1636
  • [40] Active Exploration by Chance-Constrained Optimization for Voltage Regulation with Reinforcement Learning
    Ding, Zhenhuan
    Huang, Xiaoge
    Liu, Zhao
    ENERGIES, 2022, 15 (02)