Fast active learning for pure exploration in reinforcement learning

Cited: 0
Authors
Menard, Pierre [1 ]
Domingues, Omar Darwiche [2 ]
Kaufmann, Emilie [2 ,3 ]
Jonsson, Anders [4 ]
Leurent, Edouard [2 ]
Valko, Michal [2 ,3 ,5 ]
Affiliations
[1] Otto von Guericke Univ, Magdeburg, Germany
[2] Inria, Paris, France
[3] Univ Lille, Lille, France
[4] Univ Pompeu Fabra, Barcelona, Spain
[5] DeepMind Paris, Paris, France
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139 | 2021 / Vol. 139
Keywords
BOUNDS;
DOI
None available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Realistic environments often provide agents with very limited feedback. When the environment is initially unknown, the feedback can at first be completely absent, and the agents may choose to devote all their effort to exploring efficiently. Exploration remains a challenge: it has been addressed with many hand-tuned heuristics of varying generality on one side, and a few theoretically backed exploration strategies on the other. Many of them are incarnated by intrinsic motivation and in particular exploration bonuses. A common choice is a 1/√n bonus, where n is the number of times a particular state-action pair has been visited. We show that, surprisingly, for the pure-exploration objective of reward-free exploration, bonuses that scale with 1/n bring faster learning rates, improving the known upper bounds with respect to the dependence on the horizon H. Furthermore, we show that with an improved analysis of the stopping time, we can improve by a factor H the sample complexity in the best-policy identification setting, which is another pure-exploration objective, where the environment provides rewards but the agent is not penalized for its behavior during the exploration phase.
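The contrast drawn in the abstract between 1/√n and 1/n count-based bonuses can be sketched in a few lines. The snippet below is only an illustration of the two scalings, not the paper's actual algorithm (whose bonuses also carry logarithmic and horizon-dependent terms); the function name `exploration_bonus` and the simple H scaling are assumptions made for the example.

```python
import numpy as np

def exploration_bonus(counts, horizon, kind="sqrt"):
    """Count-based exploration bonus per (state, action) pair.

    counts  : array of visit counts n(s, a)
    horizon : episode horizon H, used as a simple scaling factor
    kind    : "sqrt" for the classic 1/sqrt(n) bonus,
              "inv" for the 1/n bonus the abstract argues yields
              faster rates for pure-exploration objectives
    """
    n = np.maximum(counts, 1)  # treat unvisited pairs as n = 1
    if kind == "sqrt":
        return horizon / np.sqrt(n)
    if kind == "inv":
        return horizon / n
    raise ValueError(f"unknown bonus kind: {kind}")

# The 1/n bonus decays much faster with the visit count:
counts = np.array([1, 4, 100])
print(exploration_bonus(counts, 1.0, "sqrt"))  # [1.   0.5  0.1 ]
print(exploration_bonus(counts, 1.0, "inv"))   # [1.   0.25 0.01]
```

The faster decay of the 1/n bonus concentrates the agent's exploration effort on rarely visited pairs, which is the intuition behind the improved horizon dependence claimed above.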
Pages: 10
Related papers
50 items in total
  • [21] Bayesian Reinforcement Learning with Exploration
    Lattimore, Tor
    Hutter, Marcus
    ALGORITHMIC LEARNING THEORY (ALT 2014), 2014, 8776 : 170 - 184
  • [22] Reinforcement learning with inertial exploration
    Bergeron, Dany
    Desjardins, Charles
    Laumonier, Julien
    Chaib-draa, Brahim
    PROCEEDINGS OF THE IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON INTELLIGENT AGENT TECHNOLOGY (IAT 2007), 2007, : 277 - +
  • [23] A Hierarchical SLAM Framework Based on Deep Reinforcement Learning for Active Exploration
    Xue, Yuntao
    Chen, Weisheng
    Zhang, Liangbin
    PROCEEDINGS OF 2022 INTERNATIONAL CONFERENCE ON AUTONOMOUS UNMANNED SYSTEMS, ICAUS 2022, 2023, 1010 : 957 - 966
  • [24] Active Tactile Exploration using Shape-Dependent Reinforcement Learning
    Jiang, Shuo
    Wong, Lawson L. S.
    2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, : 8995 - 9002
  • [25] Active exploration planning in reinforcement learning for inverted pendulum system control
    Zheng, Yu
    Luo, Si-Wei
    Lv, Zi-Ang
    PROCEEDINGS OF 2006 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2006, : 2805 - +
  • [26] Reinforcement learning with phased approach for fast learning
    Hodohara, Norifumi
    Murakami, Yuichi
    Nakamura, Shingo
    Hashimoto, Shuji
    PROCEEDINGS OF THE SEVENTEENTH INTERNATIONAL SYMPOSIUM ON ARTIFICIAL LIFE AND ROBOTICS (AROB 17TH '12), 2012, : 930 - 933
  • [27] Learning of deterministic exploration and temporal abstraction in reinforcement learning
    Shibata, Katsunari
    2006 SICE-ICASE INTERNATIONAL JOINT CONFERENCE, VOLS 1-13, 2006, : 2212 - 2217
  • [28] Reinforcement Learning, Fast and Slow
    Botvinick, Matthew
    Ritter, Sam
    Wang, Jane X.
    Kurth-Nelson, Zeb
    Blundell, Charles
    Hassabis, Demis
    TRENDS IN COGNITIVE SCIENCES, 2019, 23 (05) : 408 - 422
  • [29] Reinforcement Learning Based on Active Learning Method
    Sagha, Hesam
    Shouraki, Saeed Bagheri
    Khasteh, Hosein
    Kiaei, Ali Akbar
    2008 INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY APPLICATION, VOL II, PROCEEDINGS, 2008, : 598 - +
  • [30] On the Importance of Exploration for Generalization in Reinforcement Learning
    Jiang, Yiding
    Kolter, J. Zico
    Raileanu, Roberta
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,