Exploration-Exploitation in MDPs with Options

Citations: 0

Authors:

Fruit, Ronan [1]

Lazaric, Alessandro [1]

Affiliations:

[1] Inria Lille, SequeL Team, Villeneuve d'Ascq, France

Source:

ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 54 | 2017 / Volume 54

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

While a large body of empirical results shows that temporally-extended actions and options may significantly affect the learning performance of an agent, the theoretical understanding of how and when options can be beneficial in online reinforcement learning is relatively limited. In this paper, we derive upper and lower bounds on the regret of a variant of UCRL using options. While we first analyze the algorithm in the general case of semi-Markov decision processes (SMDPs), we show how these results can be translated to the specific case of MDPs with options and we illustrate simple scenarios in which the regret of learning with options can be provably much smaller than the regret suffered when learning with primitive actions.


Pages: 576-584

Number of pages: 9

