Exploration-Exploitation in MDPs with Options

Cited by: 0
Authors
Fruit, Ronan [1]
Lazaric, Alessandro [1]
Affiliations
[1] Inria Lille, SequeL Team, Villeneuve d'Ascq, France
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
While a large body of empirical results show that temporally-extended actions and options may significantly affect the learning performance of an agent, the theoretical understanding of how and when options can be beneficial in online reinforcement learning is relatively limited. In this paper, we derive an upper and lower bound on the regret of a variant of UCRL using options. While we first analyze the algorithm in the general case of semi-Markov decision processes (SMDPs), we show how these results can be translated to the specific case of MDPs with options and we illustrate simple scenarios in which the regret of learning with options can be provably much smaller than the regret suffered when learning with primitive actions.
Pages: 576-584
Number of pages: 9