Near-Optimal Regret Bounds for Thompson Sampling

Cited by: 57
Authors
Agrawal, Shipra [1 ]
Goyal, Navin [1 ,2 ]
Affiliations
[1] Microsoft Res, 9 Lavelle Rd, Bengaluru 560001, Karnataka, India
[2] Columbia Univ, Dept Ind Engn & Operat Res, 500 West 120th St,Mudd 423, New York, NY 10027 USA
Keywords
Multi-armed bandits; PRINCIPLE
DOI
10.1145/3088510
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Thompson Sampling (TS) is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas and has recently generated significant interest after several studies demonstrated that it has favorable empirical performance compared to state-of-the-art methods. In this article, a novel and almost tight martingale-based regret analysis for Thompson Sampling is presented. Our technique simultaneously yields both problem-dependent and problem-independent bounds: (1) the first near-optimal problem-independent bound of $O(\sqrt{NT \ln T})$ on the expected regret, and (2) the optimal problem-dependent bound of $(1+\epsilon)\sum_i \frac{\ln T}{d(\mu_i, \mu_1)} + O(\frac{N}{\epsilon^2})$ on the expected regret, where $d(\mu_i, \mu_1)$ is the Kullback-Leibler divergence between Bernoulli distributions with means $\mu_i$ and $\mu_1$ (the mean of the best arm); this bound was first proven by Kaufmann et al. (2012b). Our technique is conceptually simple and easily extends to distributions other than the Beta distribution used in the original TS algorithm. For the version of TS that uses Gaussian priors, we prove a problem-independent bound of $O(\sqrt{NT \ln N})$ on the expected regret and show the optimality of this bound by providing a matching lower bound. This is the first lower bound on the performance of a natural version of Thompson Sampling that is away from the general lower bound of $\Omega(\sqrt{NT})$ for the multi-armed bandit problem.
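For concreteness, below is a minimal Python sketch of the two TS variants the abstract discusses. It is illustrative only, not code from the paper: the function names, arm means, and horizon are made up, and the Gaussian variant's posterior, N(mu_hat_i, 1/(k_i + 1)) with mu_hat_i = S_i/(k_i + 1), is one common convention for Gaussian-prior TS rather than a verbatim transcription of the paper's algorithm.

import numpy as np

rng = np.random.default_rng(0)

def ts_beta(means, T):
    # Thompson Sampling with Beta(1, 1) priors for Bernoulli rewards
    # (the original TS algorithm).
    N = len(means)
    successes = np.zeros(N)  # S_i: observed 1-rewards on arm i
    failures = np.zeros(N)   # F_i: observed 0-rewards on arm i
    regret = 0.0
    for _ in range(T):
        theta = rng.beta(successes + 1, failures + 1)  # posterior samples
        i = int(np.argmax(theta))                      # play the best sample
        r = float(rng.random() < means[i])             # Bernoulli reward
        successes[i] += r
        failures[i] += 1 - r
        regret += max(means) - means[i]
    return regret

def ts_gaussian(means, T):
    # Gaussian-prior variant: sample theta_i ~ N(mu_hat_i, 1/(k_i + 1)),
    # where k_i counts plays of arm i and mu_hat_i = S_i/(k_i + 1)
    # (an assumed convention; see the lead-in above).
    N = len(means)
    sums = np.zeros(N)   # S_i: cumulative reward per arm
    plays = np.zeros(N)  # k_i: number of plays per arm
    regret = 0.0
    for _ in range(T):
        mu_hat = sums / (plays + 1)
        theta = rng.normal(mu_hat, 1.0 / np.sqrt(plays + 1))
        i = int(np.argmax(theta))
        r = float(rng.random() < means[i])
        sums[i] += r
        plays[i] += 1
        regret += max(means) - means[i]
    return regret

# Illustrative usage: 3 Bernoulli arms, horizon T = 10000.
means = [0.9, 0.8, 0.7]
print(ts_beta(means, 10000), ts_gaussian(means, 10000))

On an instance like this, both variants should incur clearly sublinear cumulative regret, consistent with the $O(\sqrt{NT \ln T})$ and $O(\sqrt{NT \ln N})$ bounds above.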
Pages: 24
Related papers (50 in total)
  • [1] Self-accelerated Thompson sampling with near-optimal regret upper bound
    Zhu, Zhenyu
    Huang, Liusheng
    Xu, Hongli
NEUROCOMPUTING, 2020, 399: 37-47
  • [2] Near-optimal Regret Bounds for Reinforcement Learning
    Jaksch, Thomas
    Ortner, Ronald
    Auer, Peter
    JOURNAL OF MACHINE LEARNING RESEARCH, 2010, 11: 1563-1600
  • [3] Near-optimal Per-Action Regret Bounds for Sleeping Bandits
    Nguyen, Quan
    Mehta, Nishant A.
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [4] Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning
    Zhang, Zihan
    Jiang, Yuhang
    Zhou, Yuan
    Ji, Xiangyang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [5] Collaborative Linear Bandits with Adversarial Agents: Near-Optimal Regret Bounds
    Mitra, Aritra
    Adibi, Arman
    Pappas, George J.
    Hassani, Hamed
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [6] Feedback graph regret bounds for Thompson Sampling and UCB
    Lykouris, Thodoris
    Tardos, Eva
    Wali, Drishti
    ALGORITHMIC LEARNING THEORY, VOL 117, 2020, 117: 592-614
  • [7] Society of Agents: Regret Bounds of Concurrent Thompson Sampling
    Chen, Yan
    Dong, Perry
    Bai, Qinxun
    Dimakopoulou, Maria
    Xu, Wei
    Zhou, Zhengyuan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [8] Numerical Evaluation of Sampling Bounds for Near-Optimal Reconstruction in Compressed Sensing
    Le Montagner, Yoann
    Marim, Marcio
    Angelini, Elsa
    Olivo-Marin, Jean-Christophe
    2011 18TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2011
  • [9] Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning
    Moradipari, Ahmadreza
    Pedramfar, Mohammad
    Zini, Modjtaba Shokrian
    Aggarwal, Vaneet
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023