Near-Optimal Regret Bounds for Thompson Sampling

Cited by: 57
Authors
Agrawal, Shipra [1]
Goyal, Navin [1,2]
Affiliations
[1] Microsoft Res, 9 Lavelle Rd, Bengaluru 560001, Karnataka, India
[2] Columbia Univ, Dept Ind Engn & Operat Res, 500 West 120th St, Mudd 423, New York, NY 10027 USA
Keywords
Multi-armed bandits; PRINCIPLE
DOI
10.1145/3088510
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Thompson Sampling (TS) is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas and has recently generated significant interest after several studies demonstrated that it has favorable empirical performance compared to the state-of-the-art methods. In this article, a novel and almost tight martingale-based regret analysis for Thompson Sampling is presented. Our technique simultaneously yields both problem-dependent and problem-independent bounds: (1) the first near-optimal problem-independent bound of $O(\sqrt{NT \ln T})$ on the expected regret and (2) the optimal problem-dependent bound of $(1 + \epsilon) \sum_i \frac{\ln T}{d(\mu_i, \mu_1)} + O(\frac{N}{\epsilon^2})$ on the expected regret (this bound was first proven by Kaufmann et al. (2012b)). Our technique is conceptually simple and easily extends to distributions other than the Beta distribution used in the original TS algorithm. For the version of TS that uses Gaussian priors, we prove a problem-independent bound of $O(\sqrt{NT \ln N})$ on the expected regret and show the optimality of this bound by providing a matching lower bound. This is the first lower bound on the performance of a natural version of Thompson Sampling that is away from the general lower bound of $\Omega(\sqrt{NT})$ for the multi-armed bandit problem.
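As a rough, self-contained illustration of the algorithm discussed in the abstract, the Python sketch below runs Thompson Sampling with Beta priors on a Bernoulli multi-armed bandit, which is the setting the Beta-based version of TS targets. The arm means, horizon, seed, and function name are illustrative assumptions, not taken from the article.

```python
import numpy as np

def thompson_sampling_bernoulli(true_means, horizon, seed=None):
    """Beta-prior Thompson Sampling on a Bernoulli bandit (illustrative sketch).

    true_means -- hypothetical success probabilities of the arms (unknown to TS)
    horizon    -- number of rounds T
    """
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)
    successes = np.zeros(n_arms)  # S_i: number of 1-rewards observed for arm i
    failures = np.zeros(n_arms)   # F_i: number of 0-rewards observed for arm i
    total_reward = 0.0

    for _ in range(horizon):
        # Sample a plausible mean theta_i ~ Beta(S_i + 1, F_i + 1) for each arm,
        # then play the arm whose sample is largest.
        theta = rng.beta(successes + 1, failures + 1)
        arm = int(np.argmax(theta))
        reward = rng.binomial(1, true_means[arm])  # Bernoulli reward feedback
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward

    # Realized regret against always playing the best arm in expectation.
    return horizon * max(true_means) - total_reward

# Example run; the regret should grow only sublinearly in the horizon T.
print(thompson_sampling_bernoulli([0.9, 0.8, 0.5], horizon=10_000, seed=0))
```

The Gaussian-prior variant mentioned in the abstract changes only the sampling step: instead of a Beta draw, each arm's sample is drawn from a Gaussian centered at the arm's empirical mean, with variance that shrinks as the arm is played more often.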
Pages: 24
Related Papers
50 items in total
  • [21] Near-Optimal Bounds for Testing Histogram Distributions
    Canonne, Clement L.
    Diakonikolas, Ilias
    Kane, Daniel M.
    Liu, Sihan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [22] Prior-free and prior-dependent regret bounds for Thompson Sampling
    Bubeck, Sebastien
    Liu, Che-Yu
    2014 48TH ANNUAL CONFERENCE ON INFORMATION SCIENCES AND SYSTEMS (CISS), 2014,
  • [23] Near-Optimal Φ-Regret Learning in Extensive-Form Games
    Anagnostides, Ioannis
    Farina, Gabriele
    Sandholm, Tuomas
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023, 202 : 814 - 839
  • [24] A Biased Graph Neural Network Sampler with Near-Optimal Regret
    Zhang, Qingru
    Wipf, David
    Gan, Quan
    Song, Le
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [25] Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback
    Jin, Tiancheng
    Lancewicki, Tal
    Luo, Haipeng
    Mansour, Yishay
    Rosenberg, Aviv
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [26] Near-Optimal Sample Complexity Bounds for Constrained MDPs
    Vaswani, Sharan
    Yang, Lin F.
    Szepesvari, Csaba
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [27] Near-Optimal Complexity Bounds for Fragments of the Skolem Problem
    Akshay, S.
    Balaji, Nikhil
    Murhekar, Aniket
    Varma, Rohith
    Vyas, Nikhil
    37TH INTERNATIONAL SYMPOSIUM ON THEORETICAL ASPECTS OF COMPUTER SCIENCE (STACS 2020), 2020, 154
  • [28] Near-Optimal No-Regret Algorithms for Zero-Sum Games
    Daskalakis, Constantinos
    Deckelbaum, Alan
    Kim, Anthony
    PROCEEDINGS OF THE TWENTY-SECOND ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2011, : 235 - 254
  • [29] Near-optimal discrete optimization for experimental design: a regret minimization approach
    Allen-Zhu, Zeyuan
    Li, Yuanzhi
    Singh, Aarti
    Wang, Yining
    MATHEMATICAL PROGRAMMING, 2021, 186 (1-2) : 439 - 478