Near-Optimal Regret Bounds for Thompson Sampling

Cited by: 57
Authors
Agrawal, Shipra [1 ]
Goyal, Navin [1 ,2 ]
Affiliations
[1] Microsoft Res, 9 Lavelle Rd, Bengaluru 560001, Karnataka, India
[2] Columbia Univ, Dept Ind Engn & Operat Res, 500 West 120th St,Mudd 423, New York, NY 10027 USA
Keywords
Multi-armed bandits; PRINCIPLE;
DOI
10.1145/3088510
Chinese Library Classification
TP3 [computing technology; computer technology];
Discipline classification code
0812
Abstract
Thompson Sampling (TS) is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas and has recently generated significant interest after several studies demonstrated that it has favorable empirical performance compared to state-of-the-art methods. In this article, a novel and almost tight martingale-based regret analysis for Thompson Sampling is presented. Our technique simultaneously yields both problem-dependent and problem-independent bounds: (1) the first near-optimal problem-independent bound of $O(\sqrt{NT \ln T})$ on the expected regret, and (2) the optimal problem-dependent bound of $(1 + \epsilon) \sum_i \frac{\ln T}{d(\mu_i, \mu_1)} + O(\frac{N}{\epsilon^2})$ on the expected regret (this bound was first proven by Kaufmann et al. (2012b)). Our technique is conceptually simple and easily extends to distributions other than the Beta distribution used in the original TS algorithm. For the version of TS that uses Gaussian priors, we prove a problem-independent bound of $O(\sqrt{NT \ln N})$ on the expected regret and show the optimality of this bound by providing a matching lower bound. This is the first lower bound on the performance of a natural version of Thompson Sampling that is away from the general lower bound of $\Omega(\sqrt{NT})$ for the multi-armed bandit problem.
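For context, the following is a minimal simulation sketch of the two Thompson Sampling variants mentioned in the abstract: the classic version with Beta posteriors for Bernoulli rewards, and the Gaussian-prior variant (whose posterior for an arm played k times is a normal distribution with variance 1/(k+1)). The arm means and horizon in the demo at the bottom are illustrative choices, not taken from the paper, and the code is a sketch rather than the authors' implementation.

```python
import numpy as np

def thompson_sampling_beta(arms, T, rng=None):
    """TS with Beta(1, 1) priors for Bernoulli-reward arms.

    `arms` holds the true success probabilities (unknown to the algorithm,
    used only to simulate rewards); `T` is the horizon.
    Returns the cumulative regret against the best arm.
    """
    rng = np.random.default_rng() if rng is None else rng
    successes = np.ones(len(arms))  # Beta parameter alpha per arm
    failures = np.ones(len(arms))   # Beta parameter beta per arm
    best_mean, regret = max(arms), 0.0
    for _ in range(T):
        # Sample a mean estimate for each arm from its Beta posterior,
        # then play the arm with the largest sample.
        samples = rng.beta(successes, failures)
        i = int(np.argmax(samples))
        reward = float(rng.random() < arms[i])  # Bernoulli reward
        successes[i] += reward
        failures[i] += 1.0 - reward
        regret += best_mean - arms[i]
    return regret

def thompson_sampling_gaussian(arms, T, rng=None):
    """Gaussian-prior variant: sample from N(empirical mean, 1/(plays + 1))."""
    rng = np.random.default_rng() if rng is None else rng
    counts = np.zeros(len(arms))  # number of plays per arm
    sums = np.zeros(len(arms))    # total reward per arm
    best_mean, regret = max(arms), 0.0
    for _ in range(T):
        means = sums / (counts + 1.0)
        stds = 1.0 / np.sqrt(counts + 1.0)
        samples = rng.normal(means, stds)
        i = int(np.argmax(samples))
        reward = float(rng.random() < arms[i])
        counts[i] += 1.0
        sums[i] += reward
        regret += best_mean - arms[i]
    return regret

if __name__ == "__main__":
    arms = [0.9, 0.8, 0.7]  # illustrative arm means
    print("Beta-prior TS regret:", thompson_sampling_beta(arms, T=10_000))
    print("Gaussian-prior TS regret:", thompson_sampling_gaussian(arms, T=10_000))
```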
Pages: 24