Near-Optimal Regret Bounds for Thompson Sampling

Cited by: 57
Authors
Agrawal, Shipra [1]
Goyal, Navin [1, 2]
Affiliations
[1] Microsoft Res, 9 Lavelle Rd, Bengaluru 560001, Karnataka, India
[2] Columbia Univ, Dept Ind Engn & Operat Res, 500 West 120th St,Mudd 423, New York, NY 10027 USA
Keywords
Multi-armed bandits; PRINCIPLE;
DOI
10.1145/3088510
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Thompson Sampling (TS) is one of the oldest heuristics for multiarmed bandit problems. It is a randomized algorithm based on Bayesian ideas and has recently generated significant interest after several studies demonstrated that it has favorable empirical performance compared to the state-of-the-art methods. In this article, a novel and almost tight martingale-based regret analysis for Thompson Sampling is presented. Our technique simultaneously yields both problem-dependent and problem-independent bounds: (1) the first near-optimal problem-independent bound of $O(\sqrt{NT \ln T})$ on the expected regret and (2) the optimal problem-dependent bound of $(1 + \epsilon) \sum_i \frac{\ln T}{d(\mu_i, \mu_1)} + O(N/\epsilon^2)$ on the expected regret (this bound was first proven by Kaufmann et al. (2012b)). Our technique is conceptually simple and easily extends to distributions other than the Beta distribution used in the original TS algorithm. For the version of TS that uses Gaussian priors, we prove a problem-independent bound of $O(\sqrt{NT \ln N})$ on the expected regret and show the optimality of this bound by providing a matching lower bound. This is the first lower bound on the performance of a natural version of Thompson Sampling that is away from the general lower bound of $\Omega(\sqrt{NT})$ for the multiarmed bandit problem.
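For readers unfamiliar with the algorithm the abstract analyzes, the following is a minimal sketch of Thompson Sampling with Beta posteriors. It is not taken from the article: it assumes Bernoulli (0/1) rewards and a uniform Beta(1, 1) prior on each arm, and the function name `thompson_sampling_bernoulli` and its parameters (`true_means`, `horizon`, `seed`) are hypothetical names chosen for illustration. The returned regret is the pseudo-regret computed from the known arm means, which is only available in simulation.

```python
import random


def thompson_sampling_bernoulli(true_means, horizon, seed=0):
    """Simulate Beta-prior Thompson Sampling on a Bernoulli bandit.

    Assumptions (not from the article): rewards are 0/1, each arm starts
    with a uniform Beta(1, 1) prior, and true_means is known to the
    simulator only (never to the algorithm).
    Returns (pulls per arm, cumulative pseudo-regret).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms  # observed rewards equal to 1, per arm
    failures = [0] * n_arms   # observed rewards equal to 0, per arm
    best_mean = max(true_means)
    regret = 0.0

    for _ in range(horizon):
        # Draw one sample per arm from its posterior Beta(S + 1, F + 1).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Observe a Bernoulli reward and update that arm's posterior counts.
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

        # Pseudo-regret: gap between the best mean and the played arm's mean.
        regret += best_mean - true_means[arm]

    pulls = [s + f for s, f in zip(successes, failures)]
    return pulls, regret


if __name__ == "__main__":
    pulls, regret = thompson_sampling_bernoulli([0.2, 0.8], horizon=2000)
    print("pulls per arm:", pulls)
    print("cumulative pseudo-regret:", regret)
```

On an instance with a large gap between arms, such as the two-arm example above, the suboptimal arm should be pulled only a small number of times, in line with the logarithmic problem-dependent bound stated in the abstract.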
Pages: 24
Related papers
50 in total
  • [41] Near-Optimal Bounds for Online Caching with Machine Learned Advice
    Rohatgi, Dhruv
    PROCEEDINGS OF THE THIRTY-FIRST ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS (SODA'20), 2020, : 1834 - 1845
  • [42] NEAR-OPTIMAL SAMPLE COMPLEXITY BOUNDS FOR CIRCULANT BINARY EMBEDDING
    Oymak, Samet
    Thrampoulidis, Christos
    Hassibi, Babak
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 6359 - 6363
  • [43] Near-Optimal Bounds for Online Caching with Machine Learned Advice
    Rohatgi, Dhruv
    PROCEEDINGS OF THE 2020 ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, SODA, 2020, : 1834 - 1845
  • [44] Performance bounds analyzing on a near-optimal algorithm of Qm || Cmax
    Li Dong
    Yang Dan
    Deng Lin
    Wang Shi-long
    Proceedings of 2005 Chinese Control and Decision Conference, Vols 1 and 2, 2005, : 1812 - 1814
  • [45] Finding Near-Optimal Configurations in Product Lines by Random Sampling
    Oh, Jeho
    Batory, Don
    Myers, Margaret
    Siegmund, Norbert
    ESEC/FSE 2017: PROCEEDINGS OF THE 2017 11TH JOINT MEETING ON FOUNDATIONS OF SOFTWARE ENGINEERING, 2017, : 61 - 71
  • [46] Sampling-based near-optimal MIMO demodulation algorithms
    Dong, B
    Wang, XD
    Doucet, A
    42ND IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-6, PROCEEDINGS, 2003, : 4214 - 4219
  • [47] Near-optimal Keypoint Sampling for Fast Pathological Lung Segmentation
    Mansoor, Awais
    Bagci, Ulas
    Mollura, Daniel J.
    2014 36TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2014, : 6032 - 6035
  • [48] Oblivious near-optimal sampling for multidimensional signals with Fourier constraints
    Xu, Xingyu
    Gu, Yuantao
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023, 206
  • [49] Near-optimal time series sampling based on the reduced Hessian
    Chen, Weifeng
    Biegler, Lorenz T.
    AICHE JOURNAL, 2020, 66 (07)
  • [50] ON NEAR-OPTIMAL TIME SAMPLING FOR INITIAL DATA BEST APPROXIMATION
    Aceska, Roza
    Arsie, Alessandro
    Karki, Ramesh
    MATEMATICHE, 2019, 74 (01): : 173 - 190