On the Optimality of Perturbations in Stochastic and Adversarial Multi-armed Bandit Problems

被引:0
|
作者
Kim, Baekjin [1 ]
Tewari, Ambuj [1 ]
机构
[1] Univ Michigan, Dept Stat, Ann Arbor, MI 48109 USA
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We investigate the optimality of perturbation based algorithms in the stochastic and adversarial multi-armed bandit problems. For the stochastic case, we provide a unified regret analysis for both sub-Weibull and bounded perturbations when rewards are sub-Gaussian. Our bounds are instance optimal for sub-Weibull perturbations with parameter 2 that also have a matching lower tail bound, and all bounded support perturbations where there is sufficient probability mass at the extremes of the support. For the adversarial setting, we prove rigorous barriers against two natural solution approaches using tools from discrete choice theory and extreme value theory. Our results suggest that the optimal perturbation, if it exists, will be of Frechet-type.
引用
收藏
页数:10
相关论文
共 50 条
  • [1] Adversarial multi-armed bandit approach to stochastic optimization
    Chang, Hyeong Soo
    Fu, Michael C.
    Marcus, Steven I.
    PROCEEDINGS OF THE 45TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-14, 2006, : 5684 - +
  • [2] Synchronization and optimality for multi-armed bandit problems in continuous time
    ElKaroui, N
    Karatzas, I
    COMPUTATIONAL & APPLIED MATHEMATICS, 1997, 16 (02): : 117 - 151
  • [3] Mechanisms with learning for stochastic multi-armed bandit problems
    Shweta Jain
    Satyanath Bhat
    Ganesh Ghalme
    Divya Padmanabhan
    Y. Narahari
    Indian Journal of Pure and Applied Mathematics, 2016, 47 : 229 - 272
  • [4] MECHANISMS WITH LEARNING FOR STOCHASTIC MULTI-ARMED BANDIT PROBLEMS
    Jain, Shweta
    Bhat, Satyanath
    Ghalme, Ganesh
    Padmanabhan, Divya
    Narahari, Y.
    INDIAN JOURNAL OF PURE & APPLIED MATHEMATICS, 2016, 47 (02): : 229 - 272
  • [5] The Multi-Armed Bandit With Stochastic Plays
    Lesage-Landry, Antoine
    Taylor, Joshua A.
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2018, 63 (07) : 2280 - 2286
  • [6] Time-Varying Stochastic Multi-Armed Bandit Problems
    Vakili, Sattar
    Zhao, Qing
    Zhou, Yuan
    CONFERENCE RECORD OF THE 2014 FORTY-EIGHTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, 2014, : 2103 - 2107
  • [7] Achieving Privacy in the Adversarial Multi-Armed Bandit
    Tossou, Aristide C. Y.
    Dimitrakakis, Christos
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2653 - 2659
  • [8] Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
    Bubeck, Sebastien
    Cesa-Bianchi, Nicolo
    FOUNDATIONS AND TRENDS IN MACHINE LEARNING, 2012, 5 (01): : 1 - 122
  • [9] Bridging Adversarial and Nonstationary Multi-Armed Bandit
    Chen, Ningyuan
    Yang, Shuoguang
    Zhang, Hailun
    PRODUCTION AND OPERATIONS MANAGEMENT, 2025,
  • [10] Satisficing in Multi-Armed Bandit Problems
    Reverdy, Paul
    Srivastava, Vaibhav
    Leonard, Naomi Ehrich
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2017, 62 (08) : 3788 - 3803