Near-optimal Per-Action Regret Bounds for Sleeping Bandits

Cited by: 0
Authors
Nguyen, Quan [1]
Mehta, Nishant A. [1]
Affiliations
[1] Univ Victoria, Dept Comp Sci, Victoria, BC, Canada
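Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes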
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We derive near-optimal per-action regret bounds for sleeping bandits, in which both the sets of available arms and their losses in every round are chosen by an adversary. In a setting with $K$ total arms and at most $A$ available arms in each round over $T$ rounds, the best known upper bound is $O(K\sqrt{TA\ln K})$, obtained indirectly via minimizing internal sleeping regrets. Compared to the minimax $\Omega(\sqrt{TA})$ lower bound, this upper bound contains an extra multiplicative factor of $K\sqrt{\ln K}$. We address this gap by directly minimizing the per-action regret using generalized versions of EXP3, EXP3-IX and FTRL with Tsallis entropy, thereby obtaining near-optimal bounds of order $O(\sqrt{TA\ln K})$ and $O(\sqrt{T\sqrt{AK}})$. We extend our results to the setting of bandits with advice from sleeping experts, generalizing EXP4 along the way. This leads to new proofs for a number of existing adaptive and tracking regret bounds for standard non-sleeping bandits. Extending our results to the bandit version of experts that report their confidences leads to new bounds for the confidence regret that depend primarily on the sum of experts' confidences. We prove a lower bound, showing that for any minimax optimal algorithm, there exists an action whose regret is sublinear in $T$ but linear in the number of its active rounds.
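The abstract does not define per-action regret. The sketch below assumes the usual reading in the sleeping-bandits literature: for each arm, sum the learner's loss minus that arm's loss over the rounds in which the arm was available. The function name `per_action_regret` and the toy data are hypothetical and chosen only for illustration. The sketch computes the quantity being bounded. It does not implement any of the paper's algorithms.

```python
from typing import List, Sequence, Set


def per_action_regret(
    available: Sequence[Set[int]],
    losses: Sequence[Sequence[float]],
    chosen: Sequence[int],
    num_arms: int,
) -> List[float]:
    """Regret against each arm, counted only over rounds where that arm is available.

    available[t] : set of arms awake in round t (assumed notation)
    losses[t][a] : loss of arm a in round t
    chosen[t]    : arm pulled by the learner in round t
    """
    regret = [0.0] * num_arms
    for avail_t, loss_t, arm_t in zip(available, losses, chosen):
        if arm_t not in avail_t:
            raise ValueError("the learner may only pull an available arm")
        for a in avail_t:
            # Sleeping arms accumulate nothing in this round.
            regret[a] += loss_t[arm_t] - loss_t[a]
    return regret


# Toy instance with K = 3 arms, T = 3 rounds, and A = 2 available arms per round.
available = [{0, 1}, {1, 2}, {0, 2}]
losses = [[0.5, 0.0, 1.0], [1.0, 0.5, 0.0], [0.0, 1.0, 0.5]]
chosen = [0, 1, 2]
regret = per_action_regret(available, losses, chosen, num_arms=3)
print(regret)
```

Under this reading, an arm that is awake in only a few rounds can contribute only those rounds to its regret. That is why the closing lower bound is stated in terms of the number of active rounds rather than $T$.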
Pages: 36