Instance-optimal PAC Algorithms for Contextual Bandits

Cited by: 0
Authors
Li, Zhaoqi [1]
Ratliff, Lillian [2]
Nassif, Houssam [3]
Jamieson, Kevin [4]
Jain, Lalit [5]
Affiliations
[1] Univ Washington, Dept Stat, Seattle, WA 98195 USA
[2] Univ Washington, Dept Elect & Comp Engn, Seattle, WA 98195 USA
[3] Amazon, Seattle, WA USA
[4] Univ Washington, Allen Sch Comp Sci & Engn, Seattle, WA 98195 USA
[5] Univ Washington, Foster Sch Business, Seattle, WA 98195 USA
Keywords
EXPLORATION;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In the stochastic contextual bandit setting, regret-minimizing algorithms have been extensively researched, but their instance-minimizing best-arm identification counterparts remain seldom studied. In this work, we focus on the stochastic bandit problem in the $(\epsilon, \delta)$-PAC setting: given a policy class $\Pi$, the goal of the learner is to return a policy $\pi \in \Pi$ whose expected reward is within $\epsilon$ of that of the optimal policy with probability greater than $1 - \delta$. We characterize the first instance-dependent PAC sample complexity of contextual bandits through a quantity $\rho(\Pi)$, and provide matching upper and lower bounds in terms of $\rho(\Pi)$ for the agnostic and linear contextual best-arm identification settings. We show that no algorithm can be simultaneously minimax-optimal for regret minimization and instance-dependent PAC for best-arm identification. Our main result is a new instance-optimal and computationally efficient algorithm that relies on a polynomial number of calls to an argmax oracle.
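To make the (epsilon, delta)-PAC criterion in the abstract concrete, the following is a minimal sketch of a naive baseline for a finite policy class. It is not the instance-optimal algorithm from the paper: it explores arms uniformly at random, estimates each policy's value by inverse propensity weighting, and uses a worst-case sample size from Hoeffding's inequality with a union bound. All names (`uniform_pac_sample_size`, `identify_policy`, `draw_context`, `draw_reward`) are hypothetical, and rewards are assumed to lie in [0, 1].

```python
import math
import random


def uniform_pac_sample_size(num_policies, num_arms, epsilon, delta):
    """Rounds of uniform exploration sufficient for an (epsilon, delta)-PAC guarantee.

    Assumes rewards in [0, 1], so each importance-weighted term lies in
    [0, num_arms]. Hoeffding's inequality plus a union bound over the policy
    class makes every value estimate accurate to epsilon / 2.
    """
    return math.ceil(
        2 * num_arms**2 * math.log(2 * num_policies / delta) / epsilon**2
    )


def identify_policy(policies, draw_context, draw_reward, num_arms, epsilon, delta, rng):
    """Return the index of the policy with the highest estimated value."""
    n = uniform_pac_sample_size(len(policies), num_arms, epsilon, delta)
    totals = [0.0] * len(policies)
    for _ in range(n):
        context = draw_context(rng)
        arm = rng.randrange(num_arms)  # uniform exploration, propensity 1 / num_arms
        reward = draw_reward(context, arm, rng)
        for i, policy in enumerate(policies):
            if policy(context) == arm:
                # Inverse propensity weight: 1 / (1 / num_arms) = num_arms.
                totals[i] += reward * num_arms
    # All totals share the same divisor n, so comparing sums is enough.
    return max(range(len(policies)), key=totals.__getitem__)


if __name__ == "__main__":
    # Toy instance (an assumption for illustration): two contexts, two arms,
    # and reward 1 only when the arm index matches the context.
    rng = random.Random(0)
    policies = [lambda c: 0, lambda c: 1, lambda c: c]
    best = identify_policy(
        policies,
        draw_context=lambda r: r.randrange(2),
        draw_reward=lambda c, a, r: 1.0 if a == c else 0.0,
        num_arms=2,
        epsilon=0.2,
        delta=0.1,
        rng=rng,
    )
    print("selected policy index:", best)
```

The sample size here depends only on the number of policies, the number of arms, epsilon, and delta, so it is the same for every problem instance. An instance-dependent bound of the kind the abstract describes would instead adapt to how hard the specific instance is.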
Pages: 14