Instance-optimal PAC Algorithms for Contextual Bandits

Cited by: 0
Authors
Li, Zhaoqi [1]
Ratliff, Lillian [2]
Nassif, Houssam [3]
Jamieson, Kevin [4]
Jain, Lalit [5]
Affiliations
[1] Univ Washington, Dept Stat, Seattle, WA 98195 USA
[2] Univ Washington, Dept Elect & Comp Engn, Seattle, WA 98195 USA
[3] Amazon, Seattle, WA USA
[4] Univ Washington, Allen Sch Comp Sci & Engn, Seattle, WA 98195 USA
[5] Univ Washington, Foster Sch Business, Seattle, WA 98195 USA
Keywords
EXPLORATION;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the stochastic contextual bandit setting, regret-minimizing algorithms have been extensively researched, but their instance-minimizing best-arm identification counterparts remain seldom studied. In this work, we focus on the stochastic bandit problem in the $(\epsilon, \delta)$-PAC setting: given a policy class $\Pi$, the goal of the learner is to return a policy $\pi \in \Pi$ whose expected reward is within $\epsilon$ of the optimal policy with probability greater than $1 - \delta$. We characterize the first instance-dependent PAC sample complexity of contextual bandits through a quantity $\rho_{\Pi}$, and provide matching upper and lower bounds in terms of $\rho_{\Pi}$ for the agnostic and linear contextual best-arm identification settings. We show that no algorithm can be simultaneously minimax-optimal for regret minimization and instance-dependent PAC for best-arm identification. Our main result is a new instance-optimal and computationally efficient algorithm that relies on a polynomial number of calls to an argmax oracle.
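The PAC criterion stated in the abstract can be written out as follows. This is only a sketch in standard notation: the context distribution $\nu$, the mean reward $r(c, a)$, the policy value $V(\pi)$, the optimal policy $\pi_*$, and the returned policy $\hat{\pi}$ are symbols assumed here for illustration and do not appear in the abstract itself.

```latex
% Assumed notation (not from the abstract): contexts c ~ \nu,
% mean reward r(c, a) for action a, and policy value V(\pi).
\[
  V(\pi) = \mathbb{E}_{c \sim \nu}\bigl[ r(c, \pi(c)) \bigr],
  \qquad
  \pi_* \in \arg\max_{\pi \in \Pi} V(\pi),
\]
% (\epsilon, \delta)-PAC requirement on the returned policy \hat{\pi} \in \Pi:
\[
  \mathbb{P}\bigl( V(\pi_*) - V(\hat{\pi}) \le \epsilon \bigr) > 1 - \delta .
\]
```

Under this reading, the sample complexity is the number of interactions the learner needs before it can stop and return such a $\hat{\pi}$.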
Pages: 14