Instance-optimal PAC Algorithms for Contextual Bandits

Cited by: 0
Authors
Li, Zhaoqi [1]
Ratliff, Lillian [2]
Nassif, Houssam [3]
Jamieson, Kevin [4]
Jain, Lalit [5]
Affiliations
[1] Univ Washington, Dept Stat, Seattle, WA 98195 USA
[2] Univ Washington, Dept Elect & Comp Engn, Seattle, WA 98195 USA
[3] Amazon, Seattle, WA USA
[4] Univ Washington, Allen Sch Comp Sci & Engn, Seattle, WA 98195 USA
[5] Univ Washington, Foster Sch Business, Seattle, WA 98195 USA
Keywords
EXPLORATION
DOI
None available
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In the stochastic contextual bandit setting, regret-minimizing algorithms have been extensively researched, but their instance-minimizing best-arm identification counterparts remain seldom studied. In this work, we focus on the stochastic bandit problem in the (ε, δ)-PAC setting: given a policy class Π, the goal of the learner is to return a policy π ∈ Π whose expected reward is within ε of the optimal policy with probability greater than 1 − δ. We characterize the first instance-dependent PAC sample complexity of contextual bandits through a quantity ρ_Π, and provide matching upper and lower bounds in terms of ρ_Π for the agnostic and linear contextual best-arm identification settings. We show that no algorithm can be simultaneously minimax-optimal for regret minimization and instance-dependent PAC for best-arm identification. Our main result is a new instance-optimal and computationally efficient algorithm that relies on a polynomial number of calls to an argmax oracle.
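The abstract's (ε, δ)-PAC guarantee — return an ε-optimal choice with probability at least 1 − δ — can be made concrete with a deliberately naive baseline. The sketch below is NOT the paper's instance-optimal algorithm; it is a uniform-sampling, instance-independent strategy for the plain multi-armed case, with the per-arm budget set by Hoeffding's inequality plus a union bound. The function name and the [0, 1]-bounded-reward assumption are illustrative choices, not from the record.

```python
import math
import random

def naive_pac_best_arm(arms, epsilon, delta):
    """Uniform-sampling (epsilon, delta)-PAC baseline.

    `arms` is a list of zero-argument callables, each returning a
    stochastic reward in [0, 1]. Pull every arm n times, where n is
    chosen so that, by Hoeffding's inequality and a union bound over
    the 2k deviation events, every empirical mean is within epsilon/2
    of its true mean with probability at least 1 - delta; the
    empirical best arm is then epsilon-optimal on that event.
    """
    k = len(arms)
    n = math.ceil((2.0 / epsilon ** 2) * math.log(2 * k / delta))
    means = [sum(arm() for _ in range(n)) / n for arm in arms]
    return max(range(k), key=lambda i: means[i])

# Example: two Bernoulli arms with means 0.9 and 0.3.
random.seed(0)
arms = [lambda: float(random.random() < 0.9),
        lambda: float(random.random() < 0.3)]
best = naive_pac_best_arm(arms, epsilon=0.2, delta=0.1)
```

The sample complexity here, O((k/ε²) log(k/δ)), ignores the instance entirely; the paper's point is that an instance-dependent quantity ρ_Π can be much smaller, and that achieving it is incompatible with simultaneous minimax-optimal regret.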
Pages: 14
Related Papers
50 total
  • [31] Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
    Wang, Yu-Xiang
    Agarwal, Alekh
    Dudik, Miroslav
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [32] Optimal Baseline Corrections for Off-Policy Contextual Bandits
    Gupta, Shashank
    Jeunen, Olivier
    Oosterhuis, Harrie
    de Rijke, Maarten
    PROCEEDINGS OF THE EIGHTEENTH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2024, 2024, : 722 - 732
  • [33] Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles
    Foster, Dylan J.
    Rakhlin, Alexander
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [34] Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles
    Foster, Dylan J.
    Rakhlin, Alexander
    25TH AMERICAS CONFERENCE ON INFORMATION SYSTEMS (AMCIS 2019), 2019,
  • [35] Near Instance Optimal Model Selection for Pure Exploration Linear Bandits
    Zhu, Yinglun
    Katz-Samuels, Julian
    Nowak, Robert
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [36] Asymptotically optimal algorithms for budgeted multiple play bandits
    Luedtke, Alex
    Kaufmann, Emilie
    Chambaz, Antoine
    MACHINE LEARNING, 2019, 108 (11) : 1919 - 1949
  • [37] Optimal Algorithms for Multiplayer Multi-Armed Bandits
    Wang, Po-An
    Proutiere, Alexandre
    Ariu, Kaito
    Jedra, Yassir
    Russo, Alessio
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108
  • [38] Optimal Streaming Algorithms for Multi-Armed Bandits
    Jin, Tianyuan
    Huang, Keke
    Tang, Jing
    Xiao, Xiaokui
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [39] Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms
    Combes, Richard
    Proutiere, Alexandre
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 1), 2014, 32
  • [40] Breaking the √T Barrier: Instance-Independent Logarithmic Regret in Stochastic Contextual Linear Bandits
    Ghosh, Avishek
    Sankararaman, Abishek
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022