Instance-optimal PAC Algorithms for Contextual Bandits

Cited by: 0
Authors
Li, Zhaoqi [1]
Ratliff, Lillian [2]
Nassif, Houssam [3]
Jamieson, Kevin [4]
Jain, Lalit [5]
Affiliations
[1] Univ Washington, Dept Stat, Seattle, WA 98195 USA
[2] Univ Washington, Dept Elect & Comp Engn, Seattle, WA 98195 USA
[3] Amazon, Seattle, WA USA
[4] Univ Washington, Allen Sch Comp Sci & Engn, Seattle, WA 98195 USA
[5] Univ Washington, Foster Sch Business, Seattle, WA 98195 USA
Keywords
EXPLORATION
DOI
None available
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In the stochastic contextual bandit setting, regret-minimizing algorithms have been extensively researched, but their instance-minimizing best-arm identification counterparts remain seldom studied. In this work, we focus on the stochastic bandit problem in the (ε, δ)-PAC setting: given a policy class Π, the goal of the learner is to return a policy π ∈ Π whose expected reward is within ε of the optimal policy with probability greater than 1 − δ. We characterize the first instance-dependent PAC sample complexity of contextual bandits through a quantity ρ_Π, and provide matching upper and lower bounds in terms of ρ_Π for the agnostic and linear contextual best-arm identification settings. We show that no algorithm can be simultaneously minimax-optimal for regret minimization and instance-dependent PAC for best-arm identification. Our main result is a new instance-optimal and computationally efficient algorithm that relies on a polynomial number of calls to an argmax oracle.
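The abstract's (ε, δ)-PAC guarantee — return an ε-optimal choice with probability at least 1 − δ — can be made concrete with a deliberately naive baseline. The sketch below is NOT the paper's instance-optimal algorithm; it is a uniform-sampling, instance-independent strategy for the plain multi-armed case, with the per-arm budget set by Hoeffding's inequality plus a union bound. The function name and the [0, 1]-bounded-reward assumption are illustrative choices, not from the record.

```python
import math
import random

def naive_pac_best_arm(arms, epsilon, delta):
    """Uniform-sampling (epsilon, delta)-PAC baseline.

    `arms` is a list of zero-argument callables, each returning a
    stochastic reward in [0, 1]. Pull every arm n times, where n is
    chosen so that, by Hoeffding's inequality and a union bound over
    the 2k deviation events, every empirical mean is within epsilon/2
    of its true mean with probability at least 1 - delta; the
    empirical best arm is then epsilon-optimal on that event.
    """
    k = len(arms)
    n = math.ceil((2.0 / epsilon ** 2) * math.log(2 * k / delta))
    means = [sum(arm() for _ in range(n)) / n for arm in arms]
    return max(range(k), key=lambda i: means[i])

# Example: two Bernoulli arms with means 0.9 and 0.3.
random.seed(0)
arms = [lambda: float(random.random() < 0.9),
        lambda: float(random.random() < 0.3)]
best = naive_pac_best_arm(arms, epsilon=0.2, delta=0.1)
```

The sample complexity here, O((k/ε²) log(k/δ)), ignores the instance entirely; the paper's point is that an instance-dependent quantity ρ_Π can be much smaller, and that achieving it is incompatible with simultaneous minimax-optimal regret.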
Pages: 14
Related Papers
50 total
  • [31] Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
    Wang, Yu-Xiang
    Agarwal, Alekh
    Dudik, Miroslav
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [32] Optimal Baseline Corrections for Off-Policy Contextual Bandits
    Gupta, Shashank
    Jeunen, Olivier
    Oosterhuis, Harrie
    de Rijke, Maarten
    PROCEEDINGS OF THE EIGHTEENTH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2024, 2024, : 722 - 732
  • [33] Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles
    Foster, Dylan J.
    Rakhlin, Alexander
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [34] Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles
    Foster, Dylan J.
    Rakhlin, Alexander
    25TH AMERICAS CONFERENCE ON INFORMATION SYSTEMS (AMCIS 2019), 2019,
  • [35] Near Instance Optimal Model Selection for Pure Exploration Linear Bandits
    Zhu, Yinglun
    Katz-Samuels, Julian
    Nowak, Robert
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [36] Asymptotically optimal algorithms for budgeted multiple play bandits
    Luedtke, Alex
    Kaufmann, Emilie
    Chambaz, Antoine
    MACHINE LEARNING, 2019, 108 (11) : 1919 - 1949
  • [37] Optimal Algorithms for Multiplayer Multi-Armed Bandits
    Wang, Po-An
    Proutiere, Alexandre
    Ariu, Kaito
    Jedra, Yassir
    Russo, Alessio
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108
  • [38] Optimal Streaming Algorithms for Multi-Armed Bandits
    Jin, Tianyuan
    Huang, Keke
    Tang, Jing
    Xiao, Xiaokui
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [39] Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms
    Combes, Richard
    Proutiere, Alexandre
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 1), 2014, 32
  • [40] Breaking the √T Barrier: Instance-Independent Logarithmic Regret in Stochastic Contextual Linear Bandits
    Ghosh, Avishek
    Sankararaman, Abishek
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022