Instance-optimal PAC Algorithms for Contextual Bandits

Cited by: 0

Authors:

Li, Zhaoqi [1]

Ratliff, Lillian [2]

Nassif, Houssam [3]

Jamieson, Kevin [4]

Jain, Lalit [5]

Affiliations:

[1] Univ Washington, Dept Stat, Seattle, WA 98195 USA

[2] Univ Washington, Dept Elect & Comp Engn, Seattle, WA 98195 USA

[3] Amazon, Seattle, WA USA

[4] Univ Washington, Allen Sch Comp Sci & Engn, Seattle, WA 98195 USA

[5] Univ Washington, Foster Sch Business, Seattle, WA 98195 USA

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022) | 2022

Keywords:

EXPLORATION;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In the stochastic contextual bandit setting, regret-minimizing algorithms have been extensively researched, but their instance-minimizing best-arm identification counterparts remain seldom studied. In this work, we focus on the stochastic bandit problem in the (epsilon, delta)-PAC setting: given a policy class Pi, the goal of the learner is to return a policy pi in Pi whose expected reward is within epsilon of the optimal policy with probability greater than 1 - delta. We characterize the first instance-dependent PAC sample complexity of contextual bandits through a quantity rho(Pi) and provide matching upper and lower bounds in terms of rho(Pi) for the agnostic and linear contextual best-arm identification settings. We show that no algorithm can be simultaneously minimax-optimal for regret minimization and instance-dependent PAC for best-arm identification. Our main result is a new instance-optimal and computationally efficient algorithm that relies on a polynomial number of calls to an argmax oracle.
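
The (epsilon, delta)-PAC criterion stated in the abstract can be written out as a formula. This is only a sketch under assumed notation: the symbols \nu (context distribution), r(c, a) (reward for action a in context c), V(\pi) (expected reward of a policy), and \hat{\pi} (the policy returned by the learner) are introduced here for illustration and are not fixed by the abstract.

```latex
% Assumed notation, not taken from the abstract:
%   c ~ \nu        context drawn from the context distribution
%   r(c, a)        (possibly noisy) reward of action a in context c
%   V(\pi)         expected reward of policy \pi
%   \hat{\pi}      policy in \Pi returned by the learner
\[
  V(\pi) = \mathbb{E}_{c \sim \nu}\bigl[\, r\bigl(c, \pi(c)\bigr) \bigr],
  \qquad
  \mathbb{P}\Bigl( V(\hat{\pi}) \ge \max_{\pi \in \Pi} V(\pi) - \epsilon \Bigr) > 1 - \delta .
\]
```

Read this way, epsilon controls how far the returned policy's value may fall below the best value in the class, and delta bounds the probability that the guarantee fails.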

Pages: 14

