Instance-optimal PAC Algorithms for Contextual Bandits

Cited by: 0

Authors:

Li, Zhaoqi [1]

Ratliff, Lillian [2]

Nassif, Houssam [3]

Jamieson, Kevin [4]

Jain, Lalit [5]

Affiliations:

[1] Univ Washington, Dept Stat, Seattle, WA 98195 USA

[2] Univ Washington, Dept Elect & Comp Engn, Seattle, WA 98195 USA

[3] Amazon, Seattle, WA USA

[4] Univ Washington, Allen Sch Comp Sci & Engn, Seattle, WA 98195 USA

[5] Univ Washington, Foster Sch Business, Seattle, WA 98195 USA

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022) | 2022

Keywords:

EXPLORATION;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In the stochastic contextual bandit setting, regret-minimizing algorithms have been extensively researched, but their instance-minimizing best-arm identification counterparts remain seldom studied. In this work, we focus on the stochastic bandit problem in the (epsilon, delta)-PAC setting: given a policy class Pi, the goal of the learner is to return a policy pi in Pi whose expected reward is within epsilon of the optimal policy with probability greater than 1 - delta. We characterize the first instance-dependent PAC sample complexity of contextual bandits through a quantity rho(Pi), and provide matching upper and lower bounds in terms of rho(Pi) for the agnostic and linear contextual best-arm identification settings. We show that no algorithm can be simultaneously minimax-optimal for regret minimization and instance-dependent PAC for best-arm identification. Our main result is a new instance-optimal and computationally efficient algorithm that relies on a polynomial number of calls to an argmax oracle.
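
The (epsilon, delta)-PAC criterion described in the abstract can be written out as a single inequality. This is only a sketch: the value function V and the returned policy \hat{\pi} are notation assumed here for illustration and do not appear in this record.

```latex
% Assumed notation (not from the record):
%   V(\pi)     expected reward of policy \pi
%   \hat{\pi}  policy returned by the learner
\[
  \Pr\Bigl( V(\hat{\pi}) \ge \max_{\pi \in \Pi} V(\pi) - \epsilon \Bigr) > 1 - \delta,
  \qquad \hat{\pi} \in \Pi .
\]
```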


Pages: 14
