Instance-optimal PAC Algorithms for Contextual Bandits

Cited by: 0
Authors
Li, Zhaoqi [1]
Ratliff, Lillian [2]
Nassif, Houssam [3]
Jamieson, Kevin [4]
Jain, Lalit [5]
Affiliations
[1] Univ Washington, Dept Stat, Seattle, WA 98195 USA
[2] Univ Washington, Dept Elect & Comp Engn, Seattle, WA 98195 USA
[3] Amazon, Seattle, WA USA
[4] Univ Washington, Allen Sch Comp Sci & Engn, Seattle, WA 98195 USA
[5] Univ Washington, Foster Sch Business, Seattle, WA 98195 USA
Keywords
EXPLORATION;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the stochastic contextual bandit setting, regret-minimizing algorithms have been extensively researched, but their instance-minimizing best-arm identification counterparts remain seldom studied. In this work, we focus on the stochastic bandit problem in the $(\epsilon, \delta)$-PAC setting: given a policy class $\Pi$, the goal of the learner is to return a policy $\pi \in \Pi$ whose expected reward is within $\epsilon$ of the optimal policy with probability greater than $1 - \delta$. We characterize the first instance-dependent PAC sample complexity of contextual bandits through a quantity $\rho_\Pi$ and provide matching upper and lower bounds in terms of $\rho_\Pi$ for the agnostic and linear contextual best-arm identification settings. We show that no algorithm can be simultaneously minimax-optimal for regret minimization and instance-dependent PAC for best-arm identification. Our main result is a new instance-optimal and computationally efficient algorithm that relies on a polynomial number of calls to an argmax oracle.
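As a reading aid, the $(\epsilon, \delta)$-PAC guarantee described in the abstract can be written out as below. The symbols $V$, $\nu$, $r$, $\hat{\pi}$, and $\pi_*$ are notation assumed here for illustration; they do not appear in this record and may differ from the paper's own.

```latex
% Assumed notation (not from the record): contexts c drawn from a
% distribution \nu, mean reward r(c, a) for playing action a in context c,
% V(\pi) the expected reward of policy \pi, \hat{\pi} the returned policy.
\[
  V(\pi) = \mathbb{E}_{c \sim \nu}\bigl[ r(c, \pi(c)) \bigr],
  \qquad
  \pi_* \in \arg\max_{\pi \in \Pi} V(\pi),
\]
% The learner's output \hat{\pi} \in \Pi is (\epsilon, \delta)-PAC if
\[
  \mathbb{P}\bigl( V(\hat{\pi}) \ge V(\pi_*) - \epsilon \bigr) > 1 - \delta .
\]
```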
Pages: 14
Related papers
50 in total
  • [1] Instance-Optimal Geometric Algorithms
    Afshani, Peyman
    Barbay, Jeremy
    Chan, Timothy M.
    JOURNAL OF THE ACM, 2017, 64 (01)
  • [2] Instance-Optimal Geometric Algorithms
    Afshani, Peyman
    Barbay, Jeremy
    Chan, Timothy M.
    2009 50TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE: FOCS 2009, PROCEEDINGS, 2009, : 129 - 138
  • [3] Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs
    Tirinzoni, Andrea
    Al-Marjani, Aymen
    Kaufmann, Emilie
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [4] Optimal Algorithms for Stochastic Contextual Preference Bandits
    Saha, Aadirupa
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [5] Provably Optimal Algorithms for Generalized Linear Contextual Bandits
    Li, Lihong
    Lu, Yu
    Zhou, Dengyong
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [6] Data Amplification: Instance-Optimal Property Estimation
    Hao, Yi
    Orlitsky, Alon
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [7] Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability
    Saha, Aadirupa
    Krishnamurthy, Akshay
    INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 167, 2022, 167
  • [8] Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions
    He, Jiafan
    Zhou, Dongruo
    Zhang, Tong
    Gu, Quanquan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [9] The SMART Approach to Instance-Optimal Online Learning
    Banerjee, Siddhartha
    Bhatt, Alankrita
    Yu, Christina Lee
    THIRTY SEVENTH ANNUAL CONFERENCE ON LEARNING THEORY, 2023, 247
  • [10] Computing Instance-Optimal Kernels in Two Dimensions
    Agarwal, Pankaj K.
    Har-Peled, Sariel
    DISCRETE & COMPUTATIONAL GEOMETRY, 2025, 73 (03) : 674 - 701