Tight Regret Bounds for Infinite-armed Linear Contextual Bandits

Authors
Li, Yingkai [1]
Wang, Yining [2]
Chen, Xi [3]
Zhou, Yuan [4]
Affiliations
[1] Northwestern Univ, Evanston, IL 60208 USA
[2] Univ Florida, Gainesville, FL USA
[3] NYU, New York, NY USA
[4] Univ Illinois, Urbana, IL USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Linear contextual bandit is an important class of sequential decision making problems with a wide range of applications to recommender systems, online advertising, healthcare, and many other machine learning related tasks. While there is a lot of prior research, tight regret bounds of linear contextual bandit with infinite action sets remain open. In this paper, we address this open problem by considering the linear contextual bandit with (changing) infinite action sets. We prove a regret upper bound on the order of $O(\sqrt{d^2 T \log T}) \times \mathrm{poly}(\log\log T)$, where $d$ is the domain dimension and $T$ is the time horizon. Our upper bound matches the previous lower bound of $\Omega(\sqrt{d^2 T \log T})$ in [Li et al., 2019] up to iterated logarithmic terms.
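For readers unfamiliar with the quantity being bounded, the following is a sketch of the usual regret definition for linear contextual bandits, followed by the two bounds from the abstract side by side. The symbols $\mathcal{A}_t$, $x_t$, $\theta^*$, and $R_T^*$ are assumptions introduced here for illustration; they do not appear in this record, and the paper's own notation and assumptions may differ.

```latex
% Assumed standard setup (notation is illustrative, not taken from the record):
% at round t the learner sees an action set A_t in R^d, picks x_t in A_t,
% and observes the reward <x_t, theta^*> plus zero-mean noise.
R_T = \mathbb{E}\Big[\sum_{t=1}^{T}
      \Big(\max_{x \in \mathcal{A}_t} \langle x, \theta^* \rangle
           - \langle x_t, \theta^* \rangle\Big)\Big]

% Writing R_T^* for the worst-case regret of the best possible algorithm,
% and using \sqrt{d^2 T \log T} = d\sqrt{T \log T}, the abstract's bounds read:
\Omega\big(d\sqrt{T \log T}\big)
  \;\le\; R_T^* \;\le\;
O\big(d\sqrt{T \log T}\big) \cdot \mathrm{poly}(\log\log T)
```

Read this way, the remaining gap between the upper and lower bounds is only the $\mathrm{poly}(\log\log T)$ factor, which is what "up to iterated logarithmic terms" refers to.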
Pages: 370-378
Number of pages: 9