Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees

Cited by: 0
Authors
Tirinzoni, Andrea [1 ]
Papini, Matteo [2 ]
Touati, Ahmed [1 ]
Lazaric, Alessandro [1 ]
Pirotta, Matteo [1 ]
Affiliations
[1] Meta, Cambridge, MA 02140 USA
[2] Univ Pompeu Fabra, Barcelona, Spain
Funding
European Research Council;
Keywords
DOI
Not available
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study the problem of representation learning in stochastic contextual linear bandits. While the primary concern in this domain is usually to find realizable representations (i.e., those that allow predicting the reward function at any context-action pair exactly), it has been recently shown that representations with certain spectral properties (called HLS) may be more effective for the exploration-exploitation task, enabling LinUCB to achieve constant (i.e., horizon-independent) regret. In this paper, we propose BANDITSRL, a representation learning algorithm that combines a novel constrained optimization problem, which learns a realizable representation with good spectral properties, with a generalized likelihood ratio test that exploits the recovered representation and avoids excessive exploration. We prove that BANDITSRL can be paired with any no-regret algorithm and achieves constant regret whenever an HLS representation is available. Furthermore, BANDITSRL can be easily combined with deep neural networks, and we show how regularizing towards HLS representations is beneficial in standard benchmarks.
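The abstract builds on LinUCB, the optimistic algorithm whose regret becomes horizon-independent under HLS representations. The following is a minimal illustrative sketch of a LinUCB loop (not the paper's BANDITSRL, and with assumed parameters: a single fixed context, a known linear feature map, ridge regularization `lam`, and a fixed confidence width `beta`):

```python
import numpy as np

# Minimal LinUCB sketch: maintain a ridge-regression estimate of the unknown
# reward parameter and pull the action with the highest optimistic index.
rng = np.random.default_rng(0)
d, n_actions, horizon, lam, beta = 4, 5, 200, 1.0, 1.0
theta_star = rng.normal(size=d)                 # unknown true parameter
features = rng.normal(size=(n_actions, d))      # phi(context, action), fixed context

A = lam * np.eye(d)     # regularized design matrix sum_t x_t x_t^T + lam I
b = np.zeros(d)         # sum_t r_t x_t
total_reward = 0.0
for t in range(horizon):
    theta_hat = np.linalg.solve(A, b)           # ridge estimate
    A_inv = np.linalg.inv(A)
    # optimistic index: predicted reward + elliptical exploration bonus
    bonus = np.sqrt(np.einsum("ad,dc,ac->a", features, A_inv, features))
    a = int(np.argmax(features @ theta_hat + beta * bonus))
    x = features[a]
    reward = float(x @ theta_star) + 0.1 * rng.normal()   # noisy linear reward
    A += np.outer(x, x)
    b += reward * x
    total_reward += reward

regret = horizon * float(np.max(features @ theta_star)) - total_reward
```

Informally, the HLS condition studied in the paper requires the features of the optimal actions to span the whole d-dimensional space; when it holds, the exploration bonus above shrinks fast enough on every arm that the cumulative regret stops growing with the horizon.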
Pages: 13
Related papers
50 records in total
  • [1] Nash Regret Guarantees for Linear Bandits
    Sawarni, Ayush
    Pal, Soumyabrata
    Barman, Siddharth
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [2] Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection
    Papini, Matteo
    Tirinzoni, Andrea
    Pacchiano, Aldo
    Restelli, Marcello
    Lazaric, Alessandro
    Pirotta, Matteo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [3] Tight Regret Bounds for Infinite-armed Linear Contextual Bandits
    Li, Yingkai
    Wang, Yining
    Chen, Xi
    Zhou, Yuan
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130 : 370 - 378
  • [4] On Learning Whittle Index Policy for Restless Bandits With Scalable Regret
    Akbarzadeh, Nima
    Mahajan, Aditya
    IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2024, 11 (03): : 1190 - 1202
  • [5] Neural Contextual Bandits without Regret
    Kassraie, Parnian
    Krause, Andreas
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151 : 240 - 278
  • [6] Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces
    Zhu, Yinglun
    Mineiro, Paul
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [7] Linear Bayes policy for learning in contextual-bandits
    Antonio Martin H, Jose
    Vargas, Ana M.
    EXPERT SYSTEMS WITH APPLICATIONS, 2013, 40 (18) : 7400 - 7406
  • [8] Learning in Generalized Linear Contextual Bandits with Stochastic Delays
    Zhou, Zhengyuan
    Xu, Renyuan
    Blanchet, Jose
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [9] Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits
    Wu, Huasen
    Srikant, R.
    Liu, Xin
    Jiang, Chong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [10] Offline Contextual Bandits with High Probability Fairness Guarantees
    Metevier, Blossom
    Giguere, Stephen
    Brockman, Sarah
    Kobren, Ari
    Brun, Yuriy
    Brunskill, Emma
    Thomas, Philip S.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32