Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees

Cited by: 0
Authors
Tirinzoni, Andrea [1 ]
Papini, Matteo [2 ]
Touati, Ahmed [1 ]
Lazaric, Alessandro [1 ]
Pirotta, Matteo [1 ]
Affiliations
[1] Meta, Cambridge, MA 02140 USA
[2] Univ Pompeu Fabra, Barcelona, Spain
Funding
European Research Council
Keywords
(none listed)
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We study the problem of representation learning in stochastic contextual linear bandits. While the primary concern in this domain is usually to find realizable representations (i.e., those that allow predicting the reward function at any context-action pair exactly), it has recently been shown that representations with certain spectral properties (called HLS) may be more effective for the exploration-exploitation task, enabling LinUCB to achieve constant (i.e., horizon-independent) regret. In this paper, we propose BanditSRL, a representation learning algorithm that combines a novel constrained optimization problem, which learns a realizable representation with good spectral properties, with a generalized likelihood ratio test that exploits the recovered representation and avoids excessive exploration. We prove that BanditSRL can be paired with any no-regret algorithm and achieves constant regret whenever an HLS representation is available. Furthermore, BanditSRL can easily be combined with deep neural networks, and we show how regularizing towards HLS representations is beneficial in standard benchmarks.
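The HLS condition referenced in the abstract can be made concrete with a small numerical sketch (an illustration under our own assumptions, not the paper's code): a realizable representation phi is HLS when the features of the optimal actions span the whole d-dimensional space, i.e., the minimum eigenvalue of the covariance of optimal-action features is strictly positive. The helper below, with hypothetical names `min_eig_optimal_features`, `phi`, and `theta`, estimates that eigenvalue empirically.

```python
import numpy as np

def min_eig_optimal_features(phi, theta):
    """Minimum eigenvalue of the empirical covariance of optimal-action features.

    phi:   array (n_contexts, n_actions, d) of feature vectors
    theta: array (d,) reward parameter, so expected reward = phi @ theta
    A strictly positive result indicates the representation is HLS on this
    context set (optimal-action features span R^d), the spectral condition
    under which constant regret becomes achievable.
    """
    rewards = phi @ theta                       # (n_contexts, n_actions)
    opt = rewards.argmax(axis=1)                # optimal action per context
    opt_feats = phi[np.arange(len(phi)), opt]   # (n_contexts, d)
    cov = opt_feats.T @ opt_feats / len(phi)    # empirical covariance matrix
    return np.linalg.eigvalsh(cov).min()

# Toy check: two contexts whose optimal actions have features e1 and e2,
# so the optimal-action features span R^2 and the representation is HLS.
phi = np.array([[[1.0, 0.0], [0.1, 0.0]],
                [[0.0, 1.0], [0.0, 0.1]]])
theta = np.array([1.0, 1.0])
print(min_eig_optimal_features(phi, theta))
```

In this toy case the covariance is 0.5 * I, so the minimum eigenvalue is 0.5; a non-HLS representation (e.g., all optimal features lying on one line) would instead return 0, and no amount of data on optimal actions would make the least-squares estimate identifiable in every direction.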
Pages: 13