Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees

Cited by: 0
Authors
Tirinzoni, Andrea [1 ]
Papini, Matteo [2 ]
Touati, Ahmed [1 ]
Lazaric, Alessandro [1 ]
Pirotta, Matteo [1 ]
Affiliations
[1] Meta, Cambridge, MA 02140 USA
[2] Univ Pompeu Fabra, Barcelona, Spain
Funding
European Research Council
Keywords
DOI
Not available
CLC classification number
TP18 [Theory of Artificial Intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We study the problem of representation learning in stochastic contextual linear bandits. While the primary concern in this domain is usually to find realizable representations (i.e., those that allow predicting the reward function at any context-action pair exactly), it has recently been shown that representations with certain spectral properties (called HLS) may be more effective for the exploration-exploitation task, enabling LinUCB to achieve constant (i.e., horizon-independent) regret. In this paper, we propose BanditSRL, a representation learning algorithm that combines a novel constrained optimization problem, which learns a realizable representation with good spectral properties, with a generalized likelihood ratio test that exploits the recovered representation and avoids excessive exploration. We prove that BanditSRL can be paired with any no-regret algorithm and achieves constant regret whenever an HLS representation is available. Furthermore, BanditSRL can easily be combined with deep neural networks, and we show that regularizing towards HLS representations is beneficial in standard benchmarks.
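To make the "regularizing towards HLS representations" idea concrete, below is a minimal, illustrative sketch (not the authors' code): a small network produces features phi(x, a), a linear head predicts rewards, and the training loss adds a penalty favoring a large minimum eigenvalue of the design matrix built from the features of the greedily-predicted best arm in each context, a proxy for the spectral property mentioned in the abstract. The names PhiNet, hls_regularized_loss, and the weight lam_hls, as well as the synthetic data, are assumptions for illustration; the generalized likelihood ratio test used for exploitation is omitted.

import torch
import torch.nn as nn

# Toy sketch (not the paper's implementation): learn phi(x, a) that both fits
# observed rewards and keeps the design matrix of the greedily-selected arms
# well conditioned, nudging the representation towards an HLS-like property.

class PhiNet(nn.Module):
    """Maps a context-action input to a d-dimensional feature vector."""
    def __init__(self, input_dim, d):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(input_dim, 64), nn.ReLU(), nn.Linear(64, d)
        )

    def forward(self, xa):            # xa: (batch, n_actions, input_dim)
        return self.body(xa)          # -> (batch, n_actions, d)

def hls_regularized_loss(phi_net, theta, xa, actions, rewards, lam_hls=0.1):
    """MSE on observed (context, action, reward) triples minus lam_hls times
    the smallest eigenvalue of the empirical design matrix built from the
    features of the predicted-best arm in each context."""
    feats = phi_net(xa)                                   # (B, A, d)
    B = feats.shape[0]
    pred = feats @ theta                                  # (B, A) predicted rewards
    chosen = pred.gather(1, actions.view(-1, 1)).squeeze(1)
    mse = ((chosen - rewards) ** 2).mean()

    greedy = pred.argmax(dim=1)                           # predicted-best arm per context
    greedy_feats = feats[torch.arange(B), greedy]         # (B, d)
    design = greedy_feats.T @ greedy_feats / B            # (d, d) empirical design matrix
    lambda_min = torch.linalg.eigvalsh(design)[0]         # smallest eigenvalue
    return mse - lam_hls * lambda_min

# Dummy usage with synthetic data (shapes and values are arbitrary).
torch.manual_seed(0)
B, A, input_dim, d = 32, 5, 10, 8
phi_net = PhiNet(input_dim, d)
theta = torch.randn(d, requires_grad=True)
opt = torch.optim.Adam(list(phi_net.parameters()) + [theta], lr=1e-3)

xa = torch.randn(B, A, input_dim)
actions = torch.randint(0, A, (B,))
rewards = torch.randn(B)

opt.zero_grad()
loss = hls_regularized_loss(phi_net, theta, xa, actions, rewards)
loss.backward()
opt.step()
print(f"loss = {loss.item():.4f}")

In a full algorithm of the kind the abstract describes, one would alternate between collecting data with a base no-regret bandit algorithm and refitting the representation with such a regularized objective, acting greedily only when a likelihood-ratio-style test is sufficiently confident.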
Pages: 13