Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees

Cited by: 0
Authors
Tirinzoni, Andrea [1 ]
Papini, Matteo [2 ]
Touati, Ahmed [1 ]
Lazaric, Alessandro [1 ]
Pirotta, Matteo [1 ]
Affiliations
[1] Meta, Cambridge, MA 02140 USA
[2] Univ Pompeu Fabra, Barcelona, Spain
Funding
European Research Council;
Keywords
DOI
Not available
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study the problem of representation learning in stochastic contextual linear bandits. While the primary concern in this domain is usually to find realizable representations (i.e., those that allow predicting the reward function at any context-action pair exactly), it has been recently shown that representations with certain spectral properties (called HLS) may be more effective for the exploration-exploitation task, enabling LinUCB to achieve constant (i.e., horizon-independent) regret. In this paper, we propose BANDITSRL, a representation learning algorithm that combines a novel constrained optimization problem, which learns a realizable representation with good spectral properties, with a generalized likelihood ratio test that exploits the recovered representation and avoids excessive exploration. We prove that BANDITSRL can be paired with any no-regret algorithm and achieves constant regret whenever an HLS representation is available. Furthermore, BANDITSRL can be easily combined with deep neural networks, and we show how regularizing towards HLS representations is beneficial in standard benchmarks.
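The abstract builds on LinUCB, the optimistic algorithm whose regret becomes horizon-independent under HLS representations. The following is a minimal illustrative sketch of a LinUCB loop (not the paper's BANDITSRL, and with assumed parameters: a single fixed context, a known linear feature map, ridge regularization `lam`, and a fixed confidence width `beta`):

```python
import numpy as np

# Minimal LinUCB sketch: maintain a ridge-regression estimate of the unknown
# reward parameter and pull the action with the highest optimistic index.
rng = np.random.default_rng(0)
d, n_actions, horizon, lam, beta = 4, 5, 200, 1.0, 1.0
theta_star = rng.normal(size=d)                 # unknown true parameter
features = rng.normal(size=(n_actions, d))      # phi(context, action), fixed context

A = lam * np.eye(d)     # regularized design matrix sum_t x_t x_t^T + lam I
b = np.zeros(d)         # sum_t r_t x_t
total_reward = 0.0
for t in range(horizon):
    theta_hat = np.linalg.solve(A, b)           # ridge estimate
    A_inv = np.linalg.inv(A)
    # optimistic index: predicted reward + elliptical exploration bonus
    bonus = np.sqrt(np.einsum("ad,dc,ac->a", features, A_inv, features))
    a = int(np.argmax(features @ theta_hat + beta * bonus))
    x = features[a]
    reward = float(x @ theta_star) + 0.1 * rng.normal()   # noisy linear reward
    A += np.outer(x, x)
    b += reward * x
    total_reward += reward

regret = horizon * float(np.max(features @ theta_star)) - total_reward
```

Informally, the HLS condition studied in the paper requires the features of the optimal actions to span the whole d-dimensional space; when it holds, the exploration bonus above shrinks fast enough on every arm that the cumulative regret stops growing with the horizon.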
Pages: 13
Related papers
50 records in total
  • [1] Nash Regret Guarantees for Linear Bandits
    Sawarni, Ayush
    Pal, Soumyabrata
    Barman, Siddharth
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [2] Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection
    Papini, Matteo
    Tirinzoni, Andrea
    Pacchiano, Aldo
    Restelli, Marcello
    Lazaric, Alessandro
    Pirotta, Matteo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [3] Tight Regret Bounds for Infinite-armed Linear Contextual Bandits
    Li, Yingkai
    Wang, Yining
    Chen, Xi
    Zhou, Yuan
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130 : 370 - 378
  • [4] On Learning Whittle Index Policy for Restless Bandits With Scalable Regret
    Akbarzadeh, Nima
    Mahajan, Aditya
    IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2024, 11 (03): : 1190 - 1202
  • [5] Neural Contextual Bandits without Regret
    Kassraie, Parnian
    Krause, Andreas
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151 : 240 - 278
  • [6] Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces
    Zhu, Yinglun
    Mineiro, Paul
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [7] Linear Bayes policy for learning in contextual-bandits
    Antonio Martin H, Jose
    Vargas, Ana M.
    EXPERT SYSTEMS WITH APPLICATIONS, 2013, 40 (18) : 7400 - 7406
  • [8] Learning in Generalized Linear Contextual Bandits with Stochastic Delays
    Zhou, Zhengyuan
    Xu, Renyuan
    Blanchet, Jose
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [9] Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits
    Wu, Huasen
    Srikant, R.
    Liu, Xin
    Jiang, Chong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [10] Offline Contextual Bandits with High Probability Fairness Guarantees
    Metevier, Blossom
    Giguere, Stephen
    Brockman, Sarah
    Kobren, Ari
    Brun, Yuriy
    Brunskill, Emma
    Thomas, Philip S.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32