Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees

Cited by: 0
Authors
Tirinzoni, Andrea [1 ]
Papini, Matteo [2 ]
Touati, Ahmed [1 ]
Lazaric, Alessandro [1 ]
Pirotta, Matteo [1 ]
Affiliations
[1] Meta, Cambridge, MA 02140 USA
[2] Univ Pompeu Fabra, Barcelona, Spain
Funding
European Research Council
Keywords
(none listed)
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We study the problem of representation learning in stochastic contextual linear bandits. While the primary concern in this domain is usually to find realizable representations (i.e., those that allow predicting the reward function at any context-action pair exactly), it has recently been shown that representations with certain spectral properties (called HLS) may be more effective for the exploration-exploitation task, enabling LinUCB to achieve constant (i.e., horizon-independent) regret. In this paper, we propose BanditSRL, a representation learning algorithm that combines a novel constrained optimization problem, which learns a realizable representation with good spectral properties, with a generalized likelihood ratio test that exploits the recovered representation and avoids excessive exploration. We prove that BanditSRL can be paired with any no-regret algorithm and achieves constant regret whenever an HLS representation is available. Furthermore, BanditSRL can easily be combined with deep neural networks, and we show how regularizing towards HLS representations is beneficial in standard benchmarks.
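The HLS condition referenced in the abstract can be made concrete with a small numerical sketch (an illustration under our own assumptions, not the paper's code): a realizable representation phi is HLS when the features of the optimal actions span the whole d-dimensional space, i.e., the minimum eigenvalue of the covariance of optimal-action features is strictly positive. The helper below, with hypothetical names `min_eig_optimal_features`, `phi`, and `theta`, estimates that eigenvalue empirically.

```python
import numpy as np

def min_eig_optimal_features(phi, theta):
    """Minimum eigenvalue of the empirical covariance of optimal-action features.

    phi:   array (n_contexts, n_actions, d) of feature vectors
    theta: array (d,) reward parameter, so expected reward = phi @ theta
    A strictly positive result indicates the representation is HLS on this
    context set (optimal-action features span R^d), the spectral condition
    under which constant regret becomes achievable.
    """
    rewards = phi @ theta                       # (n_contexts, n_actions)
    opt = rewards.argmax(axis=1)                # optimal action per context
    opt_feats = phi[np.arange(len(phi)), opt]   # (n_contexts, d)
    cov = opt_feats.T @ opt_feats / len(phi)    # empirical covariance matrix
    return np.linalg.eigvalsh(cov).min()

# Toy check: two contexts whose optimal actions have features e1 and e2,
# so the optimal-action features span R^2 and the representation is HLS.
phi = np.array([[[1.0, 0.0], [0.1, 0.0]],
                [[0.0, 1.0], [0.0, 0.1]]])
theta = np.array([1.0, 1.0])
print(min_eig_optimal_features(phi, theta))
```

In this toy case the covariance is 0.5 * I, so the minimum eigenvalue is 0.5; a non-HLS representation (e.g., all optimal features lying on one line) would instead return 0, and no amount of data on optimal actions would make the least-squares estimate identifiable in every direction.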
Pages: 13