Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees

Cited by: 0
Authors
Tirinzoni, Andrea [1 ]
Papini, Matteo [2 ]
Touati, Ahmed [1 ]
Lazaric, Alessandro [1 ]
Pirotta, Matteo [1 ]
Affiliations
[1] Meta, Cambridge, MA 02140 USA
[2] Univ Pompeu Fabra, Barcelona, Spain
Funding
European Research Council
Keywords
DOI
Not available
CLC classification number
TP18 [Theory of Artificial Intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We study the problem of representation learning in stochastic contextual linear bandits. While the primary concern in this domain is usually to find realizable representations (i.e., those that allow predicting the reward function at any context-action pair exactly), it has recently been shown that representations with certain spectral properties (called HLS) may be more effective for the exploration-exploitation task, enabling LinUCB to achieve constant (i.e., horizon-independent) regret. In this paper, we propose BanditSRL, a representation learning algorithm that combines a novel constrained optimization problem, which learns a realizable representation with good spectral properties, with a generalized likelihood ratio test that exploits the recovered representation and avoids excessive exploration. We prove that BanditSRL can be paired with any no-regret algorithm and achieves constant regret whenever an HLS representation is available. Furthermore, BanditSRL can easily be combined with deep neural networks, and we show that regularizing towards HLS representations is beneficial in standard benchmarks.
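To make the "regularizing towards HLS representations" idea concrete, below is a minimal, illustrative sketch (not the authors' code): a small network produces features phi(x, a), a linear head predicts rewards, and the training loss adds a penalty favoring a large minimum eigenvalue of the design matrix built from the features of the greedily-predicted best arm in each context, a proxy for the spectral property mentioned in the abstract. The names PhiNet, hls_regularized_loss, and the weight lam_hls, as well as the synthetic data, are assumptions for illustration; the generalized likelihood ratio test used for exploitation is omitted.

import torch
import torch.nn as nn

# Toy sketch (not the paper's implementation): learn phi(x, a) that both fits
# observed rewards and keeps the design matrix of the greedily-selected arms
# well conditioned, nudging the representation towards an HLS-like property.

class PhiNet(nn.Module):
    """Maps a context-action input to a d-dimensional feature vector."""
    def __init__(self, input_dim, d):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(input_dim, 64), nn.ReLU(), nn.Linear(64, d)
        )

    def forward(self, xa):            # xa: (batch, n_actions, input_dim)
        return self.body(xa)          # -> (batch, n_actions, d)

def hls_regularized_loss(phi_net, theta, xa, actions, rewards, lam_hls=0.1):
    """MSE on observed (context, action, reward) triples minus lam_hls times
    the smallest eigenvalue of the empirical design matrix built from the
    features of the predicted-best arm in each context."""
    feats = phi_net(xa)                                   # (B, A, d)
    B = feats.shape[0]
    pred = feats @ theta                                  # (B, A) predicted rewards
    chosen = pred.gather(1, actions.view(-1, 1)).squeeze(1)
    mse = ((chosen - rewards) ** 2).mean()

    greedy = pred.argmax(dim=1)                           # predicted-best arm per context
    greedy_feats = feats[torch.arange(B), greedy]         # (B, d)
    design = greedy_feats.T @ greedy_feats / B            # (d, d) empirical design matrix
    lambda_min = torch.linalg.eigvalsh(design)[0]         # smallest eigenvalue
    return mse - lam_hls * lambda_min

# Dummy usage with synthetic data (shapes and values are arbitrary).
torch.manual_seed(0)
B, A, input_dim, d = 32, 5, 10, 8
phi_net = PhiNet(input_dim, d)
theta = torch.randn(d, requires_grad=True)
opt = torch.optim.Adam(list(phi_net.parameters()) + [theta], lr=1e-3)

xa = torch.randn(B, A, input_dim)
actions = torch.randint(0, A, (B,))
rewards = torch.randn(B)

opt.zero_grad()
loss = hls_regularized_loss(phi_net, theta, xa, actions, rewards)
loss.backward()
opt.step()
print(f"loss = {loss.item():.4f}")

In a full algorithm of the kind the abstract describes, one would alternate between collecting data with a base no-regret bandit algorithm and refitting the representation with such a regularized objective, acting greedily only when a likelihood-ratio-style test is sufficiently confident.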
Pages: 13