Pareto Optimal Model Selection in Linear Bandits

Cited by: 0

Authors:

Zhu, Yinglun [1]

Nowak, Robert [1]

Affiliations:

[1] Univ Wisconsin, Madison, WI 53706 USA

Source:

INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151 | 2022 / Vol. 151

Funding:

U.S. National Science Foundation;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We study model selection in linear bandits, where the learner must adapt to the dimension (denoted by $d_\star$) of the smallest hypothesis class containing the true linear model while balancing exploration and exploitation. Previous papers provide various guarantees for this model selection problem, but they have limitations: either the analysis requires favorable conditions that allow for inexpensive statistical testing to locate the right hypothesis class, or the approach is based on the idea of "corralling" multiple base algorithms, which often performs relatively poorly in practice. These works also mainly focus on upper bounds. In this paper, we establish the first lower bound for the model selection problem. Our lower bound implies that, even with a fixed action set, adaptation to the unknown dimension $d_\star$ comes at a cost: there is no algorithm that can achieve the regret bound $\widetilde{O}(\sqrt{d_\star T})$ simultaneously for all values of $d_\star$. We propose Pareto optimal algorithms that match the lower bound. Empirical evaluations show that our algorithm enjoys superior performance compared to existing ones.
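The quantities named in the abstract can be made concrete with a minimal sketch. It assumes a nested setup in which hypothesis class d contains every linear model whose coordinates after the first d are zero, so that $d_\star$ is the index of the last non-zero coordinate of the true parameter, and it measures regret against the best action in a fixed action set. The function names, the parameter vector, and the action set are hypothetical and chosen only for illustration; this is not the paper's algorithm.

```python
# Illustrative sketch only: the nested-class setup, names, and numbers below
# are assumptions made for this example, not taken from the paper.

def intrinsic_dimension(theta, tol=1e-12):
    """Return the smallest d such that theta[d:] is (numerically) zero.

    Assumes nested hypothesis classes: class d contains every linear model
    whose coordinates after the first d are zero.
    """
    d = 0
    for i, value in enumerate(theta, start=1):
        if abs(value) > tol:
            d = i  # remember the position of the last non-zero coordinate
    return d


def expected_reward(theta, action):
    """Expected reward of a linear model: the inner product <theta, action>."""
    return sum(t * a for t, a in zip(theta, action))


def cumulative_regret(theta, action_set, chosen_indices):
    """Pseudo-regret of a sequence of pulls against the best fixed action."""
    best = max(expected_reward(theta, a) for a in action_set)
    return sum(best - expected_reward(theta, action_set[i]) for i in chosen_indices)


# Hypothetical instance: ambient dimension 4, but only 2 coordinates matter.
theta = [0.5, 0.25, 0.0, 0.0]
action_set = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]

d_star = intrinsic_dimension(theta)                             # 2
regret = cumulative_regret(theta, action_set, [0, 1, 2, 3, 3])  # 1.5
```

In this toy instance a learner that knew $d_\star = 2$ in advance could ignore the last two coordinates entirely. The model selection question in the abstract is how much extra regret is unavoidable when $d_\star$ has to be learned while playing.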

Pages: 21
