Combinatorial Bandits with Relative Feedback

Cited by: 0

Authors:

Saha, Aadirupa [1]

Gopalan, Aditya [1]

Affiliations:

[1] Indian Inst Sci, Bangalore, Karnataka, India

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019) | 2019 / Vol. 32

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We consider combinatorial online learning with subset choices when only relative feedback information from subsets is available, instead of bandit or semi-bandit feedback which is absolute. Specifically, we study two regret minimisation problems over subsets of a finite ground set [n], with subset-wise relative preference information feedback according to the Multinomial logit choice model. In the first setting, the learner can play subsets of size bounded by a maximum size and receives top-m rank-ordered feedback, while in the second setting the learner can play subsets of a fixed size k with a full subset ranking observed as feedback. For both settings, we devise instance-dependent and order-optimal regret algorithms with regret O(n/m ln T) and O(n/k ln T), respectively. We derive fundamental limits on the regret performance of online learning with subset-wise preferences, proving the tightness of our regret guarantees. Our results also show the value of eliciting more general top-m rank-ordered feedback over single winner feedback (m = 1). Our theoretical results are corroborated with empirical evaluations.
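As an illustration of the feedback model the abstract describes, the following is a minimal sketch, not the authors' algorithm. It assumes each item i has a positive score (the `scores` mapping is a hypothetical parameterization), that the multinomial logit model picks item i from a set S with probability score_i divided by the sum of scores in S, and that top-m rank-ordered feedback is produced by repeating that choice m times without replacement. The function name `sample_top_m_ranking` is invented for this example.

```python
import random


def sample_top_m_ranking(scores, subset, m, rng=random):
    """Draw a top-m ranking of `subset` under an assumed MNL choice model.

    At each step one remaining item is chosen with probability proportional
    to its score, appended to the ranking, and removed from the pool.
    With m == 1 this reduces to single-winner feedback.
    """
    if not 1 <= m <= len(subset):
        raise ValueError("m must be between 1 and the subset size")
    remaining = list(subset)
    ranking = []
    for _ in range(m):
        # MNL choice probabilities are proportional to the item scores.
        weights = [scores[i] for i in remaining]
        winner = rng.choices(remaining, weights=weights, k=1)[0]
        ranking.append(winner)
        remaining.remove(winner)
    return ranking


# Hypothetical scores for a ground set of n = 4 items.
example_scores = {0: 1.0, 1: 0.5, 2: 0.25, 3: 0.25}
example_ranking = sample_top_m_ranking(
    example_scores, [0, 1, 2, 3], m=2, rng=random.Random(0)
)
```

Here `example_ranking` holds two distinct items from the played subset, most preferred first. Setting `m=1` gives the single-winner case that the abstract compares against.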


Pages: 11

