Combinatorial Bandits with Relative Feedback

Cited by: 0
Authors
Saha, Aadirupa [1]
Gopalan, Aditya [1]
Affiliations
[1] Indian Inst Sci, Bangalore, Karnataka, India
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We consider combinatorial online learning with subset choices when only relative feedback from subsets is available, instead of absolute bandit or semi-bandit feedback. Specifically, we study two regret minimisation problems over subsets of a finite ground set [n], with subset-wise relative preference feedback generated according to the multinomial logit (MNL) choice model. In the first setting, the learner can play subsets of size at most a given maximum and receives top-m rank-ordered feedback, while in the second setting the learner can play subsets of a fixed size k and observes a full ranking of the played subset as feedback. For both settings, we devise instance-dependent and order-optimal algorithms with regret O((n/m) ln T) and O((n/k) ln T), respectively. We derive fundamental limits on the regret performance of online learning with subset-wise preferences, proving the tightness of our regret guarantees. Our results also show the value of eliciting more general top-m rank-ordered feedback over single-winner feedback (m = 1). Our theoretical results are corroborated by empirical evaluations.
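The feedback model described in the abstract can be illustrated with a minimal sketch. It assumes the common sequential reading of rank-ordered feedback under an MNL model: items are drawn one at a time without replacement, and each draw picks item i from the remaining set R with probability theta_i / sum_{j in R} theta_j. The function name `sample_top_m`, the `scores` values, and the played subset are hypothetical and chosen only for illustration; this simulates the feedback alone and is not the paper's learning algorithm.

```python
import random


def sample_top_m(scores, subset, m, rng):
    """Draw top-m rank-ordered feedback for `subset` under an MNL model.

    Assumption: ranking is generated by repeated MNL choices without
    replacement, so each pick selects item i with probability
    scores[i] / (sum of scores of the items still remaining).
    """
    remaining = list(subset)
    ranking = []
    for _ in range(min(m, len(remaining))):
        weights = [scores[i] for i in remaining]
        # random.Random.choices samples proportionally to the given weights.
        winner = rng.choices(remaining, weights=weights, k=1)[0]
        ranking.append(winner)
        remaining.remove(winner)
    return ranking


# Hypothetical instance: ground set [n] with n = 5 and illustrative MNL scores.
scores = [1.0, 0.8, 0.5, 0.3, 0.1]
rng = random.Random(0)

# The learner plays the subset {0, 2, 3, 4} and observes top-2 feedback.
feedback = sample_top_m(scores, subset=[0, 2, 3, 4], m=2, rng=rng)
print(feedback)
```

Setting m = 1 reduces this to single-winner feedback, and setting m equal to the subset size gives the full-ranking feedback of the second setting.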
Pages: 11
Related papers
50 in total
  • [1] A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs
    van der Hoeven, Dirk
    Zierahn, Lukas
    Lancewicki, Tal
    Rosenberg, Aviv
    Cesa-Bianchi, Nicolo
    [J]. THIRTY SIXTH ANNUAL CONFERENCE ON LEARNING THEORY, VOL 195, 2023, 195
  • [2] Combinatorial bandits
    Cesa-Bianchi, Nicolo
    Lugosi, Gabor
    [J]. JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2012, 78 (05) : 1404 - 1422
  • [3] Top-k Combinatorial Bandits with Full-Bandit Feedback
    Rejwan, Idan
    Mansour, Yishay
    [J]. ALGORITHMIC LEARNING THEORY, VOL 117, 2020, 117 : 752 - 776
  • [4] Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback
    Jourdan, Marc
    Mutny, Mojmir
    Kirschner, Johannes
    Krause, Andreas
    [J]. ALGORITHMIC LEARNING THEORY, VOL 132, 2021, 132
  • [5] Combinatorial Cascading Bandits
    Kveton, Branislav
    Wen, Zheng
    Ashkan, Azin
    Szepesvari, Csaba
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [6] Combinatorial Bandits Revisited
    Combes, Richard
    Talebi, M. Sadegh
    Proutiere, Alexandre
    Lelarge, Marc
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [7] Combinatorial Causal Bandits
    Feng, Shi
    Chen, Wei
    [J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 6, 2023 : 7550 - 7558
  • [8] Contextual Combinatorial Cascading Bandits
    Li, Shuai
    Wang, Baoxiang
    Zhang, Shengyu
    Chen, Wei
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [9] Combinatorial Pure Exploration for Dueling Bandits
    Chen, Wei
    Du, Yihan
    Huang, Longbo
    Zhao, Haoyu
    [J]. 25TH AMERICAS CONFERENCE ON INFORMATION SYSTEMS (AMCIS 2019), 2019
  • [10] Combinatorial Bandits under Strategic Manipulations
    Dong, Jing
    Li, Ke
    Li, Shuai
    Wang, Baoxiang
    [J]. WSDM'22: PROCEEDINGS OF THE FIFTEENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, 2022 : 219 - 229