Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback

Authors
Jourdan, Marc [1]
Mutny, Mojmir [1]
Kirschner, Johannes [1]
Krause, Andreas [1]
Affiliation
[1] Swiss Federal Institute of Technology, Department of Computer Science, Zurich, Switzerland
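Funding
Swiss National Science Foundation; European Research Council
Keywords
Combinatorial Bandits; Pure Exploration; Best-Arm Identification
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405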
Abstract
Combinatorial bandits with semi-bandit feedback generalize multi-armed bandits: the agent chooses a set of arms and observes a noisy reward for each arm contained in the chosen set. The action set satisfies a given structure, such as forming a base of a matroid or a path in a graph. We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting in which the structure of the answer set differs from that of the action set. Using the recently popularized game framework, we interpret this problem as a sequential zero-sum game and develop a CombGame meta-algorithm whose instances are asymptotically optimal algorithms with finite-time guarantees. In addition to comparing two families of learners to instantiate our meta-algorithm, the main contribution of our work is a specific oracle-efficient instance for best-arm identification with combinatorial actions. Based on a projection-free online learning algorithm for convex polytopes, it is the first computationally efficient algorithm that is asymptotically optimal and has competitive empirical performance.
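To make the feedback model concrete, here is a minimal sketch, assuming Gaussian noise and an action set made of the bases of a uniform matroid of rank k (all k-subsets of arms). It shows two ingredients mentioned above: semi-bandit feedback (one noisy reward per arm in the chosen set) and a linear maximization oracle over the actions (for a uniform matroid, simply the top-k arms). The names `top_k_oracle`, `semi_bandit_pull` and `explore_and_recommend` are hypothetical, and the round-robin sampling is a naive stand-in for illustration only; it is not the CombGame sampling rule and carries no optimality guarantee.

```python
import random
from typing import Dict, List, Sequence


def top_k_oracle(weights: Sequence[float], k: int) -> List[int]:
    """Linear maximization oracle for the uniform matroid of rank k:
    the k-subset with the largest summed weight is the set of top-k arms."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return sorted(order[:k])


def semi_bandit_pull(action: Sequence[int], means: Sequence[float],
                     sigma: float, rng: random.Random) -> Dict[int, float]:
    """Semi-bandit feedback: one noisy (Gaussian) reward per arm in the set."""
    return {i: means[i] + rng.gauss(0.0, sigma) for i in action}


def explore_and_recommend(means: Sequence[float], k: int,
                          actions: Sequence[Sequence[int]], rounds: int,
                          sigma: float = 0.1, seed: int = 0) -> List[int]:
    """Cycle through a fixed family of actions (hypothetical uniform
    exploration), update per-arm empirical means from semi-bandit feedback,
    then ask the oracle for the empirically best set."""
    rng = random.Random(seed)
    d = len(means)
    sums = [0.0] * d
    counts = [0] * d
    for t in range(rounds):
        action = actions[t % len(actions)]
        for arm, reward in semi_bandit_pull(action, means, sigma, rng).items():
            sums[arm] += reward
            counts[arm] += 1
    if any(c == 0 for c in counts):
        raise ValueError("the action family must cover every arm")
    empirical = [sums[i] / counts[i] for i in range(d)]
    return top_k_oracle(empirical, k)


if __name__ == "__main__":
    means = [0.9, 0.8, 0.1, 0.2, 0.5]
    actions = [[0, 1], [2, 3], [4, 0]]  # together they cover all five arms
    print(explore_and_recommend(means, k=2, actions=actions, rounds=300))
```

Because each observation is per arm, a single pull of a k-subset updates k empirical means at once; the recommendation step only needs the oracle, never an explicit enumeration of all k-subsets.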
Pages: 45