Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback

Cited by: 0

Authors:

Jourdan, Marc [1]

Mutny, Mojmir [1]

Kirschner, Johannes [1]

Krause, Andreas [1]

Affiliations:

[1] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland

Source:

ALGORITHMIC LEARNING THEORY, VOL 132 | 2021 / Vol. 132

Funding:

Swiss National Science Foundation; European Research Council;

Keywords:

Combinatorial Bandits; Pure Exploration; Best-Arm Identification;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Combinatorial bandits with semi-bandit feedback generalize multi-armed bandits: the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set. The action set satisfies a given structure, such as forming a base of a matroid or a path in a graph. We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting where the structure of the answer set differs from that of the action set. Using the recently popularized game framework, we interpret this problem as a sequential zero-sum game and develop a CombGame meta-algorithm whose instances are asymptotically optimal algorithms with finite-time guarantees. In addition to comparing two families of learners to instantiate our meta-algorithm, the main contribution of our work is a specific oracle-efficient instance for best-arm identification with combinatorial actions. Based on a projection-free online learning algorithm for convex polytopes, it is the first computationally efficient algorithm that is asymptotically optimal and has competitive empirical performance.

Pages: 45

50 entries in total

[1] An Efficient Algorithm for Learning with Semi-bandit Feedback
Neu, Gergely
Bartok, Gabor
[J]. ALGORITHMIC LEARNING THEORY (ALT 2013), 2013, 8139 : 234 - 248
[2] Finding Optimal Arms in Non-stochastic Combinatorial Bandits with Semi-bandit Feedback and Finite Budget
Brandt, Jasmin
Bengs, Viktor
Haddenhorst, Bjoern
Huellermeier, Eyke
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
[3] Combinatorial semi-bandit with known covariance
Degenne, Remy
Perchet, Vianney
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
[4] Optimal Resource Allocation with Semi-Bandit Feedback
Lattimore, Tor
Crammer, Koby
Szepesvari, Csaba
[J]. UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2014 : 477 - 486
[5] Combinatorial Pure Exploration with Full-Bandit or Partial Linear Feedback
Du, Yihan
Kuroki, Yuko
Chen, Wei
[J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 7262 - 7270
[6] Combinatorial Pure Exploration for Dueling Bandits
Chen, Wei
Du, Yihan
Huang, Longbo
Zhao, Haoyu
[J]. 25TH AMERICAS CONFERENCE ON INFORMATION SYSTEMS (AMCIS 2019), 2019
[7] Combinatorial Pure Exploration for Dueling Bandits
Chen, Wei
Du, Yihan
Huang, Longbo
Zhao, Haoyu
[J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
[8] ONLINE LEARNING FOR COMPUTATION PEER OFFLOADING WITH SEMI-BANDIT FEEDBACK
Zhu, Hongbin
Kang, Kai
Luo, Xiliang
Qian, Hua
[J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019 : 4524 - 4528
[9] Efficient task assignment for spatial crowdsourcing: A combinatorial fractional optimization approach with semi-bandit learning
ul Hassan, Umair
Curry, Edward
[J]. EXPERT SYSTEMS WITH APPLICATIONS, 2016, 58 : 36 - 56
[10] Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback
Letard, Alexandre
Amghar, Tassadit
Camp, Olivier
Gutowski, Nicolas
[J]. 2020 IEEE 32ND INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI), 2020 : 1073 - 1078
