Combinatorial bandits

Cited by: 208

Authors:

Cesa-Bianchi, Nicolo [1]

Lugosi, Gabor [2, 3]

Affiliations:

[1] Univ Milan, I-20122 Milan, Italy

[2] ICREA, Barcelona, Spain

[3] Pompeu Fabra Univ, Barcelona, Spain

Source:

JOURNAL OF COMPUTER AND SYSTEM SCIENCES | 2012 / Vol. 78 / Issue 05

Keywords:

Online prediction; Adversarial bandit problems; Online linear optimization; ALGORITHMS;

DOI:

10.1016/j.jcss.2012.01.001

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

We study sequential prediction problems in which, at each time instance, the forecaster chooses a vector from a given finite set $S \subseteq \mathbb{R}^d$. At the same time, the opponent chooses a "loss" vector in $\mathbb{R}^d$ and the forecaster suffers a loss that is the inner product of the two vectors. The goal of the forecaster is to ensure that, in the long run, the accumulated loss is not much larger than that of the best possible element in $S$. We consider the "bandit" setting in which the forecaster only has access to the losses of the chosen vectors (i.e., the entire loss vectors are not observed). We introduce a variant of a strategy by Dani, Hayes and Kakade achieving a regret bound that, for a variety of concrete choices of $S$, is of order $\sqrt{nd \ln |S|}$, where $n$ is the time horizon. This is not improvable in general and is better than previously known bounds. The examples we consider are all such that $S \subseteq \{0,1\}^d$, and we show how the combinatorial structure of these classes can be exploited to improve the regret bounds. We also point out computationally efficient implementations for various interesting choices of $S$. © 2012 Elsevier Inc. All rights reserved.
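The protocol described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a generic exponentially weighted forecaster with uniform exploration and a pseudo-inverse loss estimate, and the names `m_subsets`, `run_bandit`, `eta`, and `gamma`, as well as the parameter values, are hypothetical choices for illustration only. It shows the three ingredients the abstract names: a finite action set inside {0,1}^d, scalar-only (bandit) feedback, and regret against the best fixed element in hindsight.

```python
import itertools
import numpy as np

def m_subsets(d, m):
    """All indicator vectors in {0,1}^d with exactly m ones (one example of S)."""
    rows = []
    for idx in itertools.combinations(range(d), m):
        v = np.zeros(d)
        v[list(idx)] = 1.0
        rows.append(v)
    return np.array(rows)

def run_bandit(S, losses, eta=0.05, gamma=0.1, seed=0):
    """Play the bandit game over action set S (one action per row).

    losses has shape (n, d) and is fixed in advance (oblivious opponent).
    eta and gamma are illustrative, untuned values (an assumption).
    Returns (forecaster's cumulative loss, best fixed action's cumulative loss).
    """
    rng = np.random.default_rng(seed)
    N, d = S.shape
    log_w = np.zeros(N)              # log-weights, kept in log space for stability
    uniform = np.full(N, 1.0 / N)    # exploration distribution (assumed uniform)
    total = 0.0
    for loss in losses:
        q = np.exp(log_w - log_w.max())
        q /= q.sum()
        p = (1.0 - gamma) * q + gamma * uniform   # mix in exploration
        k = rng.choice(N, p=p)
        v = S[k]
        c = float(loss @ v)          # bandit feedback: only this scalar is seen
        total += c
        P = (S * p[:, None]).T @ S   # E[V V^T] under the sampling distribution p
        est = c * (np.linalg.pinv(P) @ v)   # estimate of the full loss vector
        log_w -= eta * (S @ est)     # exponential-weights update on estimated losses
    best = float((S @ losses.sum(axis=0)).min())
    return total, best

S = m_subsets(d=5, m=2)
losses = np.random.default_rng(1).uniform(0.0, 1.0, size=(500, 5))
total, best = run_bandit(S, losses)
regret = total - best
print(f"|S| = {len(S)}, cumulative loss = {total:.2f}, regret = {regret:.2f}")
```

The forecaster here enumerates every element of S, which is only feasible for small action sets; the abstract's remark about computationally efficient implementations concerns precisely the cases where such enumeration is too expensive.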


Pages: 1404 - 1422

Page count: 19

50 records in total

[41] Combinatorial Multi-Armed Bandits with Concave Rewards and Fairness Constraints
Xu, Huanle
Liu, Yang
Lau, Wing Cheong
Li, Rui
[J]. PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 2554 - 2560
[42] Adversarial Combinatorial Bandits with General Non-linear Reward Functions
Chen, Xi
Han, Yanjun
Wang, Yining
[J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
[43] Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback
Jourdan, Marc
Mutny, Mojmir
Kirschner, Johannes
Krause, Andreas
[J]. ALGORITHMIC LEARNING THEORY, VOL 132, 2021, 132
[44] Efficient Learning in Large-Scale Combinatorial Semi-Bandits
Wen, Zheng
Kveton, Branislav
Ashkan, Azin
[J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37 : 1113 - 1122
[45] Efficient Ordered Combinatorial Semi-Bandits for Whole-Page Recommendation
Wang, Yingfei
Ouyang, Hua
Wang, Chu
Chen, Jianhui
Asamov, Tsvetan
Chang, Yi
[J]. THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2746 - 2753
[46] Contextual Combinatorial Multi-armed Bandits with Volatile Arms and Submodular Reward
Chen, Lixing
Xu, Jie
Lu, Zhuo
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
[47] Combinatorial Multi-armed Bandits for Real-Time Strategy Games
Ontanon, Santiago
[J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2017, 58 : 665 - 702
[48] Efficient Client Selection Based on Contextual Combinatorial Multi-Arm Bandits
Shi, Fang
Lin, Weiwei
Fan, Lisheng
Lai, Xiazhi
Wang, Xiumin
[J]. IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2023, 22 (08) : 5265 - 5277
[49] Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits
Cuvelier, Thibaut
Combes, Richard
Gourdin, Eric
[J]. PROCEEDINGS OF THE ACM ON MEASUREMENT AND ANALYSIS OF COMPUTING SYSTEMS, 2021, 5 (01)
[50] An Arm-Wise Randomization Approach to Combinatorial Linear Semi-Bandits
Takemura, Kei
Ito, Shinji
[J]. 2019 19TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2019), 2019, : 1318 - 1323
