Combinatorial bandits

Cited by: 208
Authors
Cesa-Bianchi, Nicolo [1]
Lugosi, Gabor [2, 3]
Affiliations
[1] Univ Milan, I-20122 Milan, Italy
[2] ICREA, Barcelona, Spain
[3] Pompeu Fabra Univ, Barcelona, Spain
Keywords
Online prediction; Adversarial bandit problems; Online linear optimization; Algorithms
DOI
10.1016/j.jcss.2012.01.001
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We study sequential prediction problems in which, at each time instance, the forecaster chooses a vector from a given finite set $S \subset \mathbb{R}^d$. At the same time, the opponent chooses a "loss" vector in $\mathbb{R}^d$ and the forecaster suffers a loss that is the inner product of the two vectors. The goal of the forecaster is to achieve that, in the long run, the accumulated loss is not much larger than that of the best possible element in $S$. We consider the "bandit" setting in which the forecaster only has access to the losses of the chosen vectors (i.e., the entire loss vectors are not observed). We introduce a variant of a strategy by Dani, Hayes and Kakade achieving a regret bound that, for a variety of concrete choices of $S$, is of order $\sqrt{nd \ln |S|}$, where $n$ is the time horizon. This is not improvable in general and is better than previously known bounds. The examples we consider are all such that $S \subset \{0,1\}^d$, and we show how the combinatorial structure of these classes can be exploited to improve the regret bounds. We also point out computationally efficient implementations for various interesting choices of $S$. © 2012 Elsevier Inc. All rights reserved.
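The protocol described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes an exponentially weighted forecaster mixed with uniform exploration and a least-squares estimate of the unobserved loss vector, which is one common way to handle bandit feedback with linear losses. The names `m_sets` and `run_bandit`, the uniform exploration distribution, and the default values of `eta` and `gamma` are illustrative assumptions, not values taken from the paper.

```python
import itertools
import numpy as np


def m_sets(d, m):
    """All binary vectors in {0,1}^d with exactly m ones (a hypothetical example of S)."""
    rows = []
    for idx in itertools.combinations(range(d), m):
        v = np.zeros(d)
        v[list(idx)] = 1.0
        rows.append(v)
    return np.array(rows)


def run_bandit(S, losses, eta=0.05, gamma=0.1, seed=0):
    """Play one round per loss vector; return (forecaster_loss, best_fixed_loss)."""
    rng = np.random.default_rng(seed)
    S = np.asarray(S, dtype=float)       # shape (N, d): the finite action set
    N, d = S.shape
    log_w = np.zeros(N)                  # log-weights, one per element of S
    mu = np.full(N, 1.0 / N)             # exploration distribution (assumed uniform)
    total = 0.0
    for loss_vec in losses:
        q = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
        q /= q.sum()
        p = (1.0 - gamma) * q + gamma * mu
        i = rng.choice(N, p=p)
        observed = float(S[i] @ loss_vec)   # bandit feedback: a single scalar
        total += observed
        P = (S * p[:, None]).T @ S          # second-moment matrix E_p[V V^T]
        est = observed * (np.linalg.pinv(P) @ S[i])  # least-squares loss estimate
        log_w -= eta * (S @ est)            # exponential-weights update
    best = float((S @ np.sum(losses, axis=0)).min())
    return total, best
```

For example, `run_bandit(m_sets(5, 2), losses)` with `losses` an array of shape `(n, 5)` returns the forecaster's cumulative loss and the cumulative loss of the best fixed element of $S$ in hindsight; their difference is the regret that the abstract's bound refers to. Enumerating $S$ explicitly, as done here, is only feasible for small sets.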
Pages: 1404-1422
Page count: 19
Related papers
50 in total
  • [41] Combinatorial Multi-Armed Bandits with Concave Rewards and Fairness Constraints
    Xu, Huanle
    Liu, Yang
    Lau, Wing Cheong
    Li, Rui
    [J]. PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 2554 - 2560
  • [42] Adversarial Combinatorial Bandits with General Non-linear Reward Functions
    Chen, Xi
    Han, Yanjun
    Wang, Yining
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [43] Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback
    Jourdan, Marc
    Mutny, Mojmir
    Kirschner, Johannes
    Krause, Andreas
    [J]. ALGORITHMIC LEARNING THEORY, VOL 132, 2021, 132
  • [44] Efficient Learning in Large-Scale Combinatorial Semi-Bandits
    Wen, Zheng
    Kveton, Branislav
    Ashkan, Azin
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37 : 1113 - 1122
  • [45] Efficient Ordered Combinatorial Semi-Bandits for Whole-Page Recommendation
    Wang, Yingfei
    Ouyang, Hua
    Wang, Chu
    Chen, Jianhui
    Asamov, Tsvetan
    Chang, Yi
    [J]. THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2746 - 2753
  • [46] Contextual Combinatorial Multi-armed Bandits with Volatile Arms and Submodular Reward
    Chen, Lixing
    Xu, Jie
    Lu, Zhuo
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [47] Combinatorial Multi-armed Bandits for Real-Time Strategy Games
    Ontanon, Santiago
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2017, 58 : 665 - 702
  • [48] Efficient Client Selection Based on Contextual Combinatorial Multi-Arm Bandits
    Shi, Fang
    Lin, Weiwei
    Fan, Lisheng
    Lai, Xiazhi
    Wang, Xiumin
    [J]. IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2023, 22 (08) : 5265 - 5277
  • [49] Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits
    Cuvelier, Thibaut
    Combes, Richard
    Gourdin, Eric
    [J]. PROCEEDINGS OF THE ACM ON MEASUREMENT AND ANALYSIS OF COMPUTING SYSTEMS, 2021, 5 (01)
  • [50] An Arm-Wise Randomization Approach to Combinatorial Linear Semi-Bandits
    Takemura, Kei
    Ito, Shinji
    [J]. 2019 19TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2019), 2019, : 1318 - 1323