Reducing Dueling Bandits to Cardinal Bandits

Times Cited: 0
Authors
Ailon, Nir [1]
Karnin, Zohar [2]
Joachims, Thorsten [3]
Affiliations
[1] Technion, Dept Comp Sci, IL-32000 Haifa, Israel
[2] Yahoo Labs, IL-31905 Haifa, Israel
[3] Cornell Univ, Dept Comp Sci, Ithaca, NY 14850 USA
Funding
Israel Science Foundation
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits problem is an online model of learning with ordinal feedback of the form "A is preferred to B" (as opposed to cardinal feedback like "A has value 2.5"), giving it wide applicability in learning from implicit user feedback and revealed and stated preferences. In contrast to existing algorithms for the Dueling Bandits problem, our reductions - named Doubler, MultiSBM and Sparring - provide a generic schema for translating the extensive body of known results about conventional Multi-Armed Bandit algorithms to the Dueling Bandits setting. For Doubler and MultiSBM we prove regret upper bounds in both finite and infinite settings, and conjecture about the performance of Sparring which empirically outperforms the other two as well as previous algorithms in our experiments. In addition, we provide the first almost optimal regret bound in terms of second order terms, such as the differences between the values of the arms.
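To make the reduction schema concrete, the sketch below shows one way a Sparring-style reduction could be wired to an off-the-shelf stochastic bandit algorithm: two independent cardinal-bandit black boxes each propose an arm, the proposed pair is dueled, and each box receives the binary outcome of the duel from its own side as a reward. The UCB1 black box, the sparring helper, and the utility-based preference simulator are illustrative assumptions for this sketch, not the authors' implementation.

```python
import math
import random

class UCB1:
    """Generic stochastic multi-armed bandit black box (UCB1 rule)."""
    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.t = 0

    def select(self):
        self.t += 1
        # Play every arm once before applying the UCB index.
        for a in range(self.n_arms):
            if self.counts[a] == 0:
                return a
        ucb = [self.means[a] + math.sqrt(2 * math.log(self.t) / self.counts[a])
               for a in range(self.n_arms)]
        return max(range(self.n_arms), key=lambda a: ucb[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def sparring(n_arms, prefers, horizon, seed=0):
    """Sparring-style reduction (illustrative): two independent cardinal-bandit
    black boxes each pick an arm, the pair is dueled, and each box is rewarded
    with the binary outcome of the duel from its own perspective."""
    rng = random.Random(seed)
    left, right = UCB1(n_arms), UCB1(n_arms)
    history = []
    for _ in range(horizon):
        a, b = left.select(), right.select()
        left_wins = rng.random() < prefers(a, b)  # prefers(a, b) = P(a beats b)
        left.update(a, 1.0 if left_wins else 0.0)
        right.update(b, 0.0 if left_wins else 1.0)
        history.append((a, b))
    return history


if __name__ == "__main__":
    # Hypothetical utility-based preference model: arm i has utility u[i],
    # and P(i beats j) grows linearly with the utility gap.
    u = [0.9, 0.7, 0.5, 0.3]
    prefers = lambda i, j: 0.5 + 0.5 * (u[i] - u[j])
    duels = sparring(len(u), prefers, horizon=5000)
    print("last 5 duels:", duels[-5:])
```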
Pages: 856-864
Page count: 9