Reducing Dueling Bandits to Cardinal Bandits

Cited by: 0

Authors:

Ailon, Nir [1]

Karnin, Zohar [2]

Joachims, Thorsten [3]

Affiliations:

[1] Technion, Dept Comp Sci, IL-32000 Haifa, Israel

[2] Yahoo Labs, IL-31905 Haifa, Israel

[3] Cornell Univ, Dept Comp Sci, Ithaca, NY 14850 USA

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 2) | 2014 / Vol. 32

Funding:

Israel Science Foundation

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits problem is an online model of learning with ordinal feedback of the form "A is preferred to B" (as opposed to cardinal feedback like "A has value 2.5"), giving it wide applicability in learning from implicit user feedback and from revealed and stated preferences. In contrast to existing algorithms for the Dueling Bandits problem, our reductions (named Doubler, MultiSBM and Sparring) provide a generic schema for translating the extensive body of known results about conventional Multi-Armed Bandit algorithms to the Dueling Bandits setting. For Doubler and MultiSBM we prove regret upper bounds in both finite and infinite settings, and we state a conjecture about the performance of Sparring, which empirically outperforms the other two as well as previous algorithms in our experiments. In addition, we provide the first almost optimal regret bound in terms of second order terms, such as the differences between the values of the arms.
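To illustrate what a reduction from ordinal to cardinal feedback can look like, the sketch below follows the general idea behind Sparring: two ordinary bandit learners each pick an arm, the two arms are compared, and each learner receives a 0/1 reward depending on whether its arm won the duel. This is a minimal sketch and not the authors' reference implementation. The choice of UCB1 as the underlying learner, the linear link `P(x beats y) = (1 + u_x - u_y) / 2`, and all names (`UCB1`, `duel`, `sparring`, `utilities`) are assumptions made here for illustration.

```python
import math
import random


class UCB1:
    """Plain UCB1 learner for rewards in [0, 1] (assumed choice of base algorithm)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms   # number of pulls per arm
        self.sums = [0.0] * n_arms   # cumulative reward per arm
        self.t = 0                   # number of selections made so far

    def select(self):
        self.t += 1
        # Pull every arm once before using confidence bounds.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm

        def index(arm):
            mean = self.sums[arm] / self.counts[arm]
            bonus = math.sqrt(2.0 * math.log(self.t) / self.counts[arm])
            return mean + bonus

        return max(range(len(self.counts)), key=index)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


def duel(utilities, x, y, rng):
    """Return True if arm x beats arm y.

    Assumption: a linear link, P(x beats y) = (1 + u_x - u_y) / 2,
    with hidden utilities in [0, 1].
    """
    return rng.random() < 0.5 * (1.0 + utilities[x] - utilities[y])


def sparring(utilities, horizon, seed=0):
    """Run two UCB1 learners against each other; return play counts per arm."""
    rng = random.Random(seed)
    n_arms = len(utilities)
    left, right = UCB1(n_arms), UCB1(n_arms)
    plays = [0] * n_arms
    for _ in range(horizon):
        x, y = left.select(), right.select()
        x_wins = duel(utilities, x, y, rng)
        # Ordinal outcome becomes a cardinal 0/1 reward for each learner.
        left.update(x, 1.0 if x_wins else 0.0)
        right.update(y, 0.0 if x_wins else 1.0)
        plays[x] += 1
        plays[y] += 1
    return plays


if __name__ == "__main__":
    # Hypothetical utilities; arm 2 is the best arm.
    print(sparring([0.1, 0.3, 0.9], horizon=3000))
```

Under the assumed linear link, the arm with the highest utility has the highest expected win rate against any opponent, so each learner faces a problem in which the best arm is the same in every round. The learners never observe the utilities themselves, only who won each duel.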


Pages: 856-864

Page count: 9
