Reducing Dueling Bandits to Cardinal Bandits

Cited by: 0
Authors
Ailon, Nir [1]
Karnin, Zohar [2]
Joachims, Thorsten [3]
Affiliations
[1] Technion, Dept Comp Sci, IL-32000 Haifa, Israel
[2] Yahoo Labs, IL-31905 Haifa, Israel
[3] Cornell Univ, Dept Comp Sci, Ithaca, NY 14850 USA
Funding
Israel Science Foundation
Keywords
DOI
None available
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits problem is an online model of learning with ordinal feedback of the form "A is preferred to B" (as opposed to cardinal feedback like "A has value 2.5"), giving it wide applicability in learning from implicit user feedback and revealed and stated preferences. In contrast to existing algorithms for the Dueling Bandits problem, our reductions - named Doubler, MultiSBM and Sparring - provide a generic schema for translating the extensive body of known results about conventional Multi-Armed Bandit algorithms to the Dueling Bandits setting. For Doubler and MultiSBM we prove regret upper bounds in both finite and infinite settings, and conjecture about the performance of Sparring which empirically outperforms the other two as well as previous algorithms in our experiments. In addition, we provide the first almost optimal regret bound in terms of second order terms, such as the differences between the values of the arms.
Pages: 856-864
Page count: 9
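
The abstract describes the Sparring schema only at a high level. As a rough illustration of the reduction idea, and not the authors' implementation, the Python sketch below pairs two independent copies of a black-box cardinal bandit learner and feeds each copy the binary outcome of the duel as its reward. The choice of UCB1 as the black box, the utility-based preference model P(i beats j) = 1/2 + (u_i - u_j)/2, and all names (UCB1, sparring) are assumptions made here for illustration only.

# Illustrative sketch of a Sparring-style reduction: two copies of a
# cardinal bandit algorithm (UCB1 here, an assumed choice) each pick one
# side of the duel; the winner's copy gets reward 1, the loser's gets 0.
# The preference model below is an assumed utility-based simulator,
# not taken from the paper.

import math
import random


class UCB1:
    """Standard UCB1 cardinal bandit used as the black-box learner."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.t = 0

    def select(self):
        self.t += 1
        for arm, count in enumerate(self.counts):
            if count == 0:  # play every arm once before using UCB scores
                return arm
        scores = [self.sums[a] / self.counts[a]
                  + math.sqrt(2.0 * math.log(self.t) / self.counts[a])
                  for a in range(len(self.counts))]
        return max(range(len(self.counts)), key=lambda a: scores[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


def sparring(utilities, horizon, seed=0):
    """Run the Sparring-style schema against a utility-based duel simulator."""
    rng = random.Random(seed)
    left, right = UCB1(len(utilities)), UCB1(len(utilities))
    for _ in range(horizon):
        i, j = left.select(), right.select()
        p_i_wins = 0.5 + (utilities[i] - utilities[j]) / 2.0
        i_wins = rng.random() < p_i_wins
        # Ordinal feedback ("i beats j") becomes {0, 1} cardinal rewards.
        left.update(i, 1.0 if i_wins else 0.0)
        right.update(j, 0.0 if i_wins else 1.0)
    return left, right


if __name__ == "__main__":
    left, right = sparring(utilities=[0.1, 0.4, 0.8, 0.3], horizon=20000)
    print("left plays:", left.counts)
    print("right plays:", right.counts)

Running the sketch, both copies' play counts tend to concentrate on the highest-utility arm (index 2 in this toy instance), which mirrors the abstract's point that ordinal win/loss feedback can drive a conventional cardinal bandit learner.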