Reducing Dueling Bandits to Cardinal Bandits

Cited by: 0
Authors
Ailon, Nir [1]
Karnin, Zohar [2]
Joachims, Thorsten [3]
Affiliations
[1] Technion, Dept Comp Sci, IL-32000 Haifa, Israel
[2] Yahoo Labs, IL-31905 Haifa, Israel
[3] Cornell Univ, Dept Comp Sci, Ithaca, NY 14850 USA
Funding
Israel Science Foundation
Keywords
DOI
None available
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits problem is an online model of learning with ordinal feedback of the form "A is preferred to B" (as opposed to cardinal feedback like "A has value 2.5"), giving it wide applicability in learning from implicit user feedback and revealed and stated preferences. In contrast to existing algorithms for the Dueling Bandits problem, our reductions - named Doubler, MultiSBM and Sparring - provide a generic schema for translating the extensive body of known results about conventional Multi-Armed Bandit algorithms to the Dueling Bandits setting. For Doubler and MultiSBM we prove regret upper bounds in both finite and infinite settings, and conjecture about the performance of Sparring which empirically outperforms the other two as well as previous algorithms in our experiments. In addition, we provide the first almost optimal regret bound in terms of second order terms, such as the differences between the values of the arms.
Pages: 856-864
Page count: 9
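
The abstract describes the Sparring schema only at a high level. As a rough illustration of the reduction idea, and not the authors' implementation, the Python sketch below pairs two independent copies of a black-box cardinal bandit learner and feeds each copy the binary outcome of the duel as its reward. The choice of UCB1 as the black box, the utility-based preference model P(i beats j) = 1/2 + (u_i - u_j)/2, and all names (UCB1, sparring) are assumptions made here for illustration only.

# Illustrative sketch of a Sparring-style reduction: two copies of a
# cardinal bandit algorithm (UCB1 here, an assumed choice) each pick one
# side of the duel; the winner's copy gets reward 1, the loser's gets 0.
# The preference model below is an assumed utility-based simulator,
# not taken from the paper.

import math
import random


class UCB1:
    """Standard UCB1 cardinal bandit used as the black-box learner."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.t = 0

    def select(self):
        self.t += 1
        for arm, count in enumerate(self.counts):
            if count == 0:  # play every arm once before using UCB scores
                return arm
        scores = [self.sums[a] / self.counts[a]
                  + math.sqrt(2.0 * math.log(self.t) / self.counts[a])
                  for a in range(len(self.counts))]
        return max(range(len(self.counts)), key=lambda a: scores[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


def sparring(utilities, horizon, seed=0):
    """Run the Sparring-style schema against a utility-based duel simulator."""
    rng = random.Random(seed)
    left, right = UCB1(len(utilities)), UCB1(len(utilities))
    for _ in range(horizon):
        i, j = left.select(), right.select()
        p_i_wins = 0.5 + (utilities[i] - utilities[j]) / 2.0
        i_wins = rng.random() < p_i_wins
        # Ordinal feedback ("i beats j") becomes {0, 1} cardinal rewards.
        left.update(i, 1.0 if i_wins else 0.0)
        right.update(j, 0.0 if i_wins else 1.0)
    return left, right


if __name__ == "__main__":
    left, right = sparring(utilities=[0.1, 0.4, 0.8, 0.3], horizon=20000)
    print("left plays:", left.counts)
    print("right plays:", right.counts)

Running the sketch, both copies' play counts tend to concentrate on the highest-utility arm (index 2 in this toy instance), which mirrors the abstract's point that ordinal win/loss feedback can drive a conventional cardinal bandit learner.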