Reducing Dueling Bandits to Cardinal Bandits

Times Cited: 0
Authors
Ailon, Nir [1]
Karnin, Zohar [2]
Joachims, Thorsten [3]
Affiliations
[1] Technion, Dept Comp Sci, IL-32000 Haifa, Israel
[2] Yahoo Labs, IL-31905 Haifa, Israel
[3] Cornell Univ, Dept Comp Sci, Ithaca, NY 14850 USA
Funding
Israel Science Foundation
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits problem is an online model of learning with ordinal feedback of the form "A is preferred to B" (as opposed to cardinal feedback like "A has value 2.5"), giving it wide applicability in learning from implicit user feedback and revealed and stated preferences. In contrast to existing algorithms for the Dueling Bandits problem, our reductions - named Doubler, MultiSBM and Sparring - provide a generic schema for translating the extensive body of known results about conventional Multi-Armed Bandit algorithms to the Dueling Bandits setting. For Doubler and MultiSBM we prove regret upper bounds in both finite and infinite settings, and conjecture about the performance of Sparring which empirically outperforms the other two as well as previous algorithms in our experiments. In addition, we provide the first almost optimal regret bound in terms of second order terms, such as the differences between the values of the arms.
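To make the reduction schema concrete, the sketch below shows one way a Sparring-style reduction could be wired to an off-the-shelf stochastic bandit algorithm: two independent cardinal-bandit black boxes each propose an arm, the proposed pair is dueled, and each box receives the binary outcome of the duel from its own side as a reward. The UCB1 black box, the sparring helper, and the utility-based preference simulator are illustrative assumptions for this sketch, not the authors' implementation.

```python
import math
import random

class UCB1:
    """Generic stochastic multi-armed bandit black box (UCB1 rule)."""
    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.t = 0

    def select(self):
        self.t += 1
        # Play every arm once before applying the UCB index.
        for a in range(self.n_arms):
            if self.counts[a] == 0:
                return a
        ucb = [self.means[a] + math.sqrt(2 * math.log(self.t) / self.counts[a])
               for a in range(self.n_arms)]
        return max(range(self.n_arms), key=lambda a: ucb[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def sparring(n_arms, prefers, horizon, seed=0):
    """Sparring-style reduction (illustrative): two independent cardinal-bandit
    black boxes each pick an arm, the pair is dueled, and each box is rewarded
    with the binary outcome of the duel from its own perspective."""
    rng = random.Random(seed)
    left, right = UCB1(n_arms), UCB1(n_arms)
    history = []
    for _ in range(horizon):
        a, b = left.select(), right.select()
        left_wins = rng.random() < prefers(a, b)  # prefers(a, b) = P(a beats b)
        left.update(a, 1.0 if left_wins else 0.0)
        right.update(b, 0.0 if left_wins else 1.0)
        history.append((a, b))
    return history


if __name__ == "__main__":
    # Hypothetical utility-based preference model: arm i has utility u[i],
    # and P(i beats j) grows linearly with the utility gap.
    u = [0.9, 0.7, 0.5, 0.3]
    prefers = lambda i, j: 0.5 + 0.5 * (u[i] - u[j])
    duels = sparring(len(u), prefers, horizon=5000)
    print("last 5 duels:", duels[-5:])
```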
Pages: 856-864
Page count: 9