Reducing Dueling Bandits to Cardinal Bandits

Cited by: 0
Authors
Ailon, Nir [1]
Karnin, Zohar [2]
Joachims, Thorsten [3]
Affiliations
[1] Technion, Dept Comp Sci, IL-32000 Haifa, Israel
[2] Yahoo Labs, IL-31905 Haifa, Israel
[3] Cornell Univ, Dept Comp Sci, Ithaca, NY 14850 USA
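Funding
Israel Science Foundation
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code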
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits problem is an online model of learning with ordinal feedback of the form "A is preferred to B" (as opposed to cardinal feedback like "A has value 2.5"), giving it wide applicability in learning from implicit user feedback and from revealed and stated preferences. In contrast to existing algorithms for the Dueling Bandits problem, our reductions (named Doubler, MultiSBM and Sparring) provide a generic schema for translating the extensive body of known results about conventional Multi-Armed Bandit algorithms to the Dueling Bandits setting. For Doubler and MultiSBM we prove regret upper bounds in both finite and infinite settings. For Sparring we conjecture a performance bound; empirically, it outperforms the other two reductions as well as previous algorithms in our experiments. In addition, we provide the first almost optimal regret bound in terms of second order terms, such as the differences between the values of the arms.
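To make the idea of a reduction concrete, the sketch below shows one way two ordinary (cardinal) bandit learners could be driven by duel outcomes alone, in the spirit of the Sparring reduction named in the abstract. The abstract does not spell out the procedure, so the details here are assumptions for illustration: the `UCB1` learner, the `duel` function with a linear link between hypothetical arm utilities and win probabilities, and the rule that the winner of each duel receives reward 1 and the loser reward 0.

```python
import math
import random


class UCB1:
    """Standard UCB1 learner over k arms with rewards in [0, 1]."""

    def __init__(self, k):
        self.counts = [0] * k
        self.sums = [0.0] * k
        self.t = 0

    def select(self):
        self.t += 1
        # Play every arm once before relying on confidence bounds.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        return max(
            range(len(self.counts)),
            key=lambda a: self.sums[a] / self.counts[a]
            + math.sqrt(2.0 * math.log(self.t) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


def duel(utilities, x, y, rng):
    """Return True if arm x beats arm y.

    Assumption: a linear link, P(x beats y) = 1/2 + (u_x - u_y) / 2,
    with hypothetical utilities in [0, 1].
    """
    p_x_wins = 0.5 + (utilities[x] - utilities[y]) / 2.0
    return rng.random() < p_x_wins


def sparring(utilities, horizon, seed=0):
    """Run two cardinal learners against each other using only duel outcomes."""
    rng = random.Random(seed)
    k = len(utilities)
    left, right = UCB1(k), UCB1(k)
    plays = [0] * k
    for _ in range(horizon):
        x, y = left.select(), right.select()
        x_wins = duel(utilities, x, y, rng)
        # Ordinal feedback becomes a 0/1 cardinal reward for each learner.
        left.update(x, 1.0 if x_wins else 0.0)
        right.update(y, 0.0 if x_wins else 1.0)
        plays[x] += 1
        plays[y] += 1
    return plays


if __name__ == "__main__":
    # Hypothetical utilities; prints how often each arm entered a duel.
    print(sparring([0.1, 0.5, 0.9], horizon=2000))
```

Because each learner only ever sees a scalar reward, any stochastic bandit algorithm with the same `select`/`update` interface could be substituted for `UCB1` in this sketch.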
Pages: 856-864
Number of pages: 9
Related papers
50 in total
  • [1] Advancements in Dueling Bandits
    Sui, Yanan
    Zoghi, Masrour
    Hofmann, Katja
    Yue, Yisong
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 5502 - 5510
  • [2] Green Dueling Bandits
    Wang, Shangshang
    Shao, Ziyu
    ICC 2023-IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, 2023,
  • [3] Adversarial Dueling Bandits
    Saha, Aadirupa
    Koren, Tomer
    Mansour, Yishay
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [4] Copeland Dueling Bandits
    Zoghi, Masrour
    Karnin, Zohar
    Whiteson, Shimon
    de Rijke, Maarten
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [5] Batched Dueling Bandits
    Agarwal, Arpit
    Ghuge, Rohan
    Nagarajan, Viswanath
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022, : 89 - +
  • [6] Sparse Dueling Bandits
    Jamieson, Kevin
    Katariya, Sumeet
    Deshpande, Atul
    Nowak, Robert
    ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 38, 2015, 38 : 416 - 424
  • [7] Dueling Bandits with Qualitative Feedback
    Xu, Liyuan
    Honda, Junya
    Sugiyama, Masashi
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 5549 - 5556
  • [8] Dueling Bandits with Team Comparisons
    Cohen, Lee
    Schmidt-Kraepelin, Ulrike
    Mansour, Yishay
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [9] Dueling Bandits with Adversarial Sleeping
    Saha, Aadirupa
    Gaillard, Pierre
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [10] Human Preferences as Dueling Bandits
    Yan, Xinyi
    Luo, Chengxi
    Clarke, Charles L. A.
    Craswell, Nick
    Voorhees, Ellen M.
    Castells, Pablo
    PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 567 - 577