Bayesian Incentive-Compatible Bandit Exploration

Cited by: 16
Authors
Mansour, Yishay [1]
Slivkins, Aleksandrs [2]
Syrgkanis, Vasilis [3]
Affiliations
[1] Tel Aviv Univ, Sch Comp Sci, IL-6997801 Tel Aviv, Israel
[2] Microsoft Res, New York, NY 10011 USA
[3] Microsoft Res, Cambridge, MA 02142 USA
Keywords
mechanism design; multiarmed bandits; regret; Bayesian incentive-compatibility; CLINICAL-TRIAL DESIGN; MULTIARMED BANDIT; ALGORITHMS; SIGNATURE; REGRET; BOUNDS; ERA
DOI
10.1287/opre.2019.1949
Chinese Library Classification (CLC) number
C93 [Management Science]
Discipline classification codes
12; 1201; 1202; 120202
Abstract
As self-interested individuals ("agents") make decisions over time, they utilize information revealed by other agents in the past and produce information that may help agents in the future. This phenomenon is common in a wide range of scenarios in the Internet economy, as well as in medical decisions. Each agent would like to exploit: select the best action given the current information, but would prefer the previous agents to explore: try out various alternatives to collect information. A social planner, by means of a carefully designed recommendation policy, can incentivize the agents to balance the exploration and exploitation so as to maximize social welfare. We model the planner's recommendation policy as a multiarm bandit algorithm under incentive-compatibility constraints induced by agents' Bayesian priors. We design a bandit algorithm which is incentive-compatible and has asymptotically optimal performance, as expressed by regret. Further, we provide a black-box reduction from an arbitrary multiarm bandit algorithm to an incentive-compatible one, with only a constant multiplicative increase in regret. This reduction works for very general bandit settings that incorporate contexts and arbitrary partial feedback.
Pages: 1132-1161
Number of pages: 30
Related papers
50 in total
  • [41] Incentive-Compatible Assortment Optimization for Sponsored Products
    Balseiro, Santiago R.
    Desir, Antoine
    MANAGEMENT SCIENCE, 2023, 69 (08) : 4668 - 4684
  • [42] Incentive-Compatible Selection for One or Two Influentials
    Zhao, Yuxin
    Zhang, Yao
    Zhao, Dengji
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 2931 - 2938
  • [43] No-Regret and Incentive-Compatible Online Learning
    Freeman, Rupert
    Pennock, David M.
    Podimata, Chara
    Vaughan, Jennifer Wortman
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [44] INCENTIVE-COMPATIBLE SURVEYS VIA POSTERIOR PROBABILITIES
    Cvitanic, J.
    Prelec, D.
    Radas, S.
    Sikic, H.
    THEORY OF PROBABILITY AND ITS APPLICATIONS, 2020, 65 (02) : 292 - 321
  • [45] Incentive-compatible pricing strategies in noncooperative networks
    Korilis, YA
    Varvarigou, TA
    Ahuja, SR
    IEEE INFOCOM '98 - THE CONFERENCE ON COMPUTER COMMUNICATIONS, VOLS. 1-3: GATEWAY TO THE 21ST CENTURY, 1998, : 439 - 446
  • [46] Adaptive Incentive-Compatible Sponsored Search Auction
    Gonen, Rica
    Pavlov, Elan
    SOFSEM 2009-THEORY AND PRACTICE OF COMPUTER SCIENCE, PROCEEDINGS, 2009, 5404 : 303 - +
  • [47] Incentive-Compatible Learning of Reserve Prices for Repeated Auctions
    Kanoria, Yash
    Nazerzadeh, Hamid
    OPERATIONS RESEARCH, 2021, 69 (02) : 509 - 524
  • [48] Reservation price as a range: An incentive-compatible measurement approach
    Wang, Tuo
    Venkatesh, R.
    Chatterjee, Rabikar
    JOURNAL OF MARKETING RESEARCH, 2007, 44 (02) : 200 - 213
  • [49] Incentive-Compatible Learning of Reserve Prices for Repeated Auctions
    Kanoria, Yash
    Nazerzadeh, Hamid
    COMPANION OF THE WORLD WIDE WEB CONFERENCE (WWW 2019), 2019, : 932 - 933
  • [50] Brief Announcement: Incentive-Compatible Distributed Greedy Protocols
    Nisan, Noam
    Schapira, Michael
    Valiant, Gregory
    Zohar, Aviv
    PODC 11: PROCEEDINGS OF THE 2011 ACM SYMPOSIUM PRINCIPLES OF DISTRIBUTED COMPUTING, 2011, : 335 - 336