Online learning for auction mechanism in bandit setting

Cited by: 9
Authors
He, Di [1 ]
Chen, Wei [2 ]
Wang, Liwei [1 ]
Liu, Tie-Yan [2 ]
Affiliations
[1] Peking Univ, Sch Elect Engn & Comp Sci, MOE, Key Lab Machine Percept, Beijing 100871, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
Keywords
Armed bandit problem; Mechanism design; Online advertising;
DOI
10.1016/j.dss.2013.07.004
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper is concerned with online learning of the optimal auction mechanism for sponsored search in a bandit setting. Previous works take the click-through rates of ads to be fixed and known to the search engine and use this information to design the optimal auction mechanism. However, this assumption is not practical, since ads can only receive clicks when they are shown to users. To tackle this problem, we propose to use online learning for auction mechanism design. To be specific, this task corresponds to a new type of bandit problem, which we call the armed bandit problem with shared information (AB-SI). In the AB-SI problem, the arm space (corresponding to the parameter space of the auction mechanism, which can be discrete or continuous) is partitioned into a finite number of clusters (corresponding to the finite number of rankings of the ads), and the arms in the same cluster share the explored information (i.e., the click-through rates of the ads in the same ranked list) when any arm from the cluster is pulled. We propose two upper-confidence-bound algorithms, called UCB-SI1 and UCB-SI2, to tackle this new problem in the discrete-armed bandit and continuum-armed bandit settings, respectively. We show that when the total number of arms is finite, the regret bound obtained by the UCB-SI1 algorithm is tighter than that of the classical UCB1 algorithm. In the continuum-armed bandit setting, our proposed UCB-SI2 algorithm can handle a larger class of reward functions and achieves a regret bound of O(T^{2/3} (d ln T)^{1/3}), where d is the pseudo dimension of the real-valued reward function class. Experimental results show that the proposed algorithms can significantly outperform several classical online learning methods on synthetic data. (C) 2013 Elsevier B.V. All rights reserved.
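The shared-information idea in the abstract can be illustrated with a small sketch. This is not the paper's UCB-SI1 algorithm, whose details the abstract does not give. It is a generic UCB1-style loop under assumed simplifications: each arm has a known per-click payment, each cluster has one unknown click probability, and the reward of an arm is its payment times a click indicator. The names `ucb_shared`, `prices`, `cluster_of`, and `click_prob` are hypothetical.

```python
import math
import random


def ucb_shared(prices, cluster_of, click_prob, horizon, seed=0):
    """UCB1-style loop in which arms of one cluster share click statistics.

    prices[a]     -- assumed known per-click payment of arm a (hypothetical)
    cluster_of[a] -- index of the cluster that arm a belongs to
    click_prob[c] -- true click probability of cluster c, hidden from the learner
    """
    rng = random.Random(seed)
    pulls = [0] * len(click_prob)    # pull count shared by all arms of a cluster
    clicks = [0] * len(click_prob)   # click count shared by all arms of a cluster
    arm_counts = [0] * len(prices)   # how often each individual arm was chosen

    for t in range(1, horizon + 1):
        def index(a):
            c = cluster_of[a]
            if pulls[c] == 0:
                return math.inf      # explore each cluster once, not each arm
            mean = clicks[c] / pulls[c]
            bonus = math.sqrt(2.0 * math.log(t) / pulls[c])
            return prices[a] * (mean + bonus)

        arm = max(range(len(prices)), key=index)
        c = cluster_of[arm]
        clicked = rng.random() < click_prob[c]
        # One pull updates the statistics used by every arm in the cluster.
        pulls[c] += 1
        clicks[c] += int(clicked)
        arm_counts[arm] += 1

    return arm_counts, pulls


# Four arms in two clusters; arms 0 and 1 share cluster 0, arms 2 and 3 share cluster 1.
arm_counts, pulls = ucb_shared(
    prices=[1.0, 2.0, 1.0, 1.5],
    cluster_of=[0, 0, 1, 1],
    click_prob=[0.5, 0.2],
    horizon=2000,
)
print(arm_counts, pulls)
```

Because the confidence bonus depends on the cluster's pull count rather than the arm's, exploration cost in this sketch scales with the number of clusters instead of the number of arms, which is the intuition behind a tighter bound than plain UCB1 when many arms share a cluster.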
Pages: 379-386
Page count: 8