Online Second Price Auction with Semi-Bandit Feedback under the Non-Stationary Setting

Times cited: 0
Authors
Zhao, Haoyu [1 ]
Chen, Wei [2 ]
Affiliations
[1] Tsinghua Univ, IIIS, Beijing, Peoples R China
[2] Microsoft Res, Beijing, Peoples R China
Source
THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2020 / Vol. 34
Keywords
REGRET;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we study the non-stationary online second price auction problem. We assume that the seller sells the same type of item over $T$ rounds via second price auctions and can set the reserve price in each round. In each round, the bidders draw their private values from a joint distribution unknown to the seller. The seller then announces the reserve price for that round. Next, bidders whose private values exceed the announced reserve price report their values to the seller as their bids. The bidder with the highest bid above the reserve price wins the item and pays the seller the larger of the second-highest bid and the reserve price. The seller wants to maximize her total revenue over the time horizon $T$ while learning the distribution of private values over time. The problem is more challenging than the standard online learning scenario because the private value distribution is non-stationary, meaning that the distribution of bidders' private values may change over time; we therefore use the non-stationary regret to measure the performance of our algorithm. To our knowledge, this paper is the first to theoretically study the repeated auction in the non-stationary setting. Our algorithm achieves the non-stationary regret upper bound $\tilde{O}(\min\{\sqrt{ST}, \bar{V}^{1/3}T^{2/3}\})$, where $S$ is the number of switches in the distribution and $\bar{V}$ is the sum of total variation; the algorithm does not need to know $S$ or $\bar{V}$. We also prove regret lower bounds $\Omega(\sqrt{ST})$ in the switching case and $\Omega(\bar{V}^{1/3}T^{2/3})$ in the dynamic case, showing that our algorithm has nearly optimal non-stationary regret.
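To make the per-round mechanism described in the abstract concrete, here is a minimal Python sketch of a single round. Given the bidders' private values and the announced reserve price, it returns the bids the seller actually observes (only values above the reserve, i.e. the semi-bandit feedback), the winner, and the seller's revenue. The names `second_price_round` and `RoundOutcome`, and the tie-breaking rule (the lowest-indexed bidder wins ties), are hypothetical illustrations, not taken from the paper; the learning algorithm that chooses the reserve price each round is not shown.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class RoundOutcome:
    reported_bids: Dict[int, float]  # bidder index -> reported value (only values above the reserve)
    winner: Optional[int]            # index of the winning bidder, or None if the item goes unsold
    revenue: float                   # price the winner pays to the seller


def second_price_round(values: Sequence[float], reserve: float) -> RoundOutcome:
    """Hypothetical sketch of one second price auction round with a reserve price.

    Bidders whose private value is strictly higher than the reserve report that value
    as their bid (semi-bandit feedback). The highest bidder wins and pays the larger of
    the second-highest bid and the reserve price.
    """
    # Semi-bandit feedback: the seller only sees values above the reserve.
    reported = {i: v for i, v in enumerate(values) if v > reserve}
    if not reported:
        # No bid clears the reserve: the item is unsold and revenue is zero.
        return RoundOutcome(reported_bids={}, winner=None, revenue=0.0)

    # Rank bids from highest to lowest; sorted() is stable, so ties keep index order
    # (assumed tie-breaking rule: the lowest-indexed bidder wins).
    ranked = sorted(reported.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]

    # With a single bid above the reserve, the payment falls back to the reserve price.
    second_highest = ranked[1][1] if len(ranked) > 1 else reserve
    price = max(second_highest, reserve)
    return RoundOutcome(reported_bids=reported, winner=winner, revenue=price)
```

For example, with values `[0.9, 0.4, 0.7]` and reserve `0.5`, bidders 0 and 2 report their values, bidder 0 wins, and the seller collects 0.7; with values `[0.9, 0.3]` the lone bidder above the reserve pays the reserve price 0.5. Because the seller never observes values at or below the reserve, a higher reserve yields less information about the value distribution, which is part of what makes learning under semi-bandit feedback nontrivial.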
Pages: 6893-6900
Number of pages: 8