Online Second Price Auction with Semi-Bandit Feedback under the Non-Stationary Setting

Times cited: 0

Authors:

Zhao, Haoyu [1]

Chen, Wei [2]

Affiliations:

[1] Tsinghua Univ, IIIS, Beijing, Peoples R China

[2] Microsoft Res, Beijing, Peoples R China

Source:

THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2020 / Vol. 34

Keywords:

REGRET;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we study the non-stationary online second price auction problem. We assume that the seller sells the same type of item over T rounds via the second price auction and can set the reserve price in each round. In each round, the bidders draw their private values from a joint distribution unknown to the seller. Then, the seller announces the reserve price for this round. Next, bidders whose private values exceed the announced reserve price report their values to the seller as their bids. The bidder with the highest bid above the reserve price wins the item and pays the seller the larger of the second-highest bid and the reserve price. The seller wants to maximize her total revenue over the time horizon T while learning the distribution of private values over time. The problem is more challenging than the standard online learning scenario because the private value distribution is non-stationary, meaning that the distribution of bidders' private values may change over time, so we need to use the non-stationary regret to measure the performance of our algorithm. To our knowledge, this paper is the first to study the repeated auction in the non-stationary setting theoretically. Our algorithm achieves the non-stationary regret upper bound $\tilde{O}(\min\{\sqrt{ST}, \bar{V}^{1/3}T^{2/3}\})$, where $S$ is the number of switches in the distribution and $\bar{V}$ is the sum of total variation; neither $S$ nor $\bar{V}$ needs to be known by the algorithm. We also prove regret lower bounds $\Omega(\sqrt{ST})$ in the switching case and $\Omega(\bar{V}^{1/3}T^{2/3})$ in the dynamic case, showing that our algorithm has nearly optimal non-stationary regret.
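The per-round mechanics described in the abstract (who reports a bid, who wins, and what the winner pays) can be illustrated with a minimal sketch. It assumes the value profile for a single round is given, that a bidder reports only when her value is strictly above the reserve price, and that ties need no special handling. The function name `second_price_round` is hypothetical, and this is not the authors' learning algorithm, which additionally chooses the reserve price over time from this kind of partial feedback.

```python
from typing import List, Sequence, Tuple


def second_price_round(values: Sequence[float], reserve: float) -> Tuple[float, List[float]]:
    """Simulate one round of a second price auction with a reserve price.

    Hypothetical helper for illustration. Returns (revenue, observed_bids):
    only bidders whose private value exceeds the reserve report it as a bid,
    so observed_bids is the partial (semi-bandit) feedback the seller sees.
    """
    # Bidders above the reserve report their values; sort highest first.
    bids = sorted((v for v in values if v > reserve), reverse=True)
    if not bids:
        return 0.0, bids  # no bidder clears the reserve: no sale, no revenue
    # The winner pays the larger of the second-highest bid and the reserve.
    second = bids[1] if len(bids) > 1 else reserve
    return max(second, reserve), bids


if __name__ == "__main__":
    profile = [0.9, 0.6, 0.3]
    print(second_price_round(profile, 0.5))   # (0.6, [0.9, 0.6])
    print(second_price_round(profile, 0.7))   # (0.7, [0.9])
    print(second_price_round(profile, 0.95))  # (0.0, [])
```

The three calls show the trade-off the seller faces: raising the reserve from 0.5 to 0.7 increases revenue because it binds against a lone remaining bidder, while raising it to 0.95 excludes everyone and yields nothing.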


Pages: 6893-6900

Page count: 8

14 records in total (page 1 of 2):

[1] Non-Stationary Delayed Combinatorial Semi-Bandit With Causally Related Rewards
Ghoorchian, Saeed
Bilaj, Steven
Maghsudi, Setareh
IEEE OPEN JOURNAL OF SIGNAL PROCESSING, 2025, 6 : 369 - 384
[2] Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback
Wen, Zheng
Kveton, Branislav
Valko, Michal
Vaswani, Sharan
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
[3] ONLINE LEARNING FOR COMPUTATION PEER OFFLOADING WITH SEMI-BANDIT FEEDBACK
Zhu, Hongbin
Kang, Kai
Luo, Xiliang
Qian, Hua
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019 : 4524 - 4528
[4] Learning from Delayed Semi-Bandit Feedback under Strong Fairness Guarantees
Steiger, Juaren
Li, Bin
Lu, Ning
IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2022), 2022 : 1379 - 1388
[5] Non-Stationary Bandit Strategy for Rate Adaptation With Delayed Feedback
Zhao, Yapeng
Qian, Hua
Kang, Kai
Jin, Yanliang
IEEE ACCESS, 2020, 8 : 75503 - 75511
[6] Finding Optimal Arms in Non-stochastic Combinatorial Bandits with Semi-bandit Feedback and Finite Budget
Brandt, Jasmin
Bengs, Viktor
Haddenhorst, Bjoern
Huellermeier, Eyke
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
[7] Adversarial Network Optimization under Bandit Feedback: Maximizing Utility in Non-Stationary Multi-Hop Networks
Dai, Yan
Huang, Longbo
PROCEEDINGS OF THE ACM ON MEASUREMENT AND ANALYSIS OF COMPUTING SYSTEMS, 2024, 8 (03)
[8] Second-Order Non-Stationary Online Learning for Regression
Moroshko, Edward
Vaits, Nina
Crammer, Koby
JOURNAL OF MACHINE LEARNING RESEARCH, 2015, 16 : 1481 - 1517
