Online Matching with Bayesian Rewards

Cited by: 0
Authors
Simchi-Levi, David [1 ,2 ]
Sun, Rui [3 ]
Wang, Xinshang [4 ,5 ]
Affiliations
[1] MIT, Inst Data Syst & Soc, Dept Civil & Environm Engn, Cambridge, MA 02139 USA
[2] MIT, Operat Res Ctr, Cambridge, MA 02139 USA
[3] MIT, Inst Data Syst & Soc, Cambridge, MA 02139 USA
[4] Alibaba Grp US, San Mateo, CA 94402 USA
[5] Shanghai Jiao Tong Univ, Antai Coll Econ & Management, Shanghai 200240, Peoples R China
Keywords
online matching; Bayesian learning; Markovian bandits; approximation algorithm; APPROXIMATION ALGORITHMS; STOCHASTIC KNAPSACK; RESTLESS BANDITS; INDEX;
DOI
10.1287/opre.2021.0499
Chinese Library Classification (CLC) number
C93 [Management Science];
Discipline classification code
12 ; 1201 ; 1202 ; 120202 ;
Abstract
We study in this paper an online matching problem where a central platform needs to match a number of limited resources to different groups of users that arrive sequentially over time. The reward of each matching option depends on both the type of resource and the time period in which the user arrives. The matching rewards are assumed to be unknown but drawn from probability distributions that are known a priori. The platform then needs to learn the true rewards online based on real-time observations of the matching results. The goal of the central platform is to maximize the total reward from all of the matchings without violating the resource capacity constraints. We formulate this matching problem with Bayesian rewards as a Markovian multiarmed bandit problem with budget constraints, where each arm corresponds to a pair of a resource and a time period. We devise our algorithm by first finding policies for each single arm separately via a relaxed linear program and then "assembling" these policies together through judicious selection criteria and well-designed pulling orders. We prove that the expected reward of our algorithm is at least $\frac{1}{2}(\sqrt{2} - 1)$ of the expected reward of an optimal algorithm.
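As a rough illustration of why Bayesian rewards lead to a Markovian bandit, the sketch below treats the posterior over one arm's reward as that arm's state, which moves to a new state after each observed match. It is not the paper's algorithm: the Bernoulli reward model, the Beta prior, and the names `update_posterior` and `posterior_mean` are assumptions made purely for this example, since the abstract does not specify the reward distributions. The last lines evaluate the stated guarantee numerically.

```python
import math
from fractions import Fraction


def update_posterior(alpha, beta, reward):
    """One state transition of a single arm (hypothetical Beta-Bernoulli model).

    The state (alpha, beta) is the Beta posterior over the arm's unknown
    success probability; reward must be 0 or 1.
    """
    return (alpha + reward, beta + (1 - reward))


def posterior_mean(alpha, beta):
    """Expected reward of the arm under its current posterior."""
    return Fraction(alpha, alpha + beta)


# Assumed uniform prior Beta(1, 1), followed by three observed matches.
state = (1, 1)
for observed_reward in [1, 0, 1]:
    state = update_posterior(*state, observed_reward)

mean_after = posterior_mean(*state)

# Approximation guarantee stated in the abstract: (1/2) * (sqrt(2) - 1).
guarantee = 0.5 * (math.sqrt(2) - 1)

print(state, mean_after, round(guarantee, 4))
```

Under these assumptions each pull changes only the pulled arm's state, which is the structure a Markovian multiarmed bandit formulation relies on; the guarantee evaluates to roughly 0.207.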
Pages: 13