Online Matching with Bayesian Rewards

Cited by: 0
Authors
Simchi-Levi, David [1 ,2 ]
Sun, Rui [3 ]
Wang, Xinshang [4 ,5 ]
Affiliations
[1] MIT, Inst Data Syst & Soc, Dept Civil & Environm Engn, Cambridge, MA 02139 USA
[2] MIT, Operat Res Ctr, Cambridge, MA 02139 USA
[3] MIT, Inst Data Syst & Soc, Cambridge, MA 02139 USA
[4] Alibaba Grp US, San Mateo, CA 94402 USA
[5] Shanghai Jiao Tong Univ, Antai Coll Econ & Management, Shanghai 200240, Peoples R China
Keywords
online matching; Bayesian learning; Markovian bandits; approximation algorithm; APPROXIMATION ALGORITHMS; STOCHASTIC KNAPSACK; RESTLESS BANDITS; INDEX;
DOI
10.1287/opre.2021.0499
CLC classification number
C93 [Management Science]
Subject classification codes
12; 1201; 1202; 120202
Abstract
In this paper, we study an online matching problem in which a central platform must match a number of resources with limited capacity to different groups of users who arrive sequentially over time. The reward of each matching option depends on both the type of resource and the time period in which the user arrives. The matching rewards are unknown, but they are drawn from probability distributions that are known a priori, so the platform must learn the true rewards online from real-time observations of the matching results. The platform's goal is to maximize the total reward from all matchings without violating the resource capacity constraints. We formulate this matching problem with Bayesian rewards as a Markovian multiarmed bandit problem with budget constraints, in which each arm corresponds to a pair consisting of a resource and a time period. We devise our algorithm by first finding a policy for each single arm separately via a relaxed linear program and then "assembling" these policies through judicious selection criteria and well-designed pulling orders. We prove that the expected reward of our algorithm is at least $\frac{1}{2}(\sqrt{2} - 1) \approx 0.207$ times the expected reward of an optimal algorithm.
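The abstract says that Bayesian learning turns each (resource, time period) arm into a Markovian bandit arm. The sketch below shows why, under an assumption the abstract does not make: match rewards are Bernoulli with a Beta prior. With that assumption, the arm's state is its posterior (alpha, beta), and each observed match outcome moves the arm to a new state with a probability that depends only on the current state. The names `BetaArmState`, `transition_probs`, and `APPROX_RATIO` are hypothetical. The code does not implement the paper's relaxed linear program or its policy-assembling step.

```python
import math
from dataclasses import dataclass

# Approximation guarantee stated in the abstract: (1/2)(sqrt(2) - 1) ~ 0.207.
APPROX_RATIO = 0.5 * (math.sqrt(2) - 1)


@dataclass(frozen=True)
class BetaArmState:
    """Posterior state of one hypothetical (resource, time period) arm.

    Assumes Bernoulli match rewards with a Beta(alpha, beta) prior; this
    reward model is an illustrative assumption, not taken from the paper.
    """
    alpha: float
    beta: float

    def mean_reward(self) -> float:
        # Posterior mean of the unknown success probability.
        return self.alpha / (self.alpha + self.beta)

    def update(self, reward: int) -> "BetaArmState":
        # Conjugate Beta-Bernoulli update: one observed outcome yields the next state.
        if reward not in (0, 1):
            raise ValueError("reward must be 0 or 1")
        return BetaArmState(self.alpha + reward, self.beta + 1 - reward)


def transition_probs(state: BetaArmState) -> dict:
    """Markov transition kernel of a pulled arm.

    The next state depends only on the current posterior. A success occurs
    with the posterior predictive probability, which equals the posterior mean.
    """
    p = state.mean_reward()
    return {state.update(1): p, state.update(0): 1 - p}


if __name__ == "__main__":
    start = BetaArmState(1, 1)  # uniform prior over the success probability
    print(transition_probs(start))
    print(round(APPROX_RATIO, 4))
```

Because each arm's evolution depends only on its own posterior, a policy can be computed for each arm separately. That is the structure the abstract's per-arm linear-programming step exploits, although the paper's actual formulation may differ from this sketch.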
Pages: 13