Cost-Effective Incentive Allocation via Structured Counterfactual Inference

Cited by: 0
Authors
Lopez, Romain [1]
Li, Chenchen [2, 3]
Yan, Xiang [2, 3]
Xiong, Junwu [2]
Jordan, Michael I. [1]
Qi, Yuan [2]
Song, Le [2, 4]
Affiliations
[1] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA
[2] Ant Financial Serv Grp, AI Dept, Hangzhou, Peoples R China
[3] Shanghai Jiao Tong Univ, Dept Comp Sci, Shanghai, Peoples R China
[4] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We address a practical problem ubiquitous in modern marketing campaigns, in which a central agent tries to learn a policy for allocating strategic financial incentives to customers and observes only bandit feedback. In contrast to traditional policy optimization frameworks, we take into account the additional reward structure and budget constraints common in this setting, and develop a new two-step method for solving this constrained counterfactual policy optimization problem. Our method first casts the reward estimation problem as a domain adaptation problem with supplementary structure, and then uses the resulting estimators to optimize the policy under constraints. We also establish theoretical error bounds for our estimation procedure and show empirically that the approach leads to significant improvement on both synthetic and real datasets.
Pages: 4997-5004
Page count: 8