Uncertainty-Aware Instance Reweighting for Off-Policy Learning

Cited by: 0
Authors
Zhang, Xiaoying [1 ]
Chen, Junpu [2 ]
Wang, Hongning [3 ]
Xie, Hong [4 ]
Liu, Yang [1 ]
Lui, John C. S. [5 ]
Li, Hang [1 ]
Affiliations
[1] ByteDance Res, Beijing, Peoples R China
[2] Chongqing Univ, Chongqing, Peoples R China
[3] Tsinghua Univ, Beijing, Peoples R China
[4] Chinese Acad Sci, Chongqing Inst Green & Intelligent Technol, Chongqing, Peoples R China
[5] Chinese Univ Hong Kong, Hong Kong, Peoples R China
Keywords
DOI
N/A
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Off-policy learning, the procedure of optimizing a policy with access only to logged feedback data, is important in various real-world applications, such as search engines and recommender systems. Because the ground-truth logging policy is usually unknown, previous work simply plugs an estimate of it into off-policy learning, ignoring the negative impact of both the high bias and the high variance introduced by such an estimator. This impact is often magnified on samples whose logging probabilities are small and inaccurately estimated. The contribution of this work is to explicitly model the uncertainty in the estimated logging policy and propose an Uncertainty-aware Inverse Propensity Score estimator (UIPS) for improved off-policy learning, with a theoretical convergence guarantee. Experimental results on synthetic and real-world recommendation datasets demonstrate that UIPS significantly improves the quality of the discovered policy when compared against an extensive list of state-of-the-art baselines.
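To make the idea in the abstract concrete, the sketch below contrasts the standard inverse-propensity-score (IPS) estimator with a simple uncertainty-aware variant. This is an illustrative toy, not the paper's actual UIPS estimator: the function names and the particular shrinkage rule (down-weighting samples in proportion to the relative variance of their estimated logging probability) are assumptions made for demonstration only.

```python
import numpy as np

def ips_estimate(rewards, target_probs, logged_probs):
    """Standard IPS estimate of policy value:
    the average of r * pi(a|x) / mu(a|x) over logged samples."""
    return np.mean(rewards * target_probs / logged_probs)

def uncertainty_aware_ips(rewards, target_probs, logged_prob_means, logged_prob_stds):
    """Toy uncertainty-aware variant (hypothetical, for illustration):
    shrink the propensity weight of samples whose estimated logging
    probability mu_hat is uncertain, using the relative variance
    (std / mu_hat)^2 as the shrinkage signal."""
    weights = target_probs / logged_prob_means
    shrink = 1.0 / (1.0 + (logged_prob_stds / logged_prob_means) ** 2)
    return np.mean(rewards * weights * shrink)
```

With zero uncertainty the variant reduces to plain IPS; as the estimated logging probabilities become less certain, the corresponding samples (often those with small, hard-to-estimate propensities) contribute less to the estimate, which is the qualitative behavior the abstract motivates.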
Pages: 28