SOAC: Supervised Off-Policy Actor-Critic for Recommender Systems

Cited by: 0
Authors
Wu, Shiqing [1 ]
Xu, Guandong [1 ]
Wang, Xianzhi [1 ]
Affiliations
[1] Univ Technol Sydney, Sch Comp Sci, Sydney, NSW, Australia
Funding
US National Science Foundation; Australian Research Council;
Keywords
Recommender Systems; Sequential Recommendation; Reinforcement Learning; Off-Policy Actor-Critic;
DOI
10.1109/ICDM58522.2023.00185
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Improving users' long-term experience in recommender systems (RS) has become a growing concern for recommendation platforms. Reinforcement learning (RL) is an attractive approach because it can plan and optimize long-term returns sequentially. However, directly applying RL as an online learning method in the RS setting can significantly compromise users' satisfaction and experience. As a result, learning the recommendation policy from logged feedback collected under different policies has emerged as a promising direction. Offline learning enables the agent to utilize off-policy learning techniques. Nevertheless, several challenges need to be addressed, such as distribution shift. In this paper, we propose a novel RL method, called Supervised Off-Policy Actor-Critic (SOAC), for learning the recommendation policy from the logged feedback without exploration. The proposed SOAC addresses challenges, including distribution shift and extrapolation errors, and focuses on improving the ranking of items in a recommendation list. The experimental results demonstrate that SOAC can achieve better recommendation performance than existing supervised RL methods.
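To make the off-policy actor-critic recipe described in the abstract concrete, the sketch below shows one plausible form of a supervised off-policy actor-critic training step for next-item recommendation: a supervised cross-entropy term on the logged item (the ranking signal), a clipped importance-weighted policy-gradient term (the off-policy correction for distribution shift), and a critic regression term. This is a minimal illustration under generic assumptions, not the authors' SOAC implementation; all module names, the one-step reward target, and the hyperparameters are illustrative.

```python
# Minimal sketch of a supervised off-policy actor-critic loss for
# next-item recommendation. NOT the paper's SOAC algorithm; names,
# network sizes, and the one-step critic target are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCritic(nn.Module):
    def __init__(self, state_dim: int, num_items: int, hidden: int = 64):
        super().__init__()
        # Actor scores every candidate item; critic estimates state value.
        self.actor = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, num_items))
        self.critic = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, 1))

    def forward(self, state):
        return self.actor(state), self.critic(state).squeeze(-1)

def supervised_offpolicy_loss(model, state, logged_item, reward,
                              behavior_logprob, sup_weight=1.0, clip=2.0):
    """One training step on a batch of logged (state, item, reward) tuples."""
    logits, value = model(state)
    logprob = F.log_softmax(logits, dim=-1).gather(
        1, logged_item.unsqueeze(1)).squeeze(1)

    # Supervised ranking term: push the logged (observed) item up the list.
    supervised = F.cross_entropy(logits, logged_item)

    # Off-policy actor term: REINFORCE-style gradient through logprob,
    # reweighted by a clipped importance ratio (target / behavior policy).
    ratio = torch.exp(logprob.detach() - behavior_logprob).clamp(max=clip)
    advantage = (reward - value).detach()
    actor = -(ratio * advantage * logprob).mean()

    # Critic term: regress value toward the observed reward
    # (a one-step simplification for this sketch).
    critic = F.mse_loss(value, reward)

    return sup_weight * supervised + actor + critic

# Usage with random placeholder data:
model = ActorCritic(state_dim=32, num_items=100)
state = torch.randn(8, 32)
logged_item = torch.randint(0, 100, (8,))
reward = torch.rand(8)
behavior_logprob = torch.log(torch.full((8,), 1.0 / 100))  # uniform logger
loss = supervised_offpolicy_loss(model, state, logged_item, reward,
                                 behavior_logprob)
loss.backward()
```

Clipping the importance ratio is one common way to limit the variance and extrapolation error that the abstract mentions; the supervised term keeps the policy anchored to items actually observed in the logged feedback.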
Pages: 14121 - 14626
Page count: 506
Related Papers
50 records in total
  • [1] Off-Policy Actor-critic for Recommender Systems
    Chen, Minmin
    Xu, Can
    Gatto, Vince
    Jain, Devanshu
    Kumar, Aviral
    Chi, Ed
    [J]. PROCEEDINGS OF THE 16TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2022, 2022, : 338 - 349
  • [2] Generalized Off-Policy Actor-Critic
    Zhang, Shangtong
    Boehmer, Wendelin
    Whiteson, Shimon
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [3] Meta attention for Off-Policy Actor-Critic
    Huang, Jiateng
    Huang, Wanrong
    Lan, Long
    Wu, Dan
    [J]. NEURAL NETWORKS, 2023, 163 : 86 - 96
  • [4] Off-Policy Actor-Critic with Emphatic Weightings
    Graves, Eric
    Imani, Ehsan
    Kumaraswamy, Raksha
    White, Martha
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [5] Supervised Advantage Actor-Critic for Recommender Systems
    Xin, Xin
    Karatzoglou, Alexandros
    Arapakis, Ioannis
    Jose, Joemon M.
    [J]. WSDM'22: PROCEEDINGS OF THE FIFTEENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, 2022, : 1186 - 1196
  • [6] Variance Penalized On-Policy and Off-Policy Actor-Critic
    Jain, Arushi
    Patil, Gandharv
    Jain, Ayush
    Khetarpal, Khimya
    Precup, Doina
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 7899 - 7907
  • [7] Off-Policy Actor-Critic Structure for Optimal Control of Unknown Systems With Disturbances
    Song, Ruizhuo
    Lewis, Frank L.
    Wei, Qinglai
    Zhang, Huaguang
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2016, 46 (05) : 1041 - 1050
  • [8] Noisy Importance Sampling Actor-Critic: An Off-Policy Actor-Critic With Experience Replay
    Tasfi, Norman
    Capretz, Miriam
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [9] Distributed off-Policy Actor-Critic Reinforcement Learning with Policy Consensus
    Zhang, Yan
    Zavlanos, Michael M.
    [J]. 2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 4674 - 4679
  • [10] Online Meta-Critic Learning for Off-Policy Actor-Critic Methods
    Zhou, Wei
    Li, Yiying
    Yang, Yongxin
    Wang, Huaimin
    Hospedales, Timothy M.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33