SOAC: Supervised Off-Policy Actor-Critic for Recommender Systems

Cited by: 0
Authors
Wu, Shiqing [1 ]
Xu, Guandong [1 ]
Wang, Xianzhi [1 ]
Affiliations
[1] Univ Technol Sydney, Sch Comp Sci, Sydney, NSW, Australia
Funding
US National Science Foundation; Australian Research Council;
Keywords
Recommender Systems; Sequential Recommendation; Reinforcement Learning; Off-Policy Actor-Critic;
DOI
10.1109/ICDM58522.2023.00185
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Improving users' long-term experience in recommender systems (RS) has become a growing concern for recommendation platforms. Reinforcement learning (RL) is an attractive approach because it can plan and optimize long-term returns sequentially. However, directly applying RL as an online learning method in the RS setting can significantly compromise users' satisfaction and experience. As a result, learning the recommendation policy from logged feedback collected under different policies has emerged as a promising direction. Offline learning enables the agent to utilize off-policy learning techniques. Nevertheless, several challenges need to be addressed, such as distribution shift. In this paper, we propose a novel RL method, called Supervised Off-Policy Actor-Critic (SOAC), for learning the recommendation policy from the logged feedback without exploration. The proposed SOAC addresses challenges, including distribution shift and extrapolation errors, and focuses on improving the ranking of items in a recommendation list. The experimental results demonstrate that SOAC can achieve better recommendation performance than existing supervised RL methods.
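To make the off-policy actor-critic recipe described in the abstract concrete, the sketch below shows one plausible form of a supervised off-policy actor-critic training step for next-item recommendation: a supervised cross-entropy term on the logged item (the ranking signal), a clipped importance-weighted policy-gradient term (the off-policy correction for distribution shift), and a critic regression term. This is a minimal illustration under generic assumptions, not the authors' SOAC implementation; all module names, the one-step reward target, and the hyperparameters are illustrative.

```python
# Minimal sketch of a supervised off-policy actor-critic loss for
# next-item recommendation. NOT the paper's SOAC algorithm; names,
# network sizes, and the one-step critic target are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCritic(nn.Module):
    def __init__(self, state_dim: int, num_items: int, hidden: int = 64):
        super().__init__()
        # Actor scores every candidate item; critic estimates state value.
        self.actor = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, num_items))
        self.critic = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, 1))

    def forward(self, state):
        return self.actor(state), self.critic(state).squeeze(-1)

def supervised_offpolicy_loss(model, state, logged_item, reward,
                              behavior_logprob, sup_weight=1.0, clip=2.0):
    """One training step on a batch of logged (state, item, reward) tuples."""
    logits, value = model(state)
    logprob = F.log_softmax(logits, dim=-1).gather(
        1, logged_item.unsqueeze(1)).squeeze(1)

    # Supervised ranking term: push the logged (observed) item up the list.
    supervised = F.cross_entropy(logits, logged_item)

    # Off-policy actor term: REINFORCE-style gradient through logprob,
    # reweighted by a clipped importance ratio (target / behavior policy).
    ratio = torch.exp(logprob.detach() - behavior_logprob).clamp(max=clip)
    advantage = (reward - value).detach()
    actor = -(ratio * advantage * logprob).mean()

    # Critic term: regress value toward the observed reward
    # (a one-step simplification for this sketch).
    critic = F.mse_loss(value, reward)

    return sup_weight * supervised + actor + critic

# Usage with random placeholder data:
model = ActorCritic(state_dim=32, num_items=100)
state = torch.randn(8, 32)
logged_item = torch.randint(0, 100, (8,))
reward = torch.rand(8)
behavior_logprob = torch.log(torch.full((8,), 1.0 / 100))  # uniform logger
loss = supervised_offpolicy_loss(model, state, logged_item, reward,
                                 behavior_logprob)
loss.backward()
```

Clipping the importance ratio is one common way to limit the variance and extrapolation error that the abstract mentions; the supervised term keeps the policy anchored to items actually observed in the logged feedback.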
Pages: 14121 - 14626
Page count: 506
Related Papers
50 records in total
  • [1] Off-Policy Actor-critic for Recommender Systems
    Chen, Minmin
    Xu, Can
    Gatto, Vince
    Jain, Devanshu
    Kumar, Aviral
    Chi, Ed
    [J]. PROCEEDINGS OF THE 16TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2022, 2022, : 338 - 349
  • [2] Generalized Off-Policy Actor-Critic
    Zhang, Shangtong
    Boehmer, Wendelin
    Whiteson, Shimon
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [3] Meta attention for Off-Policy Actor-Critic
    Huang, Jiateng
    Huang, Wanrong
    Lan, Long
    Wu, Dan
    [J]. NEURAL NETWORKS, 2023, 163 : 86 - 96
  • [4] Off-Policy Actor-Critic with Emphatic Weightings
    Graves, Eric
    Imani, Ehsan
    Kumaraswamy, Raksha
    White, Martha
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [5] Supervised Advantage Actor-Critic for Recommender Systems
    Xin, Xin
    Karatzoglou, Alexandros
    Arapakis, Ioannis
    Jose, Joemon M.
    [J]. WSDM'22: PROCEEDINGS OF THE FIFTEENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, 2022, : 1186 - 1196
  • [6] Variance Penalized On-Policy and Off-Policy Actor-Critic
    Jain, Arushi
    Patil, Gandharv
    Jain, Ayush
    Khetarpal, Khimya
    Precup, Doina
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 7899 - 7907
  • [7] Off-Policy Actor-Critic Structure for Optimal Control of Unknown Systems With Disturbances
    Song, Ruizhuo
    Lewis, Frank L.
    Wei, Qinglai
    Zhang, Huaguang
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2016, 46 (05) : 1041 - 1050
  • [8] Noisy Importance Sampling Actor-Critic: An Off-Policy Actor-Critic With Experience Replay
    Tasfi, Norman
    Capretz, Miriam
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [9] Distributed off-Policy Actor-Critic Reinforcement Learning with Policy Consensus
    Zhang, Yan
    Zavlanos, Michael M.
    [J]. 2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 4674 - 4679
  • [10] Online Meta-Critic Learning for Off-Policy Actor-Critic Methods
    Zhou, Wei
    Li, Yiying
    Yang, Yongxin
    Wang, Huaimin
    Hospedales, Timothy M.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33