MDDL: A Framework for Reinforcement Learning-based Position Allocation in Multi-Channel Feed

Cited by: 0

Authors:

Shi, Xiaowen [1]

Wang, Ze [1]

Cai, Yuanying [1,2]

Wu, Xiaoxu [1]

Yang, Fan [1]

Liao, Guogang [1]

Wang, Yongkang [1]

Wang, Xingxing [1]

Wang, Dong [1]

Affiliations:

[1] Meituan, Beijing, Peoples R China

[2] Tsinghua Univ, IIIS, Beijing, Peoples R China

Source:

PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023 | 2023

Keywords:

Reinforcement Learning; Multi-Distribution Data Learning; Position Allocation

DOI:

10.1145/3539618.3592018

Chinese Library Classification (CLC):

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Nowadays, the mainstream approach in position allocation systems is to use a reinforcement learning (RL) model to allocate appropriate locations for items from various channels and then mix them into the feed. Two types of data are used to train RL models for position allocation: strategy data and random data. Strategy data is collected from the current online model and suffers from an imbalanced distribution of state-action pairs, which causes severe overestimation problems during training. Random data, in contrast, offers a more uniform distribution of state-action pairs but is difficult to obtain in industrial scenarios, because random exploration can hurt platform revenue and user experience. Because the two types of data have different distributions, designing an effective strategy that leverages both to improve RL model training is highly challenging. In this study, we propose a framework named Multi-Distribution Data Learning (MDDL) for training RL models on mixed multi-distribution data that combines strategy and random data. Specifically, MDDL incorporates a novel imitation learning signal to mitigate overestimation problems in strategy data and maximizes the RL signal for random data to facilitate effective learning. In our experiments, we evaluated MDDL in a real-world position allocation system and demonstrated its superior performance compared to the previous baseline. MDDL has been fully deployed on the Meituan food delivery platform and currently serves over 300 million users.

Pages: 2159-2163

Page count: 5
