MDDL: A Framework for Reinforcement Learning-based Position Allocation in Multi-Channel Feed

Cited by: 0
Authors
Shi, Xiaowen [1 ]
Wang, Ze [1 ]
Cai, Yuanying [1 ,2 ]
Wu, Xiaoxu [1 ]
Yang, Fan [1 ]
Liao, Guogang [1 ]
Wang, Yongkang [1 ]
Wang, Xingxing [1 ]
Wang, Dong [1 ]
Affiliations
[1] Meituan, Beijing, Peoples R China
[2] Tsinghua Univ, IIIS, Beijing, Peoples R China
Keywords
Reinforcement Learning; Multi-Distribution Data Learning; Position Allocation
DOI
10.1145/3539618.3592018
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
The mainstream approach in position allocation systems is to use a reinforcement learning (RL) model to allocate appropriate positions to items from various channels and then mix them into the feed. Two types of data are used to train the RL model for position allocation: strategy data and random data. Strategy data, collected from the current online model, suffers from an imbalanced distribution of state-action pairs, which causes severe overestimation problems during training. Random data offers a more uniform distribution of state-action pairs, but it is hard to obtain in industrial scenarios because random exploration can hurt platform revenue and user experience. Because the two types of data have different distributions, designing an effective strategy that leverages both to improve RL model training is a highly challenging problem. In this study, we propose a framework named Multi-Distribution Data Learning (MDDL) for training RL models on mixed multi-distribution data, making effective use of both strategy and random data. Specifically, MDDL incorporates a novel imitation learning signal to mitigate overestimation problems in strategy data and maximizes the RL signal for random data to facilitate effective learning. In our experiments, we evaluated the proposed MDDL framework in a real-world position allocation system and demonstrated its superior performance compared to the previous baseline. MDDL has been fully deployed on the Meituan food delivery platform and currently serves over 300 million users.
Pages: 2159-2163
Page count: 5
Related papers
50 in total
  • [1] A Multi-Channel Reinforcement Learning Framework for Robotic Mirror Therapy
    Xu, Jiajun
    Xu, Linsen
    Li, Youfu
    Cheng, Gaoxin
    Shi, Jia
    Liu, Jinfu
    Chen, Shouqi
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2020, 5 (04): : 5385 - 5392
  • [2] Deep reinforcement learning-based multi-channel spectrum sharing technology for next generation multi-operator cellular networks
    Shin, Minsu
    Mahboob, Tahira
    Mughal, Danish Mehmood
    Chung, Min Young
    WIRELESS NETWORKS, 2023, 29 (02) : 809 - 820
  • [3] A Multi-Channel Advertising Budget Allocation Using Reinforcement Learning and an Improved Differential Evolution Algorithm
    Li, Mengfan
    Zhang, Jian
    Alizadehsani, Roohallah
    Plawiak, Pawel
    IEEE ACCESS, 2024, 12 : 100559 - 100580
  • [4] Multi-Channel Interactive Reinforcement Learning for Sequential Tasks
    Koert, Dorothea
    Kircher, Maximilian
    Salikutluk, Vildan
    D'Eramo, Carlo
    Peters, Jan
    FRONTIERS IN ROBOTICS AND AI, 2020, 7
  • [5] Multi-Agent Reinforcement Learning-Based Resource Allocation for UAV Networks
    Cui, Jingjing
    Liu, Yuanwei
    Nallanathan, Arumugam
    IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2020, 19 (02) : 729 - 743
  • [6] Multi-Channel Opportunistic Access for Heterogeneous Networks Based on Deep Reinforcement Learning
    Ye, Xiaowen
    Yu, Yiding
    Fu, Liqun
    IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2022, 21 (02) : 794 - 807
  • [7] MAC Protocol for Multi-channel Heterogeneous Networks Based on Deep Reinforcement Learning
    Ye, Xiaowen
    Yu, Yiding
    Fu, Liqun
    2020 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2020,
  • [8] Deep Reinforcement Learning-Based Channel Allocation for Wireless LANs with Graph Convolutional Networks
    Nakashima, Kota
    Kamiya, Shotaro
    Ohtsu, Kazuki
    Yamamoto, Koji
    Nishio, Takayuki
    Morikura, Masahiro
    2019 IEEE 90TH VEHICULAR TECHNOLOGY CONFERENCE (VTC2019-FALL), 2019,
  • [9] Deep Reinforcement Learning-Based Channel Allocation for Wireless LANs With Graph Convolutional Networks
    Nakashima, Kota
    Kamiya, Shotaro
    Ohtsu, Kazuki
    Yamamoto, Koji
    Nishio, Takayuki
    Morikura, Masahiro
    IEEE ACCESS, 2020, 8 : 31823 - 31834
  • [10] A Deep Reinforcement Learning-Based Framework for Dynamic Resource Allocation in Multibeam Satellite Systems
    Hu, Xin
    Liu, Shuaijun
    Chen, Rong
    Wang, Weidong
    Wang, Chunting
    IEEE COMMUNICATIONS LETTERS, 2018, 22 (08) : 1612 - 1615