MDDL: A Framework for Reinforcement Learning-based Position Allocation in Multi-Channel Feed

Authors
Shi, Xiaowen [1 ]
Wang, Ze [1 ]
Cai, Yuanying [1 ,2 ]
Wu, Xiaoxu [1 ]
Yang, Fan [1 ]
Liao, Guogang [1 ]
Wang, Yongkang [1 ]
Wang, Xingxing [1 ]
Wang, Dong [1 ]
Affiliations
[1] Meituan, Beijing, Peoples R China
[2] Tsinghua Univ, IIIS, Beijing, Peoples R China
Keywords
Reinforcement Learning; Multi-Distribution Data Learning; Position Allocation
DOI
10.1145/3539618.3592018
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
The mainstream approach in position allocation systems today is to use a reinforcement learning (RL) model to allocate appropriate positions for items from various channels and then mix them into the feed. Two types of data are used to train RL models for position allocation: strategy data and random data. Strategy data is collected from the current online model and suffers from an imbalanced distribution of state-action pairs, which results in severe overestimation problems during training. Random data, on the other hand, offers a more uniform distribution of state-action pairs but is challenging to obtain in industrial scenarios, because random exploration can negatively impact platform revenue and user experience. Because the two types of data have different distributions, designing an effective strategy that leverages both to improve RL model training is a highly challenging problem. In this study, we propose a framework named Multi-Distribution Data Learning (MDDL) for training RL models on mixed multi-distribution data that combines strategy and random data. Specifically, MDDL incorporates a novel imitation learning signal to mitigate overestimation problems in strategy data and maximizes the RL signal for random data to facilitate effective learning. In our experiments, we evaluated the proposed MDDL framework in a real-world position allocation system and demonstrated its superior performance compared to the previous baseline. MDDL has been fully deployed on the Meituan food delivery platform and currently serves over 300 million users.
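The abstract does not give the loss functions, so the following is only an illustrative sketch of how a per-sample objective might treat the two data sources differently. It is not the authors' implementation. Everything in it is an assumption made for illustration: the Q-value representation, the squared TD error used as the "RL signal", the cross-entropy on the logged action used as the "imitation learning signal", the field names in each sample, and the hypothetical weight `rl_weight_strategy`.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def td_loss(q_sa, reward, next_q_values, gamma=0.9):
    """Squared one-step TD error (assumed stand-in for the RL signal)."""
    target = reward + gamma * max(next_q_values)
    return (q_sa - target) ** 2


def imitation_loss(q_values, logged_action):
    """Cross-entropy between softmax(Q) and the logged action
    (assumed stand-in for the imitation learning signal)."""
    probs = softmax(q_values)
    return -math.log(probs[logged_action])


def mixed_loss(batch, rl_weight_strategy=0.1, gamma=0.9):
    """Average loss over a batch that mixes both data sources.

    Random data: full TD loss.
    Strategy data: imitation loss plus a down-weighted TD loss.
    The weighting scheme is hypothetical, not taken from the paper.
    """
    total = 0.0
    for sample in batch:
        rl = td_loss(
            sample["q_values"][sample["action"]],
            sample["reward"],
            sample["next_q_values"],
            gamma,
        )
        if sample["source"] == "random":
            total += rl
        else:
            il = imitation_loss(sample["q_values"], sample["action"])
            total += il + rl_weight_strategy * rl
    return total / len(batch)


example_batch = [
    {"source": "random", "q_values": [1.0, 2.0], "action": 1,
     "reward": 1.0, "next_q_values": [0.0, 0.0]},
    {"source": "strategy", "q_values": [0.0, 0.0], "action": 0,
     "reward": 0.0, "next_q_values": [0.0, 0.0]},
]
example_loss = mixed_loss(example_batch)
```

In this sketch the source tag on each sample decides which signal dominates, which mirrors the abstract's distinction between strategy data and random data without claiming to reproduce the published method.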
Pages: 2159-2163
Page count: 5