InTune: Reinforcement Learning-based Data Pipeline Optimization for Deep Recommendation Models

Cited by: 1
Authors
Nagrecha, Kabir [1]
Liu, Lingyi [1]
Delgado, Pablo [1]
Padmanabhan, Prasanna [1]
Affiliations
[1] Netflix Inc, Los Gatos, CA 95032 USA
Keywords
data processing; recommendation systems; deep learning; parallel computing; resource allocation
DOI
10.1145/3604915.3608778
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning-based recommender models (DLRMs) have become an essential component of many modern recommender systems. Several companies are now building large compute clusters reserved only for DLRM training, driving new interest in cost- and time-saving optimizations. The systems challenges faced in this setting are unique; while typical deep learning (DL) training jobs are dominated by model execution times, the most important factor in DLRM training performance is often online data ingestion. In this paper, we explore the unique characteristics of this data ingestion problem and provide insights into the specific bottlenecks and challenges of the DLRM training pipeline at scale. We study real-world DLRM data processing pipelines taken from our compute cluster at Netflix both to observe the performance impacts of online ingestion and to identify shortfalls in existing data pipeline optimizers. We find that current tooling yields sub-optimal performance, crashes frequently, or requires impractical cluster re-organization to adopt. Our studies lead us to design and build a new solution for data pipeline optimization, InTune. InTune employs a reinforcement learning (RL) agent to learn how to distribute the CPU resources of a trainer machine across a DLRM data pipeline to more effectively parallelize data-loading and improve throughput. Our experiments show that InTune can build an optimized data pipeline configuration within only a few minutes, and can easily be integrated into existing training workflows. By exploiting the responsiveness and adaptability of RL, InTune achieves significantly higher online data ingestion rates than existing optimizers, thus reducing idle times in model execution and increasing efficiency. We apply InTune to our real-world cluster, and find that it increases data ingestion throughput by as much as 2.29X versus current state-of-the-art data pipeline optimizers while also improving both CPU and GPU utilization.
Pages: 430-442
Page count: 13