InTune: Reinforcement Learning-based Data Pipeline Optimization for Deep Recommendation Models

Cited by: 1
Authors
Nagrecha, Kabir [1]
Liu, Lingyi [1]
Delgado, Pablo [1]
Padmanabhan, Prasanna [1]
Affiliations
[1] Netflix Inc, Los Gatos, CA 95032 USA
Keywords
data processing; recommendation systems; deep learning; parallel computing; resource allocation
DOI
10.1145/3604915.3608778
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning-based recommender models (DLRMs) have become an essential component of many modern recommender systems. Several companies are now building large compute clusters reserved only for DLRM training, driving new interest in cost- and time-saving optimizations. The systems challenges faced in this setting are unique; while typical deep learning (DL) training jobs are dominated by model execution times, the most important factor in DLRM training performance is often online data ingestion. In this paper, we explore the unique characteristics of this data ingestion problem and provide insights into the specific bottlenecks and challenges of the DLRM training pipeline at scale. We study real-world DLRM data processing pipelines taken from our compute cluster at Netflix both to observe the performance impacts of online ingestion and to identify shortfalls in existing data pipeline optimizers. We find that current tooling yields sub-optimal performance, crashes frequently, or requires impractical cluster re-organization to adopt. Our studies lead us to design and build a new solution for data pipeline optimization, InTune. InTune employs a reinforcement learning (RL) agent to learn how to distribute the CPU resources of a trainer machine across a DLRM data pipeline to more effectively parallelize data-loading and improve throughput. Our experiments show that InTune can build an optimized data pipeline configuration within only a few minutes, and can easily be integrated into existing training workflows. By exploiting the responsiveness and adaptability of RL, InTune achieves significantly higher online data ingestion rates than existing optimizers, thus reducing idle times in model execution and increasing efficiency. We apply InTune to our real-world cluster, and find that it increases data ingestion throughput by as much as 2.29X versus current state-of-the-art data pipeline optimizers while also improving both CPU and GPU utilization.
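The abstract does not describe the agent's internals, so the following is only a toy illustration of the allocation problem it names: splitting a fixed CPU budget across pipeline stages whose overall throughput is capped by the slowest stage. It uses a simple trial-and-error local search (accept a worker move if measured throughput improves, occasionally accept a neutral move), which stands in for, and is much simpler than, an RL agent. The stage rates, the `throughput` model, and the `tune` function are all hypothetical and are not taken from InTune.

```python
import random


def throughput(workers, rates):
    """Toy model: the pipeline runs at the speed of its slowest stage.

    workers[i] is the CPU count given to stage i; rates[i] is the assumed
    samples/sec that one CPU achieves on that stage.
    """
    return min(w * r for w, r in zip(workers, rates))


def tune(rates, total_cpus, steps=500, epsilon=0.2, seed=0):
    """Search for a CPU split by moving one worker at a time.

    A move is kept when throughput improves, kept with probability
    `epsilon` when throughput is unchanged, and undone otherwise.
    Requires total_cpus >= len(rates) so each stage keeps one worker.
    """
    rng = random.Random(seed)
    n = len(rates)
    # Start from an even split of the CPU budget.
    workers = [total_cpus // n] * n
    for i in range(total_cpus % n):
        workers[i] += 1
    best = throughput(workers, rates)

    for _ in range(steps):
        src, dst = rng.sample(range(n), 2)
        if workers[src] <= 1:
            continue  # never starve a stage completely
        workers[src] -= 1
        workers[dst] += 1
        reward = throughput(workers, rates) - best
        if reward > 0 or (reward == 0 and rng.random() < epsilon):
            best += reward  # keep the move
        else:
            workers[src] += 1  # undo the move
            workers[dst] -= 1
    return workers, best


if __name__ == "__main__":
    # Hypothetical per-CPU rates for three stages (e.g. read, decode, batch).
    allocation, rate = tune([100.0, 25.0, 50.0], total_cpus=14)
    print(allocation, rate)
```

In this toy setting the slow middle stage should end up with most of the CPUs; a real pipeline would measure throughput from the running job rather than from a closed-form model.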
Pages: 430-442
Number of pages: 13
Related papers
50 in total
  • [21] Deep Reinforcement Learning-Based Optimization for Crew Allocation in Modular Building Prefabrication
    Deria, Anisha
    Lee, Yong-Cheol
    Ghannad, Pedram
    CONSTRUCTION RESEARCH CONGRESS 2024: ADVANCED TECHNOLOGIES, AUTOMATION, AND COMPUTER APPLICATIONS IN CONSTRUCTION, 2024, : 1317 - 1326
  • [22] Enhancing overall performance of thermophotovoltaics via deep reinforcement learning-based optimization
    Yu, Shilv
    Chen, Zihe
    Liao, Wentao
    Yuan, Cheng
    Shang, Bofeng
    Hu, Run
    JOURNAL OF APPLIED PHYSICS, 2024, 136 (02)
  • [23] Deep Reinforcement Learning-Based Offloading Decision Optimization in Mobile Edge Computing
    Zhang, Hao
    Wu, Wenjun
    Wang, Chaoyi
    Li, Meng
    Yang, Ruizhe
    2019 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE (WCNC), 2019,
  • [24] Deep Reinforcement Learning-based Energy Efficiency Optimization For Flying LoRa Gateways
    Jouhari, Mohammed
    Ibrahimi, Khalil
    Ben Othman, Jalel
    Amhoud, El Mehdi
    ICC 2023-IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, 2023, : 6157 - 6162
  • [25] A data interpretation approach for deep learning-based prediction models
    Dadsetan, Saba
    Wu, Shandong
    MEDICAL IMAGING 2019: IMAGING INFORMATICS FOR HEALTHCARE, RESEARCH, AND APPLICATIONS, 2019, 10954
  • [26] DEEP REINFORCEMENT LEARNING-BASED IRRIGATION SCHEDULING
    Yang, Y.
    Hu, J.
    Porter, D.
    Marek, T.
    Heflin, K.
    Kong, H.
    Sun, L.
    TRANSACTIONS OF THE ASABE, 2020, 63 (03) : 549 - 556
  • [27] Reinforcement Learning-Based Recommendation with User Reviews on Knowledge Graphs
    Zhang, Siyuan
    Ouyang, Yuanxin
    Liu, Zhuang
    He, Weijie
    Rong, Wenge
    Xiong, Zhang
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT III, KSEM 2023, 2023, 14119 : 148 - 159
  • [28] Efficient and Accurate Leakage Points Detection in Gas Pipeline Using Reinforcement Learning-Based Optimization
    He, Qinglin
    Zhou, Lianjie
    Zhang, Feng
    Guan, Dongjie
    Zhang, Xiang
    IEEE SENSORS JOURNAL, 2024, 24 (17) : 27640 - 27652
  • [29] Improving Deep Learning-Based Recommendation Attack Detection Using Harris Hawks Optimization
    Zhou, Quanqiang
    Huang, Cheng
    Duan, Liangliang
    APPLIED SCIENCES-BASEL, 2022, 12 (19):
  • [30] DeepSet: Deep Learning-based Recommendation with Setwise Preference
    Li, Lin
    Pan, Weike
    Chen, Guanliang
    Ming, Zhong
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,