InTune: Reinforcement Learning-based Data Pipeline Optimization for Deep Recommendation Models

Cited by: 1
Authors
Nagrecha, Kabir [1]
Liu, Lingyi [1]
Delgado, Pablo [1]
Padmanabhan, Prasanna [1]
Affiliations
[1] Netflix Inc, Los Gatos, CA 95032 USA
Keywords
data processing; recommendation systems; deep learning; parallel computing; resource allocation
DOI
10.1145/3604915.3608778
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning-based recommender models (DLRMs) have become an essential component of many modern recommender systems. Several companies are now building large compute clusters reserved only for DLRM training, driving new interest in cost- and time-saving optimizations. The systems challenges faced in this setting are unique; while typical deep learning (DL) training jobs are dominated by model execution times, the most important factor in DLRM training performance is often online data ingestion. In this paper, we explore the unique characteristics of this data ingestion problem and provide insights into the specific bottlenecks and challenges of the DLRM training pipeline at scale. We study real-world DLRM data processing pipelines taken from our compute cluster at Netflix, both to observe the performance impacts of online ingestion and to identify shortfalls in existing data pipeline optimizers. We find that current tooling yields sub-optimal performance or frequent crashes, or else requires impractical cluster re-organization to adopt. Our studies lead us to design and build a new solution for data pipeline optimization, InTune. InTune employs a reinforcement learning (RL) agent to learn how to distribute the CPU resources of a trainer machine across a DLRM data pipeline to more effectively parallelize data-loading and improve throughput. Our experiments show that InTune can build an optimized data pipeline configuration within only a few minutes, and can easily be integrated into existing training workflows. By exploiting the responsiveness and adaptability of RL, InTune achieves significantly higher online data ingestion rates than existing optimizers, thus reducing idle times in model execution and increasing efficiency. We apply InTune to our real-world cluster, and find that it increases data ingestion throughput by as much as 2.29X versus current state-of-the-art data pipeline optimizers while also improving both CPU and GPU utilization.
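The idea of an RL agent learning how to split a trainer machine's CPUs across data pipeline stages can be illustrated with a toy sketch. The code below is not InTune's agent or its reward design, which this record does not describe. It is a minimal tabular Q-learning loop over a simulated pipeline, and everything in it is an assumption made for illustration: the stage names, the per-worker rates in `STAGE_RATES`, the CPU budget, the "move one CPU" action space, and the throughput-delta reward.

```python
import random

# Hypothetical per-worker throughput (batches/s) for each pipeline stage.
# These names and numbers are illustrative assumptions, not measurements.
STAGE_RATES = {"read": 4.0, "decode": 1.0, "transform": 2.0, "batch": 8.0}
STAGES = list(STAGE_RATES)
TOTAL_CPUS = 16  # assumed CPU budget of one trainer machine


def throughput(alloc):
    """Simulated pipeline throughput: bounded by the slowest stage."""
    return min(alloc[s] * STAGE_RATES[s] for s in STAGES)


def all_actions():
    """Each action moves one CPU from a source stage to a destination stage."""
    return [(src, dst) for src in STAGES for dst in STAGES if src != dst]


def move_cpu(alloc, action):
    """Apply an action, keeping at least one worker on every stage."""
    src, dst = action
    if alloc[src] <= 1:
        return alloc  # illegal move: leave the allocation unchanged
    new_alloc = dict(alloc)
    new_alloc[src] -= 1
    new_alloc[dst] += 1
    return new_alloc


def tune(steps=2000, epsilon=0.2, alpha=0.5, gamma=0.9, seed=0):
    """Epsilon-greedy tabular Q-learning over CPU allocations."""
    rng = random.Random(seed)
    alloc = {s: TOTAL_CPUS // len(STAGES) for s in STAGES}  # even split
    best = dict(alloc)
    actions = all_actions()
    q = {}  # maps (state, action) to an estimated value

    for _ in range(steps):
        state = tuple(alloc[s] for s in STAGES)
        if rng.random() < epsilon:
            action = rng.choice(actions)  # explore
        else:
            action = max(actions, key=lambda a: q.get((state, a), 0.0))

        next_alloc = move_cpu(alloc, action)
        # Reward is the change in simulated throughput caused by the move.
        reward = throughput(next_alloc) - throughput(alloc)

        next_state = tuple(next_alloc[s] for s in STAGES)
        best_next = max(q.get((next_state, a), 0.0) for a in actions)
        old = q.get((state, action), 0.0)
        q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

        alloc = next_alloc
        if throughput(alloc) > throughput(best):
            best = dict(alloc)  # remember the best allocation seen so far

    return best


if __name__ == "__main__":
    best = tune()
    print(best, throughput(best))
```

In this simulation the reward comes from a closed-form `throughput` function; a real tuner would instead have to measure ingestion rates from the running pipeline, which is where the responsiveness the abstract mentions would matter.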
Pages: 430-442
Page count: 13