InTune: Reinforcement Learning-based Data Pipeline Optimization for Deep Recommendation Models

Cited by: 1

Authors:

Nagrecha, Kabir [1]

Liu, Lingyi [1]

Delgado, Pablo [1]

Padmanabhan, Prasanna [1]

Affiliations:

[1] Netflix Inc, Los Gatos, CA 95032 USA

Source:

PROCEEDINGS OF THE 17TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2023 | 2023

Keywords:

data processing; recommendation systems; deep learning; parallel computing; resource allocation

DOI:

10.1145/3604915.3608778

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Deep learning-based recommender models (DLRMs) have become an essential component of many modern recommender systems. Several companies are now building large compute clusters reserved only for DLRM training, driving new interest in cost- and time-saving optimizations. The systems challenges faced in this setting are unique; while typical deep learning (DL) training jobs are dominated by model execution times, the most important factor in DLRM training performance is often online data ingestion. In this paper, we explore the unique characteristics of this data ingestion problem and provide insights into the specific bottlenecks and challenges of the DLRM training pipeline at scale. We study real-world DLRM data processing pipelines taken from our compute cluster at Netflix both to observe the performance impacts of online ingestion and to identify shortfalls in existing data pipeline optimizers. We find that current tooling yields sub-optimal performance, crashes frequently, or requires impractical cluster re-organization to adopt. Our studies lead us to design and build a new solution for data pipeline optimization, InTune. InTune employs a reinforcement learning (RL) agent to learn how to distribute the CPU resources of a trainer machine across a DLRM data pipeline to more effectively parallelize data-loading and improve throughput. Our experiments show that InTune can build an optimized data pipeline configuration within only a few minutes, and can easily be integrated into existing training workflows. By exploiting the responsiveness and adaptability of RL, InTune achieves significantly higher online data ingestion rates than existing optimizers, thus reducing idle times in model execution and increasing efficiency. We apply InTune to our real-world cluster, and find that it increases data ingestion throughput by as much as 2.29X versus current state-of-the-art data pipeline optimizers while also improving both CPU and GPU utilization.


Pages: 430-442

Page count: 13
