InTune: Reinforcement Learning-based Data Pipeline Optimization for Deep Recommendation Models

Cited by: 1

Authors:

Nagrecha, Kabir [1]

Liu, Lingyi [1]

Delgado, Pablo [1]

Padmanabhan, Prasanna [1]

Affiliations:

[1] Netflix Inc, Los Gatos, CA 95032 USA

Source:

PROCEEDINGS OF THE 17TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2023 | 2023

Keywords:

data processing; recommendation systems; deep learning; parallel computing; resource allocation;

DOI:

10.1145/3604915.3608778

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Deep learning-based recommender models (DLRMs) have become an essential component of many modern recommender systems. Several companies are now building large compute clusters reserved only for DLRM training, driving new interest in cost- and time-saving optimizations. The systems challenges faced in this setting are unique; while typical deep learning (DL) training jobs are dominated by model execution times, the most important factor in DLRM training performance is often online data ingestion. In this paper, we explore the unique characteristics of this data ingestion problem and provide insights into the specific bottlenecks and challenges of the DLRM training pipeline at scale. We study real-world DLRM data processing pipelines taken from our compute cluster at Netflix both to observe the performance impacts of online ingestion and to identify shortfalls in existing data pipeline optimizers. We find that current tooling yields sub-optimal performance, crashes frequently, or else requires impractical cluster re-organization to adopt. Our studies lead us to design and build a new solution for data pipeline optimization, InTune. InTune employs a reinforcement learning (RL) agent to learn how to distribute the CPU resources of a trainer machine across a DLRM data pipeline to more effectively parallelize data-loading and improve throughput. Our experiments show that InTune can build an optimized data pipeline configuration within only a few minutes, and can easily be integrated into existing training workflows. By exploiting the responsiveness and adaptability of RL, InTune achieves significantly higher online data ingestion rates than existing optimizers, thus reducing idle times in model execution and increasing efficiency. We apply InTune to our real-world cluster, and find that it increases data ingestion throughput by as much as 2.29X versus current state-of-the-art data pipeline optimizers while also improving both CPU & GPU utilization.
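
As a rough illustration of the idea of learning how to split a fixed CPU budget across pipeline stages, the toy sketch below uses an epsilon-greedy agent with a tabular reward estimate. It is not InTune's agent: the abstract does not describe the agent's state, actions, or reward, so everything here is an assumption made for illustration. In particular, the names `throughput`, `tune`, and `rates`, the "move one worker between stages" action, and the model in which pipeline throughput equals that of the slowest stage are all hypothetical.

```python
import random


def throughput(alloc, rates):
    """Assumed model: the pipeline runs as fast as its slowest stage."""
    return min(workers * rate for workers, rate in zip(alloc, rates))


def tune(rates, total_cpus, steps=2000, epsilon=0.1, lr=0.2, seed=0):
    """Toy epsilon-greedy search over CPU allocations (hypothetical, not InTune).

    rates[i] is an assumed per-worker processing rate for stage i.
    Returns the best allocation seen, with at least one worker per stage.
    """
    n = len(rates)
    if total_cpus < n:
        raise ValueError("need at least one CPU per stage")
    rng = random.Random(seed)

    # Start from an even split of the CPU budget.
    alloc = [total_cpus // n] * n
    for i in range(total_cpus % n):
        alloc[i] += 1

    # An action moves one worker from stage src to stage dst.
    actions = [(i, j) for i in range(n) for j in range(n) if i != j]
    q = {}  # running reward estimate per (allocation, action)
    best = list(alloc)

    for _ in range(steps):
        state = tuple(alloc)
        # Never drain a stage below one worker.
        valid = [a for a in actions if alloc[a[0]] > 1]
        if not valid:
            break
        if rng.random() < epsilon:
            action = rng.choice(valid)  # explore
        else:
            action = max(valid, key=lambda a: q.get((state, a), 0.0))  # exploit
        src, dst = action

        before = throughput(alloc, rates)
        alloc[src] -= 1
        alloc[dst] += 1
        reward = throughput(alloc, rates) - before  # change in throughput

        old = q.get((state, action), 0.0)
        q[(state, action)] = old + lr * (reward - old)

        if throughput(alloc, rates) > throughput(best, rates):
            best = list(alloc)
    return best


if __name__ == "__main__":
    stage_rates = [100.0, 20.0, 50.0]  # made-up per-worker rates
    allocation = tune(stage_rates, total_cpus=16)
    print(allocation, throughput(allocation, stage_rates))
```

In this sketch the reward is simply the change in modeled throughput after each reallocation; a real system would presumably measure throughput on the live pipeline rather than compute it from assumed rates.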

Pages: 430-442

Page count: 13
