Taming data locality for task scheduling under memory constraint in runtime systems

Cited by: 1

Authors:

Gonthier, Maxime [1]

Marchal, Loris [2]

Thibault, Samuel [3]

Affiliations:

[1] Inria, LIP, ENS Lyon, LaBRI, 200 Ave Vieille Tour, F-33405 Talence, Nouvelle-Aquitaine, France

[2] Univ Claude Bernard Lyon 1, LIP, ENS Lyon, Inria, CNRS, 46 Allee Italie, F-69007 Lyon, Auvergne Rhone, France

[3] Univ Bordeaux, LaBRI, Inria Bordeaux Sud Ouest, CNRS, 200 Ave Vieille Tour, F-33405 Talence, Nouvelle-Aquitaine, France

Source:

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2023 / Vol. 143

Keywords:

Memory-aware scheduling; Eviction policy; Tasks sharing data; GPUs; Runtime systems; Memory constraint; ALGORITHMS

DOI:

10.1016/j.future.2023.01.024

Chinese Library Classification (CLC) number:

TP301 [Theory and methods]

Discipline classification code:

081202

Abstract:

A now-classical way of meeting the increasing demand for computing speed by HPC applications is the use of GPUs and/or other accelerators. Such accelerators have their own memory, which is usually quite limited, and are connected to the main memory through a bus with bounded bandwidth. Thus, particular care should be devoted to data locality in order to avoid unnecessary data movements. Task-based runtime schedulers have emerged as a convenient and efficient way to use such heterogeneous platforms. When processing an application, the scheduler knows all tasks available for processing on a GPU, as well as their input data dependencies. Hence, it is possible to produce a task processing order that aims at reducing the total processing time through three objectives: minimizing data transfers, overlapping transfers and computation, and optimizing the eviction of previously loaded data. In this paper, we focus on how to schedule tasks that share some of their input data (but are otherwise independent) on a single GPU. We provide a formal model of the problem, exhibit an optimal eviction strategy, and show that ordering tasks to minimize data movement is NP-complete. We review and adapt existing ordering strategies to this problem, and propose a new one based on task aggregation. We prove that the problem underlying this new strategy is NP-complete, and prove that our proposed heuristic has reasonable complexity. These strategies have been implemented in the StarPU runtime system. We present their performance on tasks from tiled 2D and 3D matrix products, Cholesky factorization, randomized task order, randomized data pairs from the 2D matrix product, and a sparse matrix product. We also introduce a visual way to understand this performance, as well as lower bounds on the number of data loads for the 2D and 3D matrix products.
Our experiments demonstrate that using our new strategy together with the optimal eviction policy reduces both the amount of data movement and the total processing time.
© 2023 Elsevier B.V. All rights reserved.
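The abstract states that, for a fixed task order, an optimal eviction strategy exists, and that the order itself determines how many data loads are needed. As a rough illustration only, the Python sketch below counts loads for a given order on a single GPU that holds at most `memory_size` equal-size data items. It assumes the eviction rule is the classical Belady furthest-next-use rule (the abstract does not name the strategy), and the helper names `count_loads`, `next_use` and `task_2d` are hypothetical, not StarPU APIs. As a further assumption, each task of a tiled 2D matrix product is modeled as reading one row block `A_i` and one column block `B_j`.

```python
from typing import Sequence, Set


def next_use(order: Sequence[Set[str]], data: str, start: int) -> int:
    """Index of the first task at or after `start` that reads `data`,
    or len(order) if it is never read again."""
    for j in range(start, len(order)):
        if data in order[j]:
            return j
    return len(order)


def count_loads(order: Sequence[Set[str]], memory_size: int) -> int:
    """Count data loads for tasks processed in `order` on one GPU holding
    at most `memory_size` equal-size items (Belady-style eviction assumed)."""
    in_memory: Set[str] = set()
    loads = 0
    for i, inputs in enumerate(order):
        if len(inputs) > memory_size:
            raise ValueError(f"task {i} needs more data than fits in memory")
        for d in sorted(inputs):  # sorted only to make runs deterministic
            if d in in_memory:
                continue
            if len(in_memory) == memory_size:
                # Inputs of the current task must stay resident; among the
                # others, evict the one whose next use is furthest away
                # (ties broken by name, again only for determinism).
                candidates = in_memory - inputs
                victim = max(candidates, key=lambda c: (next_use(order, c, i + 1), c))
                in_memory.remove(victim)
            in_memory.add(d)
            loads += 1
    return loads


def task_2d(i: int, j: int) -> Set[str]:
    """Hypothetical model of tile C[i][j] of a 2D matrix product:
    it reads row block A_i and column block B_j."""
    return {f"A{i}", f"B{j}"}


row_major = [task_2d(0, 0), task_2d(0, 1), task_2d(1, 0), task_2d(1, 1)]
snake = [task_2d(0, 0), task_2d(0, 1), task_2d(1, 1), task_2d(1, 0)]

print(count_loads(row_major, memory_size=2))  # 6
print(count_loads(snake, memory_size=2))      # 5
```

With room for only two data items, the row-major order of the 2 by 2 tiles costs 6 loads, while the "snake" order costs 5. Five is the minimum for this toy instance: the first task loads two items, and each later task differs from its predecessor in at least one item that is therefore not resident. This is only a small demonstration of why task ordering matters, not a reproduction of the paper's strategies or experiments.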

Pages: 305-321

Number of pages: 17

Related literature (50 items in total):

[1] Taming Big Data SVM with Locality-Aware Scheduling
Ye, Mao
Wang, Jun
Yin, Jiangling
Han, Dezhi
2016 FOURTH INTERNATIONAL CONFERENCE ON ADVANCED CLOUD AND BIG DATA (CBD 2016), 2016, : 37 - 44
[2] Efficient task scheduling for runtime reconfigurable systems
Fazlali, Mahmood
Sabeghi, Mojtaba
Zakerolhosseini, Ali
Bertels, Koen
JOURNAL OF SYSTEMS ARCHITECTURE, 2010, 56 (11) : 623 - 632
[3] Locality-Aware Scheduling of Independent Tasks for Runtime Systems
Gonthier, Maxime
Marchal, Loris
Thibault, Samuel
EURO-PAR 2021: PARALLEL PROCESSING WORKSHOPS, 2022, 13098 : 5 - 16
[4] Runtime Hardware/Software Task Transition Scheduling for Data-Adaptable Embedded Systems
Sandoval, Nathan
Mackin, Casey
Whitsitt, Sean
Lysecky, Roman
Sprinkle, Jonathan
PROCEEDINGS OF THE 2013 INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE TECHNOLOGY (FPT), 2013, : 342 - 345
[5] A Heuristic Method for Data Allocation and Task Scheduling on Heterogeneous Multiprocessor Systems Under Memory Constraints
Ding, Junwen
Song, Liangcai
Li, Siyuan
Wu, Chen
He, Ronghua
Su, Zhouxing
Lu, Zhipeng
ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2023, PT II, 2024, 14488 : 360 - 380
[6] An improved task scheduling algorithm based on cache locality and data locality in Hadoop
Zhang, Peng
Li, Chunlin
Zhao, Yahui
2016 17TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES (PDCAT), 2016, : 244 - 249
[7] Overcoming data locality: An in-memory runtime file system with symmetrical data distribution
Uta, Alexandru
Sandu, Andreea
Kielmann, Thilo
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2016, 54 : 144 - 158
[8] Locality Aware Task Scheduling in Parallel Data Stream Processing
Falt, Zbynek
Krulis, Martin
Bednarek, David
Yaghob, Jakub
Zavoral, Filip
INTELLIGENT DISTRIBUTED COMPUTING VIII, 2015, 570 : 331 - 342
[9] Memory-Aware Scheduling of Tasks Sharing Data on Multiple GPUs with Dynamic Runtime Systems
Gonthier, Maxime
Marchal, Loris
Thibault, Samuel
2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2022), 2022, : 694 - 704
[10] Task assignment and scheduling under memory constraints
Szymanek, R
Kuchcinski, K
PROCEEDINGS OF THE 26TH EUROMICRO CONFERENCE, VOLS I AND II, 2000, : 84 - 90
