Locality-Aware Scheduling of Independent Tasks for Runtime Systems

Cited by: 2

Authors:

Gonthier, Maxime [1,2]

Marchal, Loris [1,2]

Thibault, Samuel [3]

Affiliations:

[1] ENS Lyon, LIP, CNRS, INRIA, Lyon, France

[2] Univ Claude Bernard Lyon 1, Lyon, France

[3] Univ Bordeaux, CNRS, LaBRI, Inria Bordeaux Sud Ouest, Talence, France

Source:

EURO-PAR 2021: PARALLEL PROCESSING WORKSHOPS | 2022 / Vol. 13098

Keywords:

Memory-aware scheduling; Eviction policy; Tasks sharing data; Runtime systems

DOI:

10.1007/978-3-031-06156-1_1

Chinese Library Classification number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

A now-classical way of meeting the increasing demand for computing speed by HPC applications is the use of GPUs and/or other accelerators. Such accelerators have their own memory, which is usually quite limited, and are connected to the main memory through a bus with bounded bandwidth. Thus, particular care should be devoted to data locality in order to avoid unnecessary data movements. Task-based runtime schedulers have emerged as a convenient and efficient way to use such heterogeneous platforms. When processing an application, the scheduler knows all tasks available for processing on a GPU, as well as their input data dependencies. Hence, it is able to order tasks and prefetch their input data in the GPU memory (after possibly evicting some previously loaded data), while aiming at minimizing data movements, so as to reduce the total processing time. In this paper, we focus on how to schedule tasks that share some of their input data (but are otherwise independent) on a GPU. We provide a formal model of the problem, exhibit an optimal eviction strategy, and show that ordering tasks to minimize data movement is NP-complete. We review and adapt existing ordering strategies to this problem, and propose a new one based on task aggregation. These strategies have been implemented in the StarPU runtime system. We present their performance on tasks from tiled 2D and 3D matrix products. Our experiments demonstrate that using our new strategy together with the optimal eviction policy reduces the amount of data movement as well as the total processing time.
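The problem described in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and the abstract does not say which eviction rule is optimal; the code below assumes a furthest-in-future rule (evict the loaded data whose next use is latest) as a stand-in. The function `count_loads`, the task names `T00` to `T11`, and the data names `A0`, `A1`, `B0`, `B1` are all hypothetical. Each task of a tiled 2x2 matrix product reads one block-row of A and one block-column of B, and the sketch counts how many loads a given task order causes when the memory holds at most `capacity` data items.

```python
from typing import Dict, Sequence, Set


def count_loads(order: Sequence[str], inputs: Dict[str, Set[str]], capacity: int) -> int:
    """Count data loads when tasks run in `order` with room for `capacity` items.

    Eviction rule (an assumption of this sketch): evict the loaded item,
    not needed by the current task, whose next use is furthest in the future.
    """

    def next_use(item: str, step: int) -> int:
        # Index of the first later task reading `item`, or len(order) if none.
        for later in range(step + 1, len(order)):
            if item in inputs[order[later]]:
                return later
        return len(order)

    memory: Set[str] = set()
    loads = 0
    for step, task in enumerate(order):
        needed = inputs[task]
        if len(needed) > capacity:
            raise ValueError(f"task {task} does not fit in memory")
        for item in sorted(needed - memory):
            if len(memory) >= capacity:
                # Never evict an input of the task about to run.
                candidates = sorted(memory - needed)
                victim = max(candidates, key=lambda d: next_use(d, step))
                memory.remove(victim)
            memory.add(item)
            loads += 1
    return loads


# Hypothetical 2x2 tiled matrix product: task Tij reads block-row Ai and block-column Bj.
inputs = {
    "T00": {"A0", "B0"},
    "T01": {"A0", "B1"},
    "T10": {"A1", "B0"},
    "T11": {"A1", "B1"},
}
snake = ["T00", "T01", "T11", "T10"]     # consecutive tasks always share one input
diagonal = ["T00", "T11", "T01", "T10"]  # first two tasks share nothing

print(count_loads(snake, inputs, capacity=3))     # 4
print(count_loads(diagonal, inputs, capacity=3))  # 5
```

With room for three data items, the first order loads each of the four items exactly once, while the second must reload `B0`. This is the effect that task ordering has on data movement, independently of the eviction rule's quality.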

Pages: 5-16

Page count: 12