Locality-Aware Scheduling of Independent Tasks for Runtime Systems

Cited by: 2
Authors
Gonthier, Maxime [1, 2]
Marchal, Loris [1, 2]
Thibault, Samuel [3]
Affiliations
[1] ENS Lyon, LIP, CNRS, INRIA, Lyon, France
[2] Univ Claude Bernard Lyon 1, Lyon, France
[3] Univ Bordeaux, CNRS, LaBRI, Inria Bordeaux Sud Ouest, Talence, France
Keywords
Memory-aware scheduling; Eviction policy; Tasks sharing data; Runtime systems
DOI
10.1007/978-3-031-06156-1_1
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
A now-classical way of meeting the increasing demand for computing speed by HPC applications is the use of GPUs and/or other accelerators. Such accelerators have their own memory, which is usually quite limited, and are connected to the main memory through a bus with bounded bandwidth. Thus, particular care should be devoted to data locality in order to avoid unnecessary data movements. Task-based runtime schedulers have emerged as a convenient and efficient way to use such heterogeneous platforms. When processing an application, the scheduler knows all tasks available for processing on a GPU, as well as their input data dependencies. Hence, it is able to order tasks and prefetch their input data in the GPU memory (after possibly evicting some previously loaded data), while aiming at minimizing data movements, so as to reduce the total processing time. In this paper, we focus on how to schedule tasks that share some of their input data (but are otherwise independent) on a GPU. We provide a formal model of the problem, exhibit an optimal eviction strategy, and show that ordering tasks to minimize data movement is NP-complete. We review and adapt existing ordering strategies to this problem, and propose a new one based on task aggregation. These strategies have been implemented in the StarPU runtime system. We present their performance on tasks from tiled 2D and 3D matrix products. Our experiments demonstrate that using our new strategy together with the optimal eviction policy reduces the amount of data movement as well as the total processing time.
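To make the model in the abstract concrete, the sketch below counts how many data transfers a GPU with limited memory performs for a fixed task order. It rests on several assumptions that are not taken from the paper. All data have unit size. Memory capacity is counted in data items. On a miss with full memory, the scheduler evicts the resident datum whose next use lies farthest in the future (Belady's classic rule), never evicting an input of the current task. The abstract does not spell out its optimal eviction strategy, so this rule is an illustrative stand-in, and the function names `count_loads` and `next_use` are hypothetical.

```python
from typing import Sequence, Set


def next_use(order: Sequence[Set[str]], i: int, datum: str) -> float:
    """Index of the first task after position i that reads `datum`, or infinity."""
    for j in range(i + 1, len(order)):
        if datum in order[j]:
            return j
    return float("inf")


def count_loads(order: Sequence[Set[str]], capacity: int) -> int:
    """Count data loads into GPU memory for a fixed task order.

    Assumptions (illustrative, not the paper's exact model): every datum has
    unit size, `capacity` is the number of data items the GPU can hold, and
    eviction follows Belady's farthest-next-use rule.
    """
    memory: Set[str] = set()
    loads = 0
    for i, task in enumerate(order):
        if len(task) > capacity:
            raise ValueError("the inputs of a single task must fit in memory")
        for datum in task:
            if datum in memory:
                continue  # already resident: no transfer needed
            if len(memory) >= capacity:
                # Evict the datum not needed by this task whose next use is farthest away.
                victim = max(
                    (d for d in memory if d not in task),
                    key=lambda d: next_use(order, i, d),
                )
                memory.remove(victim)
            memory.add(datum)
            loads += 1  # one transfer over the bus
    return loads
```

For example, `count_loads([{"A"}, {"B"}, {"C"}, {"A"}], capacity=2)` returns 3: when `C` arrives, `B` is evicted because it is never reused, so `A` stays resident for the last task. A least-recently-used policy would evict `A` instead and need a fourth transfer. This difference is why the eviction policy matters even before the (NP-complete) task-ordering problem is considered.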
Pages: 5-16
Number of pages: 12