Locality-Aware Scheduling of Independent Tasks for Runtime Systems

Cited by: 2
Authors
Gonthier, Maxime [1, 2]
Marchal, Loris [1, 2]
Thibault, Samuel [3 ]
Affiliations
[1] ENS Lyon, LIP, CNRS, INRIA, Lyon, France
[2] Univ Claude Bernard Lyon 1, Lyon, France
[3] Univ Bordeaux, CNRS, LaBRI, Inria Bordeaux Sud Ouest, Talence, France
Keywords
Memory-aware scheduling; Eviction policy; Tasks sharing data; Runtime systems
DOI
10.1007/978-3-031-06156-1_1
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
A now-classical way of meeting the increasing demand for computing speed by HPC applications is the use of GPUs and/or other accelerators. Such accelerators have their own memory, which is usually quite limited, and are connected to the main memory through a bus with bounded bandwidth. Thus, particular care should be devoted to data locality in order to avoid unnecessary data movements. Task-based runtime schedulers have emerged as a convenient and efficient way to use such heterogeneous platforms. When processing an application, the scheduler has the knowledge of all tasks available for processing on a GPU, as well as their input data dependencies. Hence, it is able to order tasks and prefetch their input data in the GPU memory (after possibly evicting some previously-loaded data), while aiming at minimizing data movements, so as to reduce the total processing time. In this paper, we focus on how to schedule tasks that share some of their input data (but are otherwise independent) on a GPU. We provide a formal model of the problem, exhibit an optimal eviction strategy, and show that ordering tasks to minimize data movement is NP-complete. We review and adapt existing ordering strategies to this problem, and propose a new one based on task aggregation. These strategies have been implemented in the StarPU runtime system. We present their performance on tasks from tiled 2D and 3D matrix products. Our experiments demonstrate that using our new strategy together with the optimal eviction policy reduces the amount of data movement as well as the total processing time.
Pages: 5-16
Number of pages: 12