DAGuE: A generic distributed DAG engine for High Performance Computing

Cited by: 136
Authors
Bosilca, George [1]
Bouteiller, Aurelien [1]
Danalis, Anthony [1]
Herault, Thomas [1]
Lemarinier, Pierre [2]
Dongarra, Jack [1,3]
Affiliations
[1] Univ Tennessee, Innovat Comp Lab, Knoxville, TN 37996 USA
[2] Univ Rennes 1, IRISA, F-35014 Rennes, France
[3] Oak Ridge Natl Lab, Oak Ridge, TN 37831 USA
Keywords
HPC; Micro-task DAG; Heterogeneous architectures; Architecture aware scheduling; FACTORIZATION;
DOI
10.1016/j.parco.2011.10.003
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The frenetic development of current architectures places a strain on state-of-the-art programming environments. Harnessing the full potential of such architectures is a tremendous task for the whole scientific computing community. We present DAGuE, a generic framework for architecture-aware scheduling and management of micro-tasks on distributed many-core heterogeneous architectures. The applications we consider can be expressed as a Directed Acyclic Graph (DAG) of tasks with labeled edges designating data dependencies. DAGs are represented in a compact, problem-size-independent format that can be queried on demand to discover data dependencies, in a totally distributed fashion. DAGuE assigns computation threads to the cores, overlaps communications and computations, and uses a dynamic, fully distributed scheduler based on cache awareness, data locality, and task priority. We demonstrate the efficiency of our approach, using several micro-benchmarks to analyze the performance of different components of the framework, and a linear algebra factorization as a use case. Published by Elsevier B.V.
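The idea of a compact, problem-size-independent DAG that is queried on demand can be illustrated with a minimal sketch. This is not DAGuE's actual format or API; it is a hypothetical 2D wavefront example in which task (i, j) depends on (i-1, j) and (i, j-1). The function names `predecessors`, `successors`, and `run_wavefront` are assumptions introduced here for illustration. The point is that dependencies are computed from task indices when needed, so the full graph is never built up front.

```python
from collections import deque

# Hypothetical illustration only: NOT DAGuE's representation or API.
# The DAG is described by two index-based rules instead of a stored graph.

def predecessors(task, n):
    """Tasks that (i, j) depends on, derived from its indices."""
    i, j = task
    preds = []
    if i > 0:
        preds.append((i - 1, j))
    if j > 0:
        preds.append((i, j - 1))
    return preds

def successors(task, n):
    """Tasks that consume the output of (i, j) on an n x n grid."""
    i, j = task
    succs = []
    if i + 1 < n:
        succs.append((i + 1, j))
    if j + 1 < n:
        succs.append((i, j + 1))
    return succs

def run_wavefront(n):
    """Return one dependency-respecting execution order of the n*n tasks."""
    if n <= 0:
        return []
    remaining = {}            # task -> count of unfinished predecessors, filled lazily
    ready = deque([(0, 0)])   # (0, 0) is the only task with no predecessors
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)    # "execute" the task
        for succ in successors(task, n):
            # Query the symbolic description only when a task is first reached.
            if succ not in remaining:
                remaining[succ] = len(predecessors(succ, n))
            remaining[succ] -= 1
            if remaining[succ] == 0:
                ready.append(succ)
    return order
```

The two rules take the same amount of space for any grid size `n`, which is the property the abstract calls problem-size independence. A real engine would additionally distribute tasks across nodes and order the ready queue by priority and data locality; this sketch uses a plain FIFO queue on a single process.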
Pages: 37-51
Number of pages: 15
Related papers
50 in total
  • [31] High Performance Computing Design by Code Migration for Distributed Desktop Computing Grids
    Yoshida, Makoto
    Kojima, Kazumine
    INTERNATIONAL JOURNAL OF GRID AND HIGH PERFORMANCE COMPUTING, 2011, 3 (04) : 53 - 70
  • [32] Lightweight distributed computing framework for orchestrating high performance computing and big data
    Ince, Muhammed Numan
    Gunay, Melih
    Ledet, Joseph
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2022, 30 (04) : 1571 - 1585
  • [33] Grid computing: The future of distributed computing for high performance scientific and business applications
    Mukherjee, S
    Mustafi, J
    Chaudhuri, A
    DISTRIBUTED COMPUTING, PROCEEDINGS: MOBILE AND WIRELESS COMPUTING, 2002, 2571 : 339 - 342
  • [34] Dragon: A Lightweight, High Performance Distributed Stream Processing Engine
    Harwood, Aaron
    Read, Maria Rodriguez
    Amarasinghe, Gayashan Niroshana
    2020 IEEE 40TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS), 2020, : 1344 - 1351
  • [35] DtCraft: A High-Performance Distributed Execution Engine at Scale
    Huang, Tsung-Wei
    Lin, Chun-Xun
    Wong, Martin D. F.
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2019, 38 (06) : 1070 - 1083
  • [36] FSP Modeling of a Generic Distributed Swarm Computing Framework
    Badica, Amelia
    Badica, Costin
    Brezovan, Marius
    INTELLIGENT DISTRIBUTED COMPUTING IX, IDC'2015, 2016, 616 : 177 - 186
  • [37] A generic deployment framework for grid computing and distributed applications
    Flissi, Areski
    Merle, Philippe
    ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS 2006: COOPIS, DOA, GADA, AND ODBASE PT 2, PROCEEDINGS, 2006, 4276 : 1402 - 1411
  • [38] A Generic Distributed Algorithm for Computing by Random Mobile Agents
    Abbas, Shehla
    Mosbah, Mohamed
    Zemmari, Akka
    AGENT COMPUTING AND MULTI-AGENT SYSTEMS, 2009, 5044 : 392 - 397
  • [39] The RPS2 generic distributed computing framework
    Douglas, RE
    Jackson, RE
    ASTRONOMICAL DATA ANALYSIS SOFTWARE AND SYSTEMS V, 1996, 101 : 455 - 458
  • [40] Performance under Failures of DAG-based Parallel Computing
    Jin, Hui
    Sun, Xian-He
    Zheng, Ziming
    Lan, Zhiling
    Xie, Bing
    CCGRID: 2009 9TH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID, 2009, : 236 - 243