DAGuE: A generic distributed DAG engine for High Performance Computing

Cited by: 136
Authors
Bosilca, George [1]
Bouteiller, Aurelien [1]
Danalis, Anthony [1]
Herault, Thomas [1]
Lemarinier, Pierre [2]
Dongarra, Jack [1, 3]
Affiliations
[1] Univ Tennessee, Innovat Comp Lab, Knoxville, TN 37996 USA
[2] Univ Rennes 1, IRISA, F-35014 Rennes, France
[3] Oak Ridge Natl Lab, Oak Ridge, TN 37831 USA
Keywords
HPC; Micro-task DAG; Heterogeneous architectures; Architecture aware scheduling; FACTORIZATION;
DOI
10.1016/j.parco.2011.10.003
Chinese Library Classification (CLC) number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
The frenetic development of current architectures places a strain on state-of-the-art programming environments. Harnessing the full potential of such architectures is a tremendous task for the whole scientific computing community. We present DAGuE, a generic framework for architecture-aware scheduling and management of micro-tasks on distributed many-core heterogeneous architectures. The applications we consider can be expressed as a Directed Acyclic Graph (DAG) of tasks with labeled edges designating data dependencies. DAGs are represented in a compact, problem-size independent format that can be queried on demand to discover data dependencies in a totally distributed fashion. DAGuE assigns computation threads to the cores, overlaps communications and computations, and uses a dynamic, fully distributed scheduler based on cache awareness, data locality, and task priority. We demonstrate the efficiency of our approach using several micro-benchmarks to analyze the performance of different components of the framework, and a linear algebra factorization as a use case. Published by Elsevier B.V.
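The idea of a compact, problem-size independent DAG that is queried on demand can be illustrated with a toy sketch. This is not DAGuE's actual format or API: the class `WavefrontDAG`, the function `run`, the grid-shaped dependency pattern, and the priority rule are all hypothetical, chosen only for illustration. The sketch runs in a single process and does not model distribution, communication overlap, or cache awareness. It stores only the grid size `n`, computes each task's predecessors and successors from its indices when asked, and executes tasks from a priority-ordered ready queue.

```python
import heapq


class WavefrontDAG:
    """Toy parameterized DAG (hypothetical, not DAGuE's representation).

    Task (i, j) on an n x n grid depends on its upper and left neighbours.
    Only n is stored; edges are computed on demand from the task indices.
    """

    def __init__(self, n):
        self.n = n

    def predecessors(self, task):
        i, j = task
        preds = []
        if i > 0:
            preds.append((i - 1, j))
        if j > 0:
            preds.append((i, j - 1))
        return preds

    def successors(self, task):
        i, j = task
        succs = []
        if i + 1 < self.n:
            succs.append((i + 1, j))
        if j + 1 < self.n:
            succs.append((i, j + 1))
        return succs

    def priority(self, task):
        # Assumed priority rule: tasks on earlier anti-diagonals run first.
        i, j = task
        return i + j


def run(dag, source=(0, 0)):
    """Execute tasks in dependency order using a priority-ordered ready queue."""
    remaining = {}  # task -> unmet dependency count, filled lazily on discovery
    ready = [(dag.priority(source), source)]
    order = []
    while ready:
        _, task = heapq.heappop(ready)
        order.append(task)  # "executing" a task just records it here
        for succ in dag.successors(task):
            if succ not in remaining:
                remaining[succ] = len(dag.predecessors(succ))
            remaining[succ] -= 1
            if remaining[succ] == 0:
                heapq.heappush(ready, (dag.priority(succ), succ))
    return order
```

Calling `run(WavefrontDAG(4))` returns the 16 tasks in an order that respects every dependency. The memory held by `WavefrontDAG` does not grow with `n`; only the scheduler's bookkeeping for discovered tasks does.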
Pages: 37-51
Number of pages: 15