DAGuE: A generic distributed DAG engine for High Performance Computing

Cited by: 136
Authors
Bosilca, George [1]
Bouteiller, Aurelien [1]
Danalis, Anthony [1]
Herault, Thomas [1]
Lemarinier, Pierre [2]
Dongarra, Jack [1,3]
Affiliations
[1] Univ Tennessee, Innovat Comp Lab, Knoxville, TN 37996 USA
[2] Univ Rennes 1, IRISA, F-35014 Rennes, France
[3] Oak Ridge Natl Lab, Oak Ridge, TN 37831 USA
Keywords
HPC; Micro-task DAG; Heterogeneous architectures; Architecture-aware scheduling; Factorization
DOI
10.1016/j.parco.2011.10.003
Chinese Library Classification (CLC)
TP301 [Theory and methods]
Discipline code
081202
Abstract
The frenetic development of current architectures places a strain on state-of-the-art programming environments. Harnessing the full potential of such architectures is a tremendous task for the whole scientific computing community. We present DAGuE, a generic framework for architecture-aware scheduling and management of micro-tasks on distributed many-core heterogeneous architectures. The applications we consider can be expressed as a Directed Acyclic Graph of tasks with labeled edges designating data dependencies. DAGs are represented in a compact, problem-size independent format that can be queried on demand to discover data dependencies in a totally distributed fashion. DAGuE assigns computation threads to the cores, overlaps communications and computations, and uses a dynamic, fully distributed scheduler based on cache awareness, data locality and task priority. We demonstrate the efficiency of our approach using several micro-benchmarks to analyze the performance of different components of the framework, and a linear algebra factorization as a use case. Published by Elsevier B.V.
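The abstract's central idea, a DAG stored in a compact, problem-size independent form and queried on demand for its dependencies, can be illustrated with a minimal sketch. It assumes a tiled Cholesky factorization on an `nt` x `nt` grid of tiles (the abstract does not name the factorization, so Cholesky is an assumption here), and the names `in_degree`, `successors`, `RANK` and `run` are hypothetical. This is not DAGuE's input format or API: edges are computed from task parameters instead of being stored, and a single-threaded loop stands in for the runtime, using task priority only (no cache awareness, data locality, distribution or communication overlap).

```python
import heapq
from typing import Dict, Iterator, List, Tuple

# A task is identified by its kind and integer parameters (hypothetical encoding):
# ("POTRF", k), ("TRSM", k, m), ("SYRK", k, m), ("GEMM", k, m, n) with m > n > k.
Task = Tuple


def in_degree(task: Task) -> int:
    """Number of input dependencies, derived from the task's parameters alone."""
    kind, k = task[0], task[1]
    prev_step = 1 if k > 0 else 0
    if kind == "POTRF":
        return prev_step        # waits on SYRK(k-1, k)
    if kind in ("TRSM", "SYRK"):
        return 1 + prev_step    # POTRF(k) or TRSM(k, m), plus the step k-1 update
    if kind == "GEMM":
        return 2 + prev_step    # TRSM(k, m) and TRSM(k, n), plus GEMM(k-1, m, n)
    raise ValueError(task)


def successors(task: Task, nt: int) -> Iterator[Task]:
    """Out-edges of one task, generated on demand rather than stored."""
    kind, k = task[0], task[1]
    if kind == "POTRF":
        for m in range(k + 1, nt):
            yield ("TRSM", k, m)
    elif kind == "TRSM":
        m = task[2]
        yield ("SYRK", k, m)
        for n in range(k + 1, m):
            yield ("GEMM", k, m, n)   # tile (m, k) used as the left operand
        for i in range(m + 1, nt):
            yield ("GEMM", k, i, m)   # tile (m, k) used as the right operand
    elif kind == "SYRK":
        m = task[2]
        if m == k + 1:
            yield ("POTRF", m)        # diagonal tile is fully updated
        else:
            yield ("SYRK", k + 1, m)
    elif kind == "GEMM":
        m, n = task[2], task[3]
        if n == k + 1:
            yield ("TRSM", k + 1, m)  # off-diagonal tile is fully updated
        else:
            yield ("GEMM", k + 1, m, n)


RANK = {"POTRF": 0, "TRSM": 1, "SYRK": 2, "GEMM": 3}


def run(nt: int) -> List[Task]:
    """Execute the DAG in a dependency-respecting, priority-driven order."""
    remaining: Dict[Task, int] = {}  # counters only for discovered, not-yet-ready tasks
    ready = [(0, RANK["POTRF"], ("POTRF", 0))]  # POTRF(0) is the only source task
    order: List[Task] = []
    while ready:
        _, _, task = heapq.heappop(ready)  # lowest step k first, then by kind
        order.append(task)                 # a real runtime would launch the kernel here
        for succ in successors(task, nt):
            left = remaining.get(succ, in_degree(succ)) - 1
            if left == 0:
                remaining.pop(succ, None)
                heapq.heappush(ready, (succ[1], RANK[succ[0]], succ))
            else:
                remaining[succ] = left
    return order
```

For example, `run(4)` executes 20 tasks (4 POTRF, 6 TRSM, 6 SYRK, 4 GEMM) starting with `("POTRF", 0)`. The design point the sketch tries to mirror is that memory for dependency tracking grows with the set of discovered, not-yet-ready tasks rather than with the whole graph, since any node's edges can be regenerated from its parameters.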
Pages: 37-51
Page count: 15