DAGuE: A generic distributed DAG engine for High Performance Computing

Cited by: 136
Authors
Bosilca, George [1]
Bouteiller, Aurelien [1]
Danalis, Anthony [1]
Herault, Thomas [1]
Lemarinier, Pierre [2]
Dongarra, Jack [1, 3]
Affiliations
[1] Univ Tennessee, Innovat Comp Lab, Knoxville, TN 37996 USA
[2] Univ Rennes 1, IRISA, F-35014 Rennes, France
[3] Oak Ridge Natl Lab, Oak Ridge, TN 37831 USA
Keywords
HPC; Micro-task DAG; Heterogeneous architectures; Architecture aware scheduling; FACTORIZATION;
DOI
10.1016/j.parco.2011.10.003
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The frenetic development of the current architectures places a strain on the current state-of-the-art programming environments. Harnessing the full potential of such architectures is a tremendous task for the whole scientific computing community. We present DAGuE, a generic framework for architecture-aware scheduling and management of micro-tasks on distributed many-core heterogeneous architectures. The applications we consider can be expressed as a Directed Acyclic Graph (DAG) of tasks with labeled edges designating data dependencies. DAGs are represented in a compact, problem-size-independent format that can be queried on demand to discover data dependencies in a totally distributed fashion. DAGuE assigns computation threads to the cores, overlaps communications and computations, and uses a dynamic, fully distributed scheduler based on cache awareness, data locality, and task priority. We demonstrate the efficiency of our approach, using several micro-benchmarks to analyze the performance of different components of the framework, and a linear algebra factorization as a use case. Published by Elsevier B.V.
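As a rough illustration of what a "compact, problem-size-independent" DAG queried on demand can look like, the sketch below defines a toy wavefront graph whose edges are computed from task indices rather than stored. It is not DAGuE's actual format or scheduler: the class `WavefrontDAG`, the function `run`, the dependency pattern, and the priority rule (smaller `i + j` first) are all assumptions made for this example.

```python
import heapq
from typing import Dict, Iterator, List, Tuple

Task = Tuple[int, int]


class WavefrontDAG:
    """Hypothetical symbolic DAG: task (i, j) on an n x n grid depends on
    (i-1, j) and (i, j-1). No edge is stored, so the description has the
    same size for every n."""

    def __init__(self, n: int) -> None:
        self.n = n

    def predecessors(self, t: Task) -> Iterator[Task]:
        i, j = t
        if i > 0:
            yield (i - 1, j)
        if j > 0:
            yield (i, j - 1)

    def successors(self, t: Task) -> Iterator[Task]:
        i, j = t
        if i + 1 < self.n:
            yield (i + 1, j)
        if j + 1 < self.n:
            yield (i, j + 1)


def run(dag: WavefrontDAG) -> List[Task]:
    """Visit tasks in dependency order, discovering successors on demand.
    Assumed priority: smaller i + j first, ties broken by the task tuple."""
    # Unfinished-predecessor counts, kept only for discovered, not-yet-ready tasks.
    remaining: Dict[Task, int] = {}
    ready: List[Tuple[int, Task]] = [(0, (0, 0))]
    order: List[Task] = []
    while ready:
        _, task = heapq.heappop(ready)
        order.append(task)  # stand-in for executing the task
        for succ in dag.successors(task):
            if succ not in remaining:
                # Query the symbolic description instead of a stored edge list.
                remaining[succ] = sum(1 for _ in dag.predecessors(succ))
            remaining[succ] -= 1
            if remaining[succ] == 0:
                del remaining[succ]
                heapq.heappush(ready, (sum(succ), succ))
    return order
```

Calling `run(WavefrontDAG(3))` visits all nine tasks while only ever holding counters for tasks on the current frontier. A distributed runtime would additionally take data placement and cache reuse into account when picking the next task, which this single-process sketch does not attempt.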
Pages: 37-51
Number of pages: 15
Related papers
50 in total
  • [21] A large scale distributed platform for high performance computing
    Abdennadher, N
    Boesch, R
    GRID AND COOPERATIVE COMPUTING - GCC 2005, PROCEEDINGS, 2005, 3795 : 848 - 859
  • [22] Exploring Untrusted Distributed Storage for High Performance Computing
    Smith, Austin
    Riley, Justin
    Syed, Muneeba
    Kupcevic, Milan
    Edmon, Paul
    Yockel, Scott
    PEARC '19: PROCEEDINGS OF THE PRACTICE AND EXPERIENCE IN ADVANCED RESEARCH COMPUTING ON RISE OF THE MACHINES (LEARNING), 2019,
  • [23] Distributed High Performance Computing using JAVA
    Shakya, Subarna
    Chaulagain, Ram Sharan
    Pandey, Santosh
    Gyawali, Prakash
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND AUTOMATION (ICCCA), 2017, : 742 - 747
  • [24] Editorial for the special issue on high performance distributed computing
    Minyi Guo
    Guihai Chen
    Xiaofei Liao
    Long Zheng
    CCF Transactions on High Performance Computing, 2021, 3 : 127 - 128
  • [25] Editorial for the special issue on high performance distributed computing
    Guo, Minyi
    Chen, Guihai
    Liao, Xiaofei
    Zheng, Long
    CCF TRANSACTIONS ON HIGH PERFORMANCE COMPUTING, 2021, 3 (02) : 127 - 128
  • [26] Scheduling of Distributed Applications for High Performance Computing as a Service
    Bak, Slawomir
    Czarnecki, Radoslaw
    Deniziak, Stanislaw
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE OF COMPUTATIONAL METHODS IN SCIENCES AND ENGINEERING 2016 (ICCMSE-2016), 2016, 1790
  • [27] A Leveled Dag Critical Task Firstschedule Algorithm in Distributed Computing Systems
    El-Nattat, Amal
    El-Bahnasawy, Nirmeen A.
    El-Sayed, Ayman
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2016, 7 (01) : 274 - 284
  • [28] Evaluating High-Performance Computing on Google App Engine
    Prodan, Radu
    Sperk, Michael
    Ostermann, Simon
    IEEE SOFTWARE, 2012, 29 (02) : 52 - 58
  • [29] DAGBENCH: A Performance Evaluation Framework for DAG Distributed Ledgers
    Dong, Zhongli
    Zheng, Emma
    Lee, Young Choon
    Zomaya, Albert Y.
    2019 IEEE 12TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (IEEE CLOUD 2019), 2019, : 264 - 271
  • [30] Machine Learning for Generic Energy Models of High Performance Computing Resources
    Murana, Jonathan
    Navarrete, Carmen
    Nesmachnow, Sergio
    HIGH PERFORMANCE COMPUTING - ISC HIGH PERFORMANCE DIGITAL 2021 INTERNATIONAL WORKSHOPS, 2021, 12761 : 314 - 330