A Data-aware Partitioning and Optimization Method for Large-scale Workflows in Hybrid Computing Environments

Times cited: 0
Authors
Duan, Rubing [1]
Li, Xiaorong [1]
Affiliation
[1] ASTAR, Inst High Performance Comp, Singapore, Singapore
DOI
10.1109/ICPADS.2013.29
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
While hybrid computing environments offer good potential for achieving high performance at low economic cost, they also introduce a broad set of unpredictable overheads, especially when running data-intensive applications. This paper describes a novel approach that refines workflow structures and optimizes intermediate data transfers for large-scale scientific workflows containing thousands (or even millions) of tasks. The proposed method consists of pre- and post-partitioning of workflows and data-flow optimization. First, it partitions a workflow by identifying the critical path of the task graph. Second, it controls the granularity of partitions to reduce the complexity of the task graph so that large-scale workflows can be processed. Third, it optimizes the data flow based on the schedule to minimize communication overheads. The proposed approach can handle complex data flows and significantly reduces data transfer by replacing individual tasks according to their data dependencies. We conducted experiments using real applications such as Montage and Broadband, and the results demonstrate that our methods achieve low execution time with low communication overhead in hybrid computing environments.
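The first step above, partitioning a workflow by identifying the critical path of its task graph, can be illustrated with a generic heuristic. The sketch below is an illustration under stated assumptions, not the authors' algorithm: it models the workflow as a DAG with hypothetical per-task compute times and per-edge data-transfer times, repeatedly extracts the heaviest remaining path (the critical path of the tasks not yet assigned), and emits each such path as one partition. The function names `longest_path` and `critical_path_partitions` and the four-task example graph are hypothetical.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def longest_path(nodes, cost, edges):
    """Heaviest path through the sub-DAG induced by `nodes` (hypothetical helper).

    cost[t]       -- assumed compute time of task t
    edges[(u, v)] -- assumed data-transfer time from task u to task v
    Returns (path_length, [task, ...]) in execution order.
    """
    nodes = sorted(nodes)  # fixed order for reproducible tie-breaking
    members = set(nodes)
    preds = defaultdict(list)
    for (u, v), w in edges.items():
        if u in members and v in members:  # ignore edges leaving the subset
            preds[v].append((u, w))
    graph = {n: [u for u, _ in preds[n]] for n in nodes}

    best, parent = {}, {}
    # Visit tasks so that every predecessor is finalized before its successors.
    for n in TopologicalSorter(graph).static_order():
        best[n], parent[n] = cost[n], None
        for u, w in preds[n]:
            candidate = best[u] + w + cost[n]
            if candidate > best[n]:
                best[n], parent[n] = candidate, u

    end = max(best, key=best.get)  # last task on the critical path
    length, path = best[end], []
    while end is not None:  # follow parent links back to the first task
        path.append(end)
        end = parent[end]
    return length, path[::-1]


def critical_path_partitions(cost, edges):
    """Peel off the critical path of the remaining tasks, one partition at a time."""
    remaining, partitions = set(cost), []
    while remaining:
        _, path = longest_path(remaining, cost, edges)
        partitions.append(path)
        remaining -= set(path)
    return partitions


if __name__ == "__main__":
    cost = {"A": 2, "B": 3, "C": 1, "D": 2}
    edges = {("A", "B"): 1, ("A", "C"): 4, ("B", "D"): 1, ("C", "D"): 1}
    print(longest_path(cost, cost, edges))          # (10, ['A', 'C', 'D'])
    print(critical_path_partitions(cost, edges))    # [['A', 'C', 'D'], ['B']]
```

In this toy graph the path A, C, D costs 2 + 4 + 1 + 1 + 2 = 10 versus 9 for A, B, D, so A, C and D form the first partition and B is left on its own. The other two steps described in the abstract, granularity control and schedule-based data-flow optimization, are not modeled in this sketch.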
Pages: 126-133
Number of pages: 8
Related papers (50 in total)
  • [21] Latency Modeling and Minimization for Large-scale Scientific Workflows in Distributed Network Environments
    Wu, Qishi
    Gu, Yi
    Liao, Yuchen
    Lu, Xukang
    Lin, Yunyue
    Rao, Nageswara S. V.
    [J]. 44TH ANNUAL SIMULATION SYMPOSIUM 2011 (ANSS 2011) - 2011 SPRING SIMULATION MULTICONFERENCE - BK 2 OF 8, 2011, : 205 - 212
  • [22] Scheduling Real-Time IoT Workflows in a Fog Computing Environment Utilizing Cloud Resources with Data-Aware Elasticity
    Stavrinides, Georgios L.
    Karatza, Helen D.
    [J]. 2021 SIXTH INTERNATIONAL CONFERENCE ON FOG AND MOBILE EDGE COMPUTING (FMEC), 2021, : 49 - 56
  • [23] Hybrid computing document similarity in large-scale environment
    Alouane-Ksouri, Sonia
    Sassi Hidri, Minyar
    Barkaoui, Kamel
    [J]. 2016 12TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (ICNC-FSKD), 2016, : 2159 - 2164
  • [24] BOXSTEP METHOD FOR LARGE-SCALE OPTIMIZATION
    MARSTEN, RE
    HOGAN, WW
    BLANKENSHIP, JW
    [J]. OPERATIONS RESEARCH, 1975, 23 (03) : 389 - 405
  • [25] Energy-efficient mapping of large-scale workflows under deadline constraints in big data computing systems
    Shu, Tong
    Wu, Chase Q.
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2020, 110 : 515 - 530
  • [26] A PARALLEL PARTITIONING METHOD FOR LARGE-SCALE CIRCUIT SIMULATION
    ZHANG, XD
    [J]. UNIVERSITY PROGRAMS IN COMPUTER-AIDED ENGINEERING, DESIGN, AND MANUFACTURING, 1989, : 134 - 141
  • [27] Super spaces: A middleware for large-scale pervasive computing environments
    Al-Muhtadi, J
    Chetan, S
    Ranganathan, A
    Campbell, R
    [J]. SECOND IEEE ANNUAL CONFERENCE ON PERVASIVE COMPUTING AND COMMUNICATIONS WORKSHOPS, PROCEEDINGS, 2004, : 198 - 202
  • [28] Parallel computing tests on large-scale convex optimization
    Kallio, M
    Salo, S
    [J]. APPLIED PARALLEL COMPUTING: LARGE SCALE SCIENTIFIC AND INDUSTRIAL PROBLEMS, 1998, 1541 : 275 - 280
  • [29] Exploiting Scientific Workflows for Large-scale Gene Expression Data Analysis
    De Stasio, Alessandro
    Ertelt, Marcus
    Kemmner, Wolfgang
    Leser, Ulf
    Ceccarelli, Michele
    [J]. 2009 24TH INTERNATIONAL SYMPOSIUM ON COMPUTER AND INFORMATION SCIENCES, 2009, : 447 - +
  • [30] Network and data location aware approach for simultaneous job scheduling and data replication in large-scale data grid environments
    Mansouri, Najme
    [J]. FRONTIERS OF COMPUTER SCIENCE, 2014, 8 (03) : 391 - 408