Improving job scheduling performance with parallel access to replicas in Data Grid environment

Times cited: 2
Authors
Zhang, Junwei [1]
Lee, Bu-Sung [1]
Tang, Xueyan [1]
Yeo, Chai-Kiat [1]
Affiliation
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore, Singapore
Source
JOURNAL OF SUPERCOMPUTING | 2011 / Vol. 56 / No. 3
Keywords
Data Grid; Data Replication; Parallel download; Job scheduling;
DOI
10.1007/s11227-009-0365-7
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Data Grids have evolved into a solution for data-intensive applications such as High Energy Physics (HEP), astrophysics, and computational genomics. These applications typically analyze large volumes of input data, which are widely replicated across the Data Grid to improve performance. The scheduling performance of traditional compute jobs can be studied with queuing theory; once data transfer is added, however, job scheduling performance becomes too complex to model. In this research, we study the impact of data transfer on job scheduling performance in the Data Grid environment. To improve that performance, we propose a parallel downloading system that supports replicating data fragments and downloading replicated fragments in parallel. We compare the parallel downloading system with a non-parallel downloading system under three scheduling heuristics: Shortest Turnaround Time (STT), Least Relative Load (LRL), and Data Present (DP). Our simulation results show that, measured by the geometric mean of job turnaround time, the proposed parallel download approach greatly improves Data Grid performance for all three scheduling algorithms. The advantage of the parallel downloading system is most evident when the Data Grid has relatively low network bandwidth and relatively high computing power.
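The abstract's central claim, that fetching fragments of a replicated file from several sites at once shortens job turnaround most when transfer time dominates, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the authors' simulator and not their STT, LRL, or DP heuristics: the functions single_source_time, parallel_time, turnaround, and geometric_mean, the bandwidth-proportional fragment sizing, and the sample numbers are all hypothetical. The model ignores queueing, link contention, and site selection, and only compares transfer-plus-compute time, aggregated with the geometric mean that the paper uses as its metric.

```python
import math
from typing import List

# Hypothetical toy model, NOT the paper's simulator: each job reads one input
# file that is replicated at several sites, then computes for a fixed time.
# Units: sizes in MB, bandwidths in MB/s, times in seconds.


def single_source_time(size_mb: float, bandwidths: List[float]) -> float:
    """Download the whole file from the single fastest replica."""
    return size_mb / max(bandwidths)


def parallel_time(size_mb: float, bandwidths: List[float]) -> float:
    """Download fragments from all replicas at once.

    Assumption: fragment i is sized in proportion to bandwidths[i], so every
    fragment finishes at the same moment, size / sum(bandwidths). Links are
    assumed independent, with no bottleneck at the receiving site.
    """
    return size_mb / sum(bandwidths)


def turnaround(size_mb: float, bandwidths: List[float],
               compute_s: float, parallel: bool) -> float:
    """Transfer time plus compute time; queueing delay is ignored."""
    fetch = parallel_time if parallel else single_source_time
    return fetch(size_mb, bandwidths) + compute_s


def geometric_mean(values: List[float]) -> float:
    """Geometric mean, computed in log space to avoid overflow on long traces."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    replicas = [10.0, 20.0, 30.0]  # hypothetical MB/s to each replica site
    jobs = [(1000.0, 5.0), (500.0, 60.0), (2000.0, 20.0)]  # (input MB, compute s)
    for parallel in (False, True):
        times = [turnaround(s, replicas, c, parallel) for s, c in jobs]
        label = "parallel" if parallel else "single"
        print(label, round(geometric_mean(times), 2))
```

Under these assumptions, parallel downloading can at best cut transfer time from size / max(b) to size / sum(b), while the compute term is unchanged. The relative gain therefore shrinks as compute time grows, which is consistent with the abstract's observation that the benefit is largest when bandwidth is low and computing power is high (that is, when compute times are short).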
Pages: 245-269
Page count: 25