Fault tolerance using "Parallel shadow image servers (PSIS)" in grid based computing environment

Cited by: 3
Authors
Hussain, Naveed [1]
Ansari, M. A.
Yasin, M. M.
Rauf, Abdul
Haider, Sajjad
Affiliations
[1] Natl Univ Modern Languages, Dept Informat Technol, Islamabad, Pakistan
[2] Fed Urdu Univ Arts Sci & Technol, Dept Comp Sci, Islamabad, Pakistan
[3] COMSATS Inst Informat Technol, Dept Comp Sci, Islamabad, Pakistan
Keywords
grid computing; fault tolerance; PSIS; condor; cactus; job scheduling;
DOI
10.1109/ICET.2006.335982
CLC number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a critical review of the existing fault tolerance mechanisms in grid computing and of the overhead they incur in reprocessing or rescheduling jobs when a fault arises. To address this, we propose the Parallel Shadow Image Server (PSIS) copying technique, which runs in parallel to the Resource Manager and maintains checkpoints so that a job can be rescheduled from the nearest flag when a fault is detected. A job is scheduled from the resource manager node to the worker nodes, and after a pre-specified amount of time the worker nodes submit it back in serialized form to the Parallel Shadow Image Servers; we call this the recent spawn, or the flag checkpoint, for rescheduling or reprocessing the job. If a fault arises, rescheduling is done from the most recent checkpoint, and the job is submitted to the worker node on which it was terminated. This not only saves time but also improves performance to a major extent.
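The checkpoint-and-resume idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: `ShadowImageServer`, `run_job`, the in-memory storage, and the use of a step count in place of the paper's pre-specified time interval are all assumptions made for illustration, and the `fail_at` parameter only simulates a worker fault.

```python
import pickle
from dataclasses import dataclass, field


@dataclass
class ShadowImageServer:
    """Hypothetical in-memory stand-in for one shadow image server."""
    images: dict = field(default_factory=dict)  # job_id -> (step, serialized state)

    def store(self, job_id, step, blob):
        self.images[job_id] = (step, blob)

    def latest(self, job_id):
        return self.images.get(job_id)


def run_job(job_id, total_steps, servers, interval, fail_at=None):
    """Toy job: sum the integers 0..total_steps-1 on a worker.

    Every `interval` steps the serialized state is copied to all shadow
    servers. On (re)start the job resumes from the most recent checkpoint
    any server holds. Returns (result, step the run started from).
    """
    saved = [c for c in (s.latest(job_id) for s in servers) if c is not None]
    if saved:
        step, blob = max(saved, key=lambda c: c[0])  # most recent checkpoint
        state = pickle.loads(blob)
    else:
        step, state = 0, {"acc": 0}
    start = step

    while step < total_steps:
        if fail_at is not None and step == fail_at:
            raise RuntimeError(f"simulated worker fault at step {step}")
        state["acc"] += step
        step += 1
        if step % interval == 0:
            blob = pickle.dumps(state)  # serialized form sent to the servers
            for server in servers:
                server.store(job_id, step, blob)
    return state["acc"], start


if __name__ == "__main__":
    servers = [ShadowImageServer(), ShadowImageServer()]
    try:
        run_job("job-1", 10, servers, interval=3, fail_at=7)
    except RuntimeError as fault:
        print(fault)
    # Rescheduling resumes from the last checkpoint instead of step 0.
    print(run_job("job-1", 10, servers, interval=3))
```

In this sketch the second call to `run_job` plays the role of the resource manager rescheduling the job: only the steps after the last stored checkpoint are repeated, which is the saving in reprocessing that the abstract describes.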
Pages: 703-707
Page count: 5